A Hybrid Machine-Learning-Based Method for Analytic Representation of the Vocal Fold Edges during Connected Speech

Ahmed M. Yousef; Dimitar D. Deliyski; Stephanie R. C. Zacharias; Alessandro de Alarcon; Robert F. Orlikoff; Maryam Naghibolhosseini

doi:10.3390/app11031179

Applied Sciences (Jan 2021)

Affiliations

Ahmed M. Yousef: Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
Dimitar D. Deliyski: Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
Stephanie R. C. Zacharias: Head and Neck Regenerative Medicine Program, Center for Regenerative Medicine, Mayo Clinic, Scottsdale, AZ 85259, USA
Alessandro de Alarcon: Division of Pediatric Otolaryngology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
Robert F. Orlikoff: College of Allied Health Sciences, East Carolina University, Greenville, NC 27834, USA
Maryam Naghibolhosseini: Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA

DOI: https://doi.org/10.3390/app11031179
Journal volume & issue: Vol. 11, no. 3, p. 1179

Abstract

Investigating the phonatory processes in connected speech from high-speed videoendoscopy (HSV) demands the accurate detection of the vocal fold edges during vibration. The present paper proposes a new spatio-temporal technique to automatically segment vocal fold edges in HSV data during running speech. The HSV data were recorded from a vocally normal adult during a reading of the “Rainbow Passage.” The introduced technique combines an unsupervised machine-learning (ML) approach with an active contour modeling (ACM) technique; the combination is referred to as a hybrid approach. The hybrid method was implemented to capture the edges of the vocal folds on different HSV kymograms, extracted at various cross-sections of the vocal folds during vibration. The k-means clustering method, an ML approach, was first applied to the kymograms to identify the glottal area, which in turn provided an initial contour for the ACM. The ACM algorithm was then used to precisely detect the glottal edges of the vibrating vocal folds. The developed algorithm was able to accurately track the vocal fold edges across frames with low computational cost and high robustness against image noise. This algorithm offers a fully automated tool for analyzing the vibratory features of vocal folds in connected speech.
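The initialization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the number of clusters, the features used, or the kymogram layout, so the code assumes two intensity clusters, treats the darkest cluster as the glottal area, and takes rows as position along the cross-section and columns as frames. The synthetic kymogram and all function names are hypothetical, and the ACM refinement that would follow is not shown.

```python
import numpy as np


def kmeans_1d(values, k=2, iters=50):
    """Plain Lloyd's k-means on scalar intensities; returns (labels, centers)."""
    # Deterministic start: spread the centers between min and max intensity.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def initial_glottal_edges(kymogram, k=2):
    """Cluster pixels by intensity and return (mask, upper, lower).

    Assumption: the darkest cluster is the glottal area. `upper` and `lower`
    hold the first and last glottal row per column, or -1 where the glottis
    appears closed. These edges would only seed an active contour.
    """
    labels, centers = kmeans_1d(kymogram.ravel().astype(float), k)
    mask = (labels == np.argmin(centers)).reshape(kymogram.shape)
    height = kymogram.shape[0]
    rows = np.arange(height)[:, None]
    open_cols = mask.any(axis=0)
    upper = np.where(open_cols, np.where(mask, rows, height).min(axis=0), -1)
    lower = np.where(open_cols, np.where(mask, rows, -1).max(axis=0), -1)
    return mask, upper, lower


def synthetic_kymogram(height=64, width=120, period=30, max_half_width=8, seed=0):
    """Hypothetical test image: a dark glottal gap that opens and closes."""
    rng = np.random.default_rng(seed)
    t = np.arange(width)
    half = np.clip(max_half_width * np.sin(2 * np.pi * t / period), 0, None)
    rows = np.arange(height)[:, None]
    glottis = np.abs(rows - height // 2) < half[None, :]
    image = np.where(glottis, 0.1, 0.8) + rng.normal(0, 0.03, (height, width))
    return image, glottis


if __name__ == "__main__":
    image, truth = synthetic_kymogram()
    mask, upper, lower = initial_glottal_edges(image)
    print("pixels in glottal cluster:", int(mask.sum()))
    print("columns with an open glottis:", int((upper >= 0).sum()))
```

On real HSV kymograms the clustered region would be rougher than in this synthetic case, which is where a contour-refinement stage such as the ACM described above would come in.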

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
