Alzheimer’s Dementia Speech (Audio vs. Text): Multi-Modal Machine Learning at High vs. Low Resolution

Prachee Priyadarshinee; Christopher Johann Clarke; Jan Melechovsky; Cindy Ming Ying Lin; Balamurali B. T.; Jer-Ming Chen

doi:10.3390/app13074244

Applied Sciences (Mar 2023)

Affiliations

Prachee Priyadarshinee: Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore
Christopher Johann Clarke: Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore
Jan Melechovsky: Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore
Cindy Ming Ying Lin: Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore
Balamurali B. T.: Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore
Jer-Ming Chen: Science, Mathematics and Technology, Singapore University of Technology and Design, Singapore 487372, Singapore

DOI: https://doi.org/10.3390/app13074244
Journal volume & issue: Vol. 13, no. 7, p. 4244

Abstract

Automated techniques for detecting Alzheimer’s Dementia from audio recordings of spontaneous speech are now available, with varying degrees of reliability. Here, we present a systematic comparison across modalities, granularities and machine learning models to guide the choice of the most effective tools. Specifically, we present a multi-modal approach (audio and text) for the automatic detection of Alzheimer’s Dementia from recordings of spontaneous speech. Sixteen features were tested to determine their relative performance, including four feature extraction methods not previously applied in this context (Energy–Time plots, Keg of Text Analytics, Keg of Text Analytics-Extended and Speech to Silence ratio). These features span two modalities (audio vs. text) at two resolution scales (frame-level vs. file-level). Comparing the accuracy obtained with these features, we found that text-based classification outperformed audio-based classification, with the best performance reaching 88.7%, surpassing other reports to date that rely on the same dataset. For text-based classification, the best file-level feature performed 9.8% better than the frame-level feature. For audio-based classification, however, the best frame-level feature performed 1.4% better than the best file-level feature. This multi-modal, multi-model comparison at high and low resolution offers insights into which approach is most efficacious, depending on the sampling context. Such a comparison of Alzheimer’s Dementia classification accuracy, using both frame-level and file-level granularities on audio and text modalities with different machine learning models on the same dataset, has not previously been addressed. We also demonstrate that a subject’s speech captured in short time frames, and its dynamics, may contain enough inherent information to indicate the presence of dementia. Overall, such a systematic analysis facilitates the quick and non-invasive identification of Alzheimer’s Dementia, potentially leading to more timely interventions and improved patient outcomes.
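
The distinction between frame-level (high-resolution) and file-level (low-resolution) audio features can be illustrated with a minimal sketch. The abstract does not define how the Energy–Time or Speech to Silence ratio features are computed, so everything below is an assumption made for illustration: the function names, the non-overlapping 50 ms frames, the mean-squared-amplitude energy measure and the fixed energy threshold are all hypothetical and are not taken from the paper.

```python
import math

def frame_energies(samples, frame_len, hop):
    """Frame-level view: one energy value (mean squared amplitude) per short frame.

    Hypothetical helper, not the paper's feature extractor.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def speech_to_silence_ratio(energies, threshold):
    """File-level view: a single number summarising the whole recording.

    Frames above `threshold` count as speech, the rest as silence.
    The threshold rule is an assumption made for this sketch.
    """
    speech = sum(1 for e in energies if e > threshold)
    silence = len(energies) - speech
    return speech / silence if silence else math.inf

# Synthetic stand-in for a recording: 1 s of a 200 Hz tone, then 1 s of silence.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(sample_rate)]
signal = tone + [0.0] * sample_rate

# 400 samples at 8 kHz is a 50 ms frame; hop == frame_len means no overlap.
energies = frame_energies(signal, frame_len=400, hop=400)
ratio = speech_to_silence_ratio(energies, threshold=1e-4)

print(len(energies), "frame-level values;", "file-level ratio =", ratio)
```

In this sketch the frame-level representation keeps the sequence of per-frame values, so a classifier can use their dynamics over time, whereas the file-level representation collapses the recording into one summary value.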

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
