Leveraging Deep Learning for Fine-Grained Categorization of Parkinson’s Disease Progression Levels through Analysis of Vocal Acoustic Patterns

Hadi Sedigh Malekroodi; Nuwan Madusanka; Byeong-il Lee; Myunggi Yi

doi:10.3390/bioengineering11030295

Bioengineering (Mar 2024)

Leveraging Deep Learning for Fine-Grained Categorization of Parkinson’s Disease Progression Levels through Analysis of Vocal Acoustic Patterns

Hadi Sedigh Malekroodi,
Nuwan Madusanka,
Byeong-il Lee,
Myunggi Yi

Affiliations

Hadi Sedigh Malekroodi: Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea
Nuwan Madusanka: Digital of Healthcare Research Center, Institute of Information Technology and Convergence, Pukyong National University, Busan 48513, Republic of Korea
Byeong-il Lee: Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea
Myunggi Yi: Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea

DOI: https://doi.org/10.3390/bioengineering11030295
Journal volume & issue: Vol. 11, no. 3
p. 295

Abstract

Read online

Speech impairments often emerge as one of the primary indicators of Parkinson’s disease (PD), albeit not readily apparent in its early stages. While previous studies focused predominantly on binary PD detection, this research explored the use of deep learning models to automatically classify sustained vowel recordings into healthy controls, mild PD, or severe PD based on motor symptom severity scores. Popular convolutional neural network (CNN) architectures, VGG and ResNet, as well as vision transformers, Swin, were fine-tuned on log mel spectrogram image representations of the segmented voice data. Furthermore, the research investigated the effects of audio segment lengths and specific vowel sounds on the performance of these models. The findings indicated that implementing longer segments yielded better performance. The models showed strong capability in distinguishing PD from healthy subjects, achieving over 95% precision. However, reliably discriminating between mild and severe PD cases remained challenging. The VGG16 achieved the best overall classification performance with 91.8% accuracy and the largest area under the ROC curve. Furthermore, focusing analysis on the vowel /u/ could further improve accuracy to 96%. Applying visualization techniques like Grad-CAM also highlighted how CNN models focused on localized spectrogram regions while transformers attended to more widespread patterns. Overall, this work showed the potential of deep learning for non-invasive screening and monitoring of PD progression from voice recordings, but larger multi-class labeled datasets are needed to further improve severity classification.

Published in Bioengineering

ISSN: 2306-5354 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology; Science: Biology (General)
Website: https://www.mdpi.com/journal/bioengineering

About the journal

Abstract

Keywords