Transformer-based transfer learning on self-reported voice recordings for Parkinson’s disease diagnosis

Ilias Tougui; Mehdi Zakroum; Ouassim Karrakchou; Mounir Ghogho

doi:10.1038/s41598-024-81824-x

Scientific Reports (Dec 2024)

Affiliations

Ilias Tougui: College of Engineering and Architecture - TICLab, International University of Rabat
Mehdi Zakroum: College of Engineering and Architecture - TICLab, International University of Rabat
Ouassim Karrakchou: College of Engineering and Architecture - TICLab, International University of Rabat
Mounir Ghogho: College of Engineering and Architecture - TICLab, International University of Rabat

DOI: https://doi.org/10.1038/s41598-024-81824-x
Journal volume & issue: Vol. 14, no. 1, pp. 1–21

Abstract

Deep learning (DL) techniques are becoming more popular for diagnosing Parkinson’s disease (PD) because they offer non-invasive and easily accessible tools. By using advanced data analysis, these methods improve early detection and diagnosis, which is crucial for managing the disease effectively. This study explores end-to-end DL architectures, such as convolutional neural networks and transformers, for diagnosing PD using self-reported voice data collected via smartphones in everyday settings. Transfer learning was applied by starting with models pre-trained on large datasets from the image and the audio domains and then fine-tuning them on the mPower voice data. The Transformer model pre-trained on the voice data performed the best, achieving an average AUC of 95.89% and an average AUPRC of 87.11%, outperforming models trained from scratch. To the best of our knowledge, this is the first use of a Transformer model for audio data in PD diagnosis, using this dataset. We achieved better results than previous studies, whether they focused solely on the voice or incorporated multiple modalities, by relying only on the voice as a biomarker. These results show that using self-reported voice data with state-of-the-art DL architectures can significantly improve PD prediction and diagnosis, potentially leading to better patient outcomes.

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
