Depression recognition using voice-based pre-training model

Xiangsheng Huang; Fang Wang; Yuan Gao; Yilong Liao; Wenjing Zhang; Li Zhang; Zhenrong Xu

doi:10.1038/s41598-024-63556-0

Scientific Reports (Jun 2024)

Affiliations

Xiangsheng Huang: School of Biomedical Engineering, South-Central Minzu University
Fang Wang: School of Biomedical Engineering, South-Central Minzu University
Yuan Gao: School of Biomedical Engineering, South-Central Minzu University
Yilong Liao: School of Biomedical Engineering, South-Central Minzu University
Wenjing Zhang: School of Biomedical Engineering, South-Central Minzu University
Li Zhang: School of Biomedical Engineering, South-Central Minzu University
Zhenrong Xu: School of Biomedical Engineering, South-Central Minzu University

DOI: https://doi.org/10.1038/s41598-024-63556-0
Journal volume & issue: Vol. 14, no. 1
pp. 1 – 13

Abstract

Early screening for depression helps patients obtain better diagnosis and treatment. Although voice data has been shown to be effective for depression detection, the problem of insufficient dataset size remains unresolved. We therefore propose an artificial intelligence method for identifying depression. The wav2vec 2.0 voice-based pre-training model was used as a feature extractor to automatically extract high-quality voice features from raw audio, and a small fine-tuning network was used as a classification model to output depression classification results. The proposed model was fine-tuned on the DAIC-WOZ dataset. In binary classification, it attained an accuracy of 0.9649 and an RMSE of 0.1875 on the test set. In multi-classification, it attained an accuracy of 0.9481 and an RMSE of 0.3810. The wav2vec 2.0 model was used for depression recognition for the first time and showed strong generalization ability. The method is simple and practical, and it can assist doctors in the early screening of depression.
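The two metrics reported in the abstract, accuracy and RMSE, can be illustrated with a minimal sketch. The abstract does not say how RMSE was computed for a classification task, so the code below assumes it is taken between the predicted and true integer class labels. The function names and the label lists are hypothetical and are not taken from the paper or from the DAIC-WOZ dataset.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between integer class labels (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


# Hypothetical binary labels: 0 = not depressed, 1 = depressed.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 1]  # one of eight predictions is wrong

acc = accuracy(y_true, y_pred)
err = rmse(y_true, y_pred)
print(f"accuracy = {acc:.4f}, RMSE = {err:.4f}")
```

Under this assumed definition, every binary misclassification contributes a squared error of 1, so the binary RMSE equals sqrt(1 - accuracy). For the reported binary accuracy of 0.9649 this gives about 0.187, which is close to the reported RMSE of 0.1875, although the abstract itself does not confirm that this is how the figure was obtained.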

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
