Harnessing multimodal approaches for depression detection using large language models and facial expressions

Misha Sadeghi; Robert Richer; Bernhard Egger; Lena Schindler-Gmelch; Lydia Helene Rupp; Farnaz Rahimi; Matthias Berking; Bjoern M. Eskofier

doi:10.1038/s44184-024-00112-8

npj Mental Health Research (Dec 2024)

Affiliations

Misha Sadeghi: Machine Learning and Data Analytics Lab (MaD Lab), Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Robert Richer: Machine Learning and Data Analytics Lab (MaD Lab), Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Bernhard Egger: Chair of Visual Computing (LGDV), Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Lena Schindler-Gmelch: Chair of Clinical Psychology and Psychotherapy (KliPs), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Lydia Helene Rupp: Chair of Clinical Psychology and Psychotherapy (KliPs), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Farnaz Rahimi: Machine Learning and Data Analytics Lab (MaD Lab), Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Matthias Berking: Chair of Clinical Psychology and Psychotherapy (KliPs), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Bjoern M. Eskofier: Machine Learning and Data Analytics Lab (MaD Lab), Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)

Journal volume & issue: Vol. 3, no. 1
pp. 1 – 14

Abstract

Detecting depression is a critical component of mental health diagnosis, and accurate assessment is essential for effective treatment. This study introduces a novel, fully automated approach to predicting depression severity using the E-DAIC dataset. We employ Large Language Models (LLMs) to extract depression-related indicators from interview transcripts, utilizing the Patient Health Questionnaire-8 (PHQ-8) score to train the prediction model. Additionally, facial data extracted from video frames is integrated with textual data to create a multimodal model for depression severity prediction. We evaluate three approaches: text-based features, facial features, and a combination of both. Our findings show the best results are achieved by enhancing text data with speech quality assessment, with a mean absolute error of 2.85 and root mean square error of 4.02. This study underscores the potential of automated depression detection, showing text-only models as robust and effective while paving the way for multimodal analysis.
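For readers unfamiliar with the two error metrics reported in the abstract, the following is a minimal sketch of how mean absolute error (MAE) and root mean square error (RMSE) are computed between reference PHQ-8 scores and predicted scores. The function names and the four score pairs are hypothetical toy values chosen for illustration; they are not data or code from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical toy values (PHQ-8 totals range from 0 to 24); not study data.
phq8_true = [2, 10, 17, 5]
phq8_pred = [4, 9, 13, 5]

print(f"MAE:  {mae(phq8_true, phq8_pred):.2f}")
print(f"RMSE: {rmse(phq8_true, phq8_pred):.2f}")
```

Because RMSE squares each error before averaging, it penalizes large individual misses more heavily than MAE does, so RMSE is never smaller than MAE on the same predictions.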

Published in npj Mental Health Research

ISSN: 2731-4251 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry: Neurology. Diseases of the nervous system: Psychiatry: Therapeutics. Psychotherapy
Website: https://www.nature.com/npjmentalhealth/
