IEEE Access (Jan 2024)

Additive Cross-Modal Attention Network (ACMA) for Depression Detection Based on Audio and Textual Features

  • Ngumimi Karen Iyortsuun,
  • Soo-Hyung Kim,
  • Hyung-Jeong Yang,
  • Seung-Won Kim,
  • Min Jhon

DOI
https://doi.org/10.1109/ACCESS.2024.3362233
Journal volume & issue
Vol. 12
pp. 20479–20489

Abstract

Detecting depression typically involves standardized questionnaires such as the Patient Health Questionnaire (PHQ-8/9). However, patients may not always provide genuine responses, which can lead to misdiagnosis, so a means of detecting depression without preset questions is of high importance. Addressing this challenge, our study aims to discern telltale symptoms from statements made by the patient. We harness both audio and text data, proposing an Additive Cross-Modal Attention network that learns the weights that best capture the cross-modal interactions and relationships between the two feature sets, with a BiLSTM backbone for each modality. We tested our approach on the DAIC-WOZ dataset for depression detection and also evaluated our model on the EATD-Corpus. Benchmarked against similar studies on these datasets, our method demonstrates commendable efficacy in both classification and regression settings, for both unimodal and multimodal configurations. Our findings underscore the potential of our model to effectively detect depression from textual and speech modalities without relying on preset questions.
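To make the abstract's architecture description concrete, the following is a minimal PyTorch sketch of additive (Bahdanau-style) cross-modal attention over BiLSTM-encoded audio and text sequences. It is not the authors' released implementation: the feature dimensions, mean-pooled queries, bidirectional fusion, and classification head are all illustrative assumptions.

```python
# Hypothetical sketch, not the paper's official code: additive cross-modal
# attention between BiLSTM-encoded audio and text branches.
import torch
import torch.nn as nn


class AdditiveCrossModalAttention(nn.Module):
    """Scores each time step of `context` against a query from the other
    modality with an additive (tanh) energy, then returns the attended summary."""

    def __init__(self, dim):
        super().__init__()
        self.w_query = nn.Linear(dim, dim, bias=False)
        self.w_context = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, query, context):
        # query:   (batch, dim)        -- pooled representation of one modality
        # context: (batch, time, dim)  -- sequence from the other modality
        energy = torch.tanh(self.w_query(query).unsqueeze(1) + self.w_context(context))
        weights = torch.softmax(self.v(energy), dim=1)   # (batch, time, 1)
        return (weights * context).sum(dim=1)            # (batch, dim)


class ACMASketch(nn.Module):
    """Illustrative two-branch model: a BiLSTM backbone per modality plus
    additive cross-modal attention in both directions; sizes are placeholders."""

    def __init__(self, audio_dim=80, text_dim=300, hidden=128, n_classes=2):
        super().__init__()
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True, bidirectional=True)
        self.text_lstm = nn.LSTM(text_dim, hidden, batch_first=True, bidirectional=True)
        self.audio_to_text = AdditiveCrossModalAttention(2 * hidden)
        self.text_to_audio = AdditiveCrossModalAttention(2 * hidden)
        self.classifier = nn.Linear(4 * hidden, n_classes)

    def forward(self, audio, text):
        a_seq, _ = self.audio_lstm(audio)   # (batch, Ta, 2*hidden)
        t_seq, _ = self.text_lstm(text)     # (batch, Tt, 2*hidden)
        a_query, t_query = a_seq.mean(dim=1), t_seq.mean(dim=1)
        # Each modality attends over the other modality's sequence.
        text_attended = self.audio_to_text(a_query, t_seq)
        audio_attended = self.text_to_audio(t_query, a_seq)
        fused = torch.cat([text_attended, audio_attended], dim=-1)
        return self.classifier(fused)  # swap for a 1-unit head to regress PHQ scores


if __name__ == "__main__":
    model = ACMASketch()
    audio = torch.randn(4, 200, 80)   # e.g. frame-level acoustic features
    text = torch.randn(4, 50, 300)    # e.g. word embeddings
    print(model(audio, text).shape)   # torch.Size([4, 2])
```

The additive energy function lets the model learn, per time step, how strongly one modality's pooled summary should weight the other modality's sequence, which matches the abstract's description of learning cross-modal interaction weights on top of BiLSTM features.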

Keywords