Life (Oct 2024)
TCEDN: A Lightweight Time-Context Enhanced Depression Detection Network
Abstract
Automatic video-based recognition of depression is becoming increasingly important in clinical applications. However, traditional depression recognition models still face practical challenges, such as high computational cost, ineffective use of facial movement features, and spatial feature degradation caused by model stitching. To overcome these challenges, this work proposes a lightweight Time-Context Enhanced Depression Detection Network (TCEDN). We first use attention-weighted blocks to aggregate and enhance frame-level video features, which reduces the model’s computational workload. Next, we integrate the temporal and spatial changes of raw video features and facial movement features using self-learned weights, which improves the precision of depression detection. Finally, we construct a fusion network of a 3-Dimensional Convolutional Neural Network (3D-CNN) and a Convolutional Long Short-Term Memory Network (ConvLSTM), which minimizes spatial feature loss by avoiding feature flattening and predicts the depression score. Experiments on the AVEC2013 and AVEC2014 datasets show that our approach achieves results on par with state-of-the-art video-based depression detection methods, with significantly lower computational complexity than mainstream methods.
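As an illustration of the first step only, attention-weighted aggregation of frame-level features can be sketched as a softmax-weighted sum over the frames in a block. This is a minimal sketch under stated assumptions, not the TCEDN implementation: the function name `attention_aggregate`, the scoring vector `w`, and the feature shapes are hypothetical, and the abstract does not specify how the attention scores are actually computed.

```python
import numpy as np

def attention_aggregate(frames, w):
    """Collapse a block of T frame feature vectors into one vector.

    frames: array of shape (T, D), one feature vector per frame.
    w:      array of shape (D,), a hypothetical learned scoring vector
            (an assumption; the paper's scoring function may differ).
    Returns the pooled (D,) vector and the (T,) attention weights.
    """
    scores = frames @ w                    # one scalar score per frame
    scores = scores - scores.max()         # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over frames
    pooled = weights @ frames              # weighted sum of frame features
    return pooled, weights

# Example with random stand-in features: 8 frames, 16 dimensions each.
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
w = rng.normal(size=16)
pooled, weights = attention_aggregate(frames, w)
```

Replacing T frames with one pooled vector per block is what shrinks the input to the later stages. With an all-zero `w`, the weights become uniform and the result reduces to plain average pooling.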
Keywords