IEEE Access (Jan 2024)
Multimodal Stress Recognition Using a Multimodal Neglecting Mask Module
Abstract
The need for stress-recognition research is growing alongside the need to proactively manage stress, which greatly affects overall health. If stress levels can be measured through such stress-recognition studies, the user aversion caused by dedicated stress-measurement equipment can be minimized, and stress recognition can serve as an efficient auxiliary means of managing stress-related disease. Although many studies on stress recognition use physiological signals, the equipment required to acquire these signals incurs additional cost and is inconvenient when worn continuously. By contrast, stress recognition from facial images is a noncontact approach. However, these methods have a disadvantage: if a subject shows little change in facial expression, their stress state is difficult to recognize. In this study, we propose a stress-recognition method that uses both facial images and speech to overcome these problems; the speech signal compensates for cases in which facial images alone are insufficient. In the proposed method, the image and speech modality models are first optimized for stress recognition. The mid-layer feature maps of the two modality networks are then fused by a multimodal neglecting mask module (MNMM), which learns which parts of each feature map to neglect, thereby combining the two models efficiently and extracting features more relevant to stress recognition than those obtained by previous methods. The experimental results confirm that the proposed method achieves the highest performance among reported methods, 78.1246%, when classifying stress states into three classes: neutral, low stress, and high stress.
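As a concrete illustration of the fusion idea described above, the sketch below shows one way a learned neglecting mask could combine mid-layer features from a face network and a speech network. The abstract does not specify the MNMM's internal architecture, so the projection sizes, the sigmoid gating, and all names used here (e.g., NeglectingMaskFusion, face_proj) are illustrative assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of mask-based multimodal fusion (assumed design,
# not the paper's exact MNMM).
import torch
import torch.nn as nn


class NeglectingMaskFusion(nn.Module):
    """Fuses mid-layer feature maps from two modality networks by learning,
    per feature dimension, which parts of each modality to neglect."""

    def __init__(self, face_dim: int, speech_dim: int, fused_dim: int):
        super().__init__()
        # Project both modalities into a shared feature space.
        self.face_proj = nn.Linear(face_dim, fused_dim)
        self.speech_proj = nn.Linear(speech_dim, fused_dim)
        # Learned mask: a sigmoid output near 0 "neglects" a feature,
        # near 1 keeps it.
        self.mask = nn.Sequential(
            nn.Linear(2 * fused_dim, 2 * fused_dim),
            nn.Sigmoid(),
        )

    def forward(self, face_feat: torch.Tensor,
                speech_feat: torch.Tensor) -> torch.Tensor:
        f = self.face_proj(face_feat)      # (batch, fused_dim)
        s = self.speech_proj(speech_feat)  # (batch, fused_dim)
        joint = torch.cat([f, s], dim=-1)  # (batch, 2 * fused_dim)
        m = self.mask(joint)               # element-wise neglecting mask
        return joint * m                   # suppress irrelevant features


# Usage: fuse hypothetical 512-d face and 256-d speech features, then
# classify into the three stress states (neutral, low, high).
fusion = NeglectingMaskFusion(face_dim=512, speech_dim=256, fused_dim=128)
classifier = nn.Linear(2 * 128, 3)
face = torch.randn(8, 512)
speech = torch.randn(8, 256)
logits = classifier(fusion(face, speech))  # shape: (8, 3)
```

Under this assumed design, the mask is applied element-wise to the concatenated projections, so the network can down-weight uninformative regions of either modality, for example, a face stream with little expression change, while the other modality's features pass through.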
Keywords