Applied Sciences (Oct 2022)

MCF-Net: Fusion Network of Facial and Scene Features for Expression Recognition in the Wild

  • Hui Xu,
  • Jun Kong,
  • Xiangqin Kong,
  • Juan Li,
  • Jianzhong Wang

DOI: https://doi.org/10.3390/app122010251
Journal volume & issue: Vol. 12, no. 20, p. 10251

Abstract


Nowadays, the facial expression recognition (FER) task has transitioned from laboratory-controlled scenarios to in-the-wild conditions. However, recognizing facial expressions in the wild is challenging due to factors such as varying backgrounds, low-quality facial images, and the subjectivity of annotators. Consequently, deep neural networks have increasingly been leveraged to learn discriminative representations for FER. In this work, we propose the Multi-cues Fusion Net (MCF-Net), a novel deep learning model with a two-stream structure for FER. The model first uses a two-stream coding network to extract face and scene representations; an adaptive fusion module then fuses the two representations for final recognition. In the face coding stream, a Sparse Mask Attention Learning (SMAL) module adaptively generates a sparse face mask for each input image, and a Multi-scale Attention (MSA) module extracts fine-grained feature subsets that capture richer multi-scale interaction information. In the scene coding stream, a Relational Attention (RA) module constructs the hidden relationship between the face and the contextual features of non-facial regions by capturing their pairwise similarity. To verify the effectiveness and accuracy of our model, extensive experiments are conducted on two public large-scale static facial expression image datasets, CAER-S and NCAER-S. Compared with existing methods, the proposed MCF-Net achieves superior results on both in-the-wild FER benchmarks: 81.82% accuracy on CAER-S and 45.59% on NCAER-S.
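The two-stream design described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the encoders below are placeholder CNNs standing in for the paper's backbones, the SMAL and MSA modules are omitted, and the relational attention and adaptive fusion are generic dot-product-attention and gated-fusion approximations of the RA and fusion modules named above. All dimensions and the 7-class output (the CAER-S emotion categories) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveFusion(nn.Module):
    """Fuses face and scene features with per-sample learned weights.
    Sketch: a small gating layer produces two softmax-normalized weights."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)

    def forward(self, face_feat, scene_feat):
        w = F.softmax(self.gate(torch.cat([face_feat, scene_feat], dim=1)), dim=1)
        return w[:, 0:1] * face_feat + w[:, 1:2] * scene_feat


class RelationalAttention(nn.Module):
    """Relates the face feature to non-facial context regions via pairwise
    similarity (a dot-product attention sketch over spatial positions)."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Conv2d(dim, dim, kernel_size=1)
        self.value = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, face_feat, context_map):
        # context_map: (B, C, H, W) features from the non-facial scene regions
        B, C, H, W = context_map.shape
        q = self.query(face_feat).unsqueeze(1)                  # (B, 1, C)
        k = self.key(context_map).flatten(2).transpose(1, 2)    # (B, HW, C)
        v = self.value(context_map).flatten(2).transpose(1, 2)  # (B, HW, C)
        attn = F.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, 1, HW)
        return (attn @ v).squeeze(1)                            # (B, C)


class MCFNetSketch(nn.Module):
    """Two-stream skeleton: face encoder + scene encoder, fused adaptively.
    The paper's SMAL/MSA modules would slot into the face stream."""
    def __init__(self, dim=512, num_classes=7):
        super().__init__()
        self.face_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.scene_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 7, stride=4, padding=3), nn.ReLU())
        self.relational_attn = RelationalAttention(dim)
        self.fusion = AdaptiveFusion(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, face_img, scene_img):
        face_feat = self.face_encoder(face_img)        # (B, C) face vector
        context_map = self.scene_encoder(scene_img)    # (B, C, H, W) scene map
        scene_feat = self.relational_attn(face_feat, context_map)
        fused = self.fusion(face_feat, scene_feat)
        return self.classifier(fused)


# Usage sketch: a cropped face and the full scene image are passed in parallel.
model = MCFNetSketch()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 7])
```

The gated fusion lets the network lean on scene context when the face is low-quality or occluded, which matches the abstract's motivation for combining facial and scene cues in the wild.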

Keywords