Guided Spatial Transformers for Facial Expression Recognition

Cristina Luna-Jiménez; Jorge Cristóbal-Martín; Ricardo Kleinlein; Manuel Gil-Martín; José M. Moya; Fernando Fernández-Martínez

doi:10.3390/app11167217

Applied Sciences (Aug 2021)

Guided Spatial Transformers for Facial Expression Recognition

Cristina Luna-Jiménez,
Jorge Cristóbal-Martín,
Ricardo Kleinlein,
Manuel Gil-Martín,
José M. Moya,
Fernando Fernández-Martínez

Affiliations

Cristina Luna-Jiménez: Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
Jorge Cristóbal-Martín: Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
Ricardo Kleinlein: Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
Manuel Gil-Martín: Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
José M. Moya: Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain
Fernando Fernández-Martínez: Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid, Avda. Complutense 30, 28040 Madrid, Spain

DOI: https://doi.org/10.3390/app11167217
Journal volume & issue: Vol. 11, no. 16
p. 7217

Abstract

Read online

Spatial Transformer Networks are considered a powerful algorithm to learn the main areas of an image, but still, they could be more efficient by receiving images with embedded expert knowledge. This paper aims to improve the performance of conventional Spatial Transformers when applied to Facial Expression Recognition. Based on the Spatial Transformers’ capacity of spatial manipulation within networks, we propose different extensions to these models where effective attentional regions are captured employing facial landmarks or facial visual saliency maps. This specific attentional information is then hardcoded to guide the Spatial Transformers to learn the spatial transformations that best fit the proposed regions for better recognition results. For this study, we use two datasets: AffectNet and FER-2013. For AffectNet, we achieve a 0.35% point absolute improvement relative to the traditional Spatial Transformer, whereas for FER-2013, our solution gets an increase of 1.49% when models are fine-tuned with the Affectnet pre-trained weights.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords