The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (May 2023)

IMPACT OF VISUAL MODALITIES IN MULTIMODAL PERSONALITY AND AFFECTIVE COMPUTING

  • E. V. Ryumina
  • A. A. Karpov

DOI
https://doi.org/10.5194/isprs-archives-XLVIII-2-W3-2023-217-2023
Journal volume & issue
Vol. XLVIII-2-W3-2023
pp. 217–224

Abstract

Personality and affective computing techniques play a significant role in understanding human behavior and intentions. Such techniques have practical applications in recommendation systems, healthcare, education, and job applicant screening. In this paper, we propose a novel multimodal approach to personality trait assessment that leverages affective features of the human voice and face, as well as recent advances in deep learning. We present a new mid-level modality fusion strategy based on a cross-modal attention mechanism with summarizing functionals. In contrast to other state-of-the-art approaches, we not only analyze the visual scene but also specifically process the human's upper body (selfie) and the scene background. Our experiments show that the Extroversion personality trait is better estimated by fusing the visual scene, face, and audio (voice) modalities, while the Conscientiousness and Agreeableness traits are better assessed by fusing the face, selfie, and audio modalities. Furthermore, our results show that the selfie modality outperforms the visual scene modality by more than 1% in terms of the Concordance Correlation Coefficient. Additionally, our approach based on processing three modalities (selfie, face, and audio) is on par with other known state-of-the-art approaches that employ at least four modalities on the test set of the ChaLearn First Impressions V2 corpus.
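
The abstract names the fusion strategy but does not detail it, so the following is a minimal illustrative sketch of mid-level cross-modal attention fusion with summarizing functionals. It assumes PyTorch, assumes mean and standard-deviation pooling as the summarizing functionals, and uses hypothetical names and dimensions throughout; it is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Illustrative sketch (hypothetical): features from one modality attend
    over another modality, and the attended sequence is collapsed with
    summarizing functionals (here mean and std over time) into a fixed-size
    fusion vector for downstream personality trait regression."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feats: torch.Tensor, key_feats: torch.Tensor) -> torch.Tensor:
        # query_feats: (batch, T_q, dim), e.g. per-frame face features
        # key_feats:   (batch, T_k, dim), e.g. per-step audio features
        attended, _ = self.attn(query_feats, key_feats, key_feats)
        # Summarizing functionals: mean and standard deviation over time.
        mean = attended.mean(dim=1)
        std = attended.std(dim=1)
        return torch.cat([mean, std], dim=-1)  # (batch, 2 * dim)

# Example usage with made-up shapes: 8 clips, 30 face frames, 50 audio steps.
fusion = CrossModalAttentionFusion(dim=256)
face = torch.randn(8, 30, 256)
audio = torch.randn(8, 50, 256)
fused = fusion(face, audio)  # (8, 512)
```

Mean/std pooling is one common choice of summarizing functional; the paper itself may use a different set.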
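
The evaluation metric, the Concordance Correlation Coefficient (Lin, 1989), is standard and can be computed as below; this NumPy sketch is independent of the authors' implementation.

```python
import numpy as np

def concordance_correlation_coefficient(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).

    Ranges in [-1, 1]; penalizes both poor correlation and systematic
    bias between predictions and ground truth."""
    mean_true, mean_pred = y_true.mean(), y_pred.mean()
    var_true, var_pred = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_true) * (y_pred - mean_pred))
    return 2 * cov / (var_true + var_pred + (mean_true - mean_pred) ** 2)
```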