Sensors (Jun 2023)

Attention-Based Fusion of Ultrashort Voice Utterances and Depth Videos for Multimodal Person Identification

  • Abderrazzaq Moufidi,
  • David Rousseau,
  • Pejman Rasti

DOI
https://doi.org/10.3390/s23135890
Journal volume & issue
Vol. 23, no. 13
p. 5890

Abstract

Multimodal deep learning for biometrics faces significant challenges because it typically depends on long speech utterances and RGB images, which are impractical in many situations. This paper presents a novel solution that addresses these issues by leveraging ultrashort voice utterances and depth videos of the lip for person identification. The proposed method uses a combination of residual neural networks to encode the depth videos and a Time Delay Neural Network architecture to encode the voice signals. To fuse information from these different modalities, we integrate self-attention and design a noise-resistant model that handles diverse types of noise. In rigorous testing on a benchmark dataset, our approach outperforms existing methods, yielding an average improvement of 10%. The method is notably efficient in scenarios where extended utterances and RGB images are unfeasible or unattainable. Furthermore, its potential extends to multimodal applications beyond person identification.
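The fusion step the abstract describes — attending over per-modality embeddings before forming a joint representation — can be sketched in simplified form. This is a minimal, dependency-free illustration of scaled dot-product self-attention over two modality tokens; the embedding values, dimensions, and helper names below are hypothetical, and the paper's actual architecture (multi-head attention, learned projections, TDNN and ResNet encoders) is considerably more involved.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention_fuse(tokens):
    """Toy scaled dot-product self-attention over modality tokens.

    tokens: list of equal-length embedding vectors, one per modality
            (e.g. one voice embedding, one depth-lip embedding).
    Returns a single fused embedding (mean of the attended tokens).
    """
    d = len(tokens[0])
    attended = []
    for q in tokens:
        # attention scores of this query against all modality tokens
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        # weighted sum of value vectors (values == tokens here)
        attended.append([sum(w * v[i] for w, v in zip(weights, tokens))
                         for i in range(d)])
    # pool the attended tokens into one joint representation
    return [sum(t[i] for t in attended) / len(attended) for i in range(d)]

# Hypothetical 4-dimensional embeddings from the two encoders
voice_emb = [0.2, 0.9, -0.1, 0.4]   # stand-in for a TDNN speaker embedding
lip_emb   = [0.5, -0.3, 0.8, 0.1]   # stand-in for a ResNet depth-lip embedding
joint = self_attention_fuse([voice_emb, lip_emb])
```

In a real system the queries, keys, and values would be learned linear projections of much higher-dimensional encoder outputs, and the fused vector would feed a classification head over enrolled identities.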

Keywords