Conditional selection with CNN augmented transformer for multimodal affective analysis

Jianwen Wang; Shiping Wang; Shunxin Xiao; Renjie Lin; Mianxiong Dong; Wenzhong Guo

doi:10.1049/cit2.12320

CAAI Transactions on Intelligence Technology (Aug 2024)

Affiliations

Jianwen Wang: College of Computer and Data Science Fuzhou University Fuzhou China
Shiping Wang: College of Computer and Data Science Fuzhou University Fuzhou China
Shunxin Xiao: College of Computer and Data Science Fuzhou University Fuzhou China
Renjie Lin: College of Computer and Data Science Fuzhou University Fuzhou China
Mianxiong Dong: Department of Sciences and Informatics Muroran Institute of Technology Muroran Japan
Wenzhong Guo: College of Computer and Data Science Fuzhou University Fuzhou China

DOI: https://doi.org/10.1049/cit2.12320
Journal volume & issue: Vol. 9, no. 4
pp. 917 – 931

Abstract

The attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite these advances, two significant challenges remain in fusing language with its nonverbal context information. One is generating sparse attention coefficients associated with the acoustic and visual modalities, which helps locate critical emotional semantics. The other is fusing complementary cross‐modal representations to construct optimal salient feature combinations across multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross‐modal attention. As a result, the located nonverbal features are not only salient but also directly complementary to the sentiment words. Experimental results show that the authors’ method achieves state‐of‐the‐art performance on several multimodal affective analysis datasets.
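
The two ideas in the abstract (CNN layers applied to nonverbal sequences, and sentiment words conditioning cross‐modal attention) can be illustrated with a minimal sketch. This is not the authors' Conditional Transformer Fusion Network: the function names, the fixed smoothing kernel, the rule that only sentiment words act as queries, and the use of plain softmax instead of sparse attention are all assumptions made for illustration. Word and frame vectors are assumed to share one dimension already.

```python
import math

def conv1d_same(seq, kernel):
    """Slide an odd-length kernel over time with zero padding.

    The same kernel is applied to every feature channel (an assumption;
    a trained CNN would learn separate filters).
    """
    half = len(kernel) // 2
    dim = len(seq[0])
    out = []
    for t in range(len(seq)):
        row = [0.0] * dim
        for j, w in enumerate(kernel):
            s = t + j - half
            if 0 <= s < len(seq):
                for c in range(dim):
                    row[c] += w * seq[s][c]
        out.append(row)
    return out

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conditional_cross_attention(words, is_sentiment, nonverbal,
                                kernel=(0.25, 0.5, 0.25)):
    """Attend from sentiment words to CNN-filtered nonverbal frames.

    Hypothetical conditioning rule: only words flagged as sentiment
    words are used as attention queries.
    Returns one (weights, fused_vector) pair per sentiment word.
    """
    frames = conv1d_same(nonverbal, kernel)
    dim = len(frames[0])
    scale = math.sqrt(dim)
    results = []
    for word, flag in zip(words, is_sentiment):
        if not flag:
            continue  # non-sentiment words do not query the nonverbal stream
        # Scaled dot-product score between the word and each frame.
        scores = [sum(q * k for q, k in zip(word, f)) / scale for f in frames]
        weights = softmax(scores)
        # Weighted sum of frames: the nonverbal context for this word.
        fused = [sum(w * f[c] for w, f in zip(weights, frames))
                 for c in range(dim)]
        results.append((weights, fused))
    return results

# Toy input: two words (only the second is a sentiment word) and
# three acoustic or visual frames with a spike in the middle frame.
words = [[1.0, 0.0], [0.0, 1.0]]
is_sentiment = [False, True]
nonverbal = [[0.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
results = conditional_cross_attention(words, is_sentiment, nonverbal)
```

In this toy input the sentiment word places its largest attention weight on the middle frame, where the nonverbal signal peaks. A sparse alternative to softmax would be needed to reproduce the sparse attention coefficients the abstract mentions.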

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322
