DM2S2: Deep Multimodal Sequence Sets With Hierarchical Modality Attention

Shunsuke Kitada; Yuki Iwazaki; Riku Togashi; Hitoshi Iyatomi

doi:10.1109/ACCESS.2022.3221812

IEEE Access (Jan 2022)

Affiliations

Shunsuke Kitada: Department of Applied Informatics, Graduate School of Science and Engineering, Hosei University, Tokyo, Japan
Yuki Iwazaki: CyberAgent Inc., Tokyo, Japan
Riku Togashi: CyberAgent Inc., Tokyo, Japan
Hitoshi Iyatomi: Department of Applied Informatics, Graduate School of Science and Engineering, Hosei University, Tokyo, Japan

Journal volume & issue: Vol. 10, pp. 120023–120034

Abstract

There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM2S2). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to further enhance the importance of elements at modality-level granularity. Our concept exhibits performance that is comparable to or better than that of previous set-aware models. Furthermore, we demonstrate that visualizing the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.
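
The two attention levels named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the residual form `(1 + weight) * feature`, the dot-product scoring, the mean-based modality summary, and every name below (`hierarchical_attention`, `w_intra`, `w_inter`) are assumptions chosen for illustration. The BERT-based encoder is omitted; the inputs stand in for already-encoded element vectors.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def hierarchical_attention(modalities, w_intra, w_inter):
    """Pool a set of per-modality sequences into one d-dimensional vector.

    modalities: dict mapping a modality name to an (n_elements, d) array.
                A missing modality is simply absent from the dict.
    w_intra, w_inter: (d,) scoring vectors (hypothetical learned parameters).
    """
    names = sorted(modalities)
    intra_weights, reweighted = {}, {}

    # Intra-modality step (assumed form): weight elements within each modality,
    # then apply the weights residually so no element is zeroed out.
    for name in names:
        h = modalities[name]
        a = softmax(h @ w_intra)                   # (n_elements,), sums to 1
        intra_weights[name] = a
        reweighted[name] = h * (1.0 + a)[:, None]

    # Inter-modality step (assumed form): score one summary vector per modality
    # and rescale all of that modality's elements by (1 + modality weight).
    summaries = np.stack([reweighted[n].mean(axis=0) for n in names])
    b = softmax(summaries @ w_inter)               # (n_modalities,), sums to 1
    inter_weights = dict(zip(names, b))

    # Sum pooling over the whole set: the output stays d-dimensional no matter
    # how many modalities are present, unlike concatenation-based fusion.
    pooled = sum((1.0 + inter_weights[n]) * reweighted[n].sum(axis=0) for n in names)
    return pooled, intra_weights, inter_weights
```

Under these assumptions, dropping a key from `modalities` still yields a vector of the same size, which mirrors the abstract's motivation about missing modalities and growing concatenated features. The returned `intra_weights` and `inter_weights` are the kind of quantities one could visualize for interpretation.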

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
