Dual Variational Multi-modal Attention Network for Incomplete Social Event Classification

ZHOU Xu, QIAN Sheng-sheng, LI Zhang-ming, FANG Quan, XU Chang-sheng

doi:10.11896/jsjkx.220600022

Jisuanji kexue (Sep 2022)

Affiliations

ZHOU Xu, QIAN Sheng-sheng, LI Zhang-ming, FANG Quan, XU Chang-sheng: 1 Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450000, China; 2 National Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

DOI: https://doi.org/10.11896/jsjkx.220600022
Journal volume & issue: Vol. 49, no. 9
pp. 132 – 138

Abstract

The rapid development of the Internet and the continuous expansion of social media have produced a wealth of social event information, and social event classification has become increasingly challenging. Making full use of both image-level and text-level information is key to this task. However, most existing methods have two limitations: 1) most multi-modal methods assume that the samples of every modality are sufficient and complete, but in real applications this assumption does not always hold, and a modality of an event may be missing; 2) most methods simply concatenate the image features and text features of a social event to obtain the multi-modal features used for classification. To address these limitations, this paper proposes a dual variational multi-modal attention network (DVMAN) for social event classification. Within DVMAN, a novel dual variational autoencoder network generates common representations of social events and reconstructs the missing modality information in incomplete social event learning. Through distribution alignment and cross-reconstruction alignment, the image and text latent representations are doubly aligned to mitigate the gap between modalities, and for a missing modality a generative model synthesizes its latent representation. In addition, a multi-modal fusion module integrates the fine-grained information of the images and texts of social events, so that the information from the two modalities is complemented and enhanced. Extensive experiments on two publicly available event datasets show that DVMAN improves accuracy by more than 4% over existing advanced methods, demonstrating the superior performance of the proposed method for social event classification.
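The double alignment described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the network layers, the distance used for distribution alignment, or the reconstruction loss. The sketch assumes linear encoders and decoders with random weights, a squared 2-Wasserstein distance between diagonal Gaussians for distribution alignment, and an L1 cross-reconstruction loss. All names (`ToyVAE`, `distribution_alignment`, `cross_reconstruction`, `impute_missing_image_latent`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(in_dim, out_dim):
    """Hypothetical helper: a random linear map standing in for a trained layer."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim))


class ToyVAE:
    """Stand-in for one modality's VAE (assumed linear encoder and decoder)."""

    def __init__(self, feat_dim, latent_dim):
        self.w_mu = linear(feat_dim, latent_dim)
        self.w_logvar = linear(feat_dim, latent_dim)
        self.w_dec = linear(latent_dim, feat_dim)

    def encode(self, x):
        # Returns the mean and log-variance of the approximate posterior.
        return x @ self.w_mu, x @ self.w_logvar

    def sample(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps.
        eps = rng.standard_normal(mu.shape)
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        return z @ self.w_dec


def distribution_alignment(mu_a, logvar_a, mu_b, logvar_b):
    """Squared 2-Wasserstein distance between diagonal Gaussians (assumed choice)."""
    sigma_a, sigma_b = np.exp(0.5 * logvar_a), np.exp(0.5 * logvar_b)
    per_sample = ((mu_a - mu_b) ** 2).sum(axis=1) + ((sigma_a - sigma_b) ** 2).sum(axis=1)
    return per_sample.mean()


def cross_reconstruction(img_vae, txt_vae, x_img, x_txt):
    """Decode each modality from the other modality's latent code (assumed L1 loss)."""
    z_img = img_vae.sample(*img_vae.encode(x_img))
    z_txt = txt_vae.sample(*txt_vae.encode(x_txt))
    loss_img = np.abs(x_img - img_vae.decode(z_txt)).mean()
    loss_txt = np.abs(x_txt - txt_vae.decode(z_img)).mean()
    return loss_img + loss_txt


def impute_missing_image_latent(txt_vae, x_txt):
    """For an event with no image, use the text latent as the image latent stand-in."""
    return txt_vae.sample(*txt_vae.encode(x_txt))


# Toy batch: 5 events, 8-dim image features, 6-dim text features, 4-dim latent space.
img_vae, txt_vae = ToyVAE(8, 4), ToyVAE(6, 4)
x_img, x_txt = rng.normal(size=(5, 8)), rng.normal(size=(5, 6))

mu_i, lv_i = img_vae.encode(x_img)
mu_t, lv_t = txt_vae.encode(x_txt)
da_loss = distribution_alignment(mu_i, lv_i, mu_t, lv_t)
cr_loss = cross_reconstruction(img_vae, txt_vae, x_img, x_txt)
z_fill = impute_missing_image_latent(txt_vae, x_txt)
```

In a trained model the two losses would be minimized jointly with each VAE's own objective, so that a latent code from one modality becomes a plausible substitute for the other. Here the weights are random and only the shape of the computation is shown.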

Keywords: multi-modal, social event classification, social media, incomplete data learning

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
