Class-dependent and cross-modal memory network considering sentimental features for video-based captioning

Haitao Xiong; Yuchen Zhou; Jiaming Liu; Yuanyuan Cai

doi:10.3389/fpsyg.2023.1124369

Frontiers in Psychology (Feb 2023)

Affiliations

Haitao Xiong: School of International Economics and Management, Beijing Technology and Business University, Beijing, China
Yuchen Zhou: School of International Economics and Management, Beijing Technology and Business University, Beijing, China
Jiaming Liu: School of International Economics and Management, Beijing Technology and Business University, Beijing, China
Yuanyuan Cai: National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing, China
Yuanyuan Cai: School of E-Business and Logistics, Beijing Technology and Business University, Beijing, China

Journal volume & issue: Vol. 14

Abstract

The video-based commonsense captioning task adds multiple commonsense descriptions to video captions so that video content can be understood better. This paper focuses on the importance of cross-modal mapping in that task. We propose a combined framework, the Class-dependent and Cross-modal Memory Network considering SENtimental features (CCMN-SEN) for video-based captioning, to enhance commonsense caption generation. First, we develop a class-dependent memory that records the alignment between video features and text; it allows cross-modal interactions and generation only on cross-modal matrices that share the same labels. Second, to capture the sentiments conveyed in the videos and generate accurate captions, we add sentiment features to facilitate commonsense caption generation. Experimental results demonstrate that the proposed CCMN-SEN significantly outperforms state-of-the-art methods. These results have practical significance for understanding video content better.
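
The abstract does not give the formulation of the class-dependent memory, so the following is only an illustrative sketch of the general idea of restricting cross-modal interaction to memory entries that share a label. It assumes a memory stored as (label, vector) pairs and a softmax-weighted read over dot-product scores; the function name `class_dependent_read`, the data layout, and the attention form are all hypothetical and are not taken from the paper.

```python
import math

def class_dependent_read(query, label, memory):
    """Read from memory using only the slots whose label matches `label`.

    query  : list of floats (e.g. a video feature vector; assumed layout)
    label  : class label of the query
    memory : list of (slot_label, slot_vector) pairs (hypothetical layout)
    """
    # Keep only the slots that share the query's label.
    slots = [vec for slot_label, vec in memory if slot_label == label]
    if not slots:
        # No slot of this class: return a zero vector.
        return [0.0] * len(query)

    # Dot-product similarity between the query and each same-class slot.
    scores = [sum(q * s for q, s in zip(query, vec)) for vec in slots]

    # Numerically stable softmax over the same-class scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the same-class slot vectors.
    return [sum(w * vec[i] for w, vec in zip(weights, slots))
            for i in range(len(query))]


memory = [
    ("cooking", [1.0, 0.0]),
    ("cooking", [0.0, 1.0]),
    ("sports", [5.0, 5.0]),
]
result = class_dependent_read([1.0, 1.0], "cooking", memory)
print(result)
```

In this toy call the "sports" slot has the largest dot product with the query but never contributes, because the read is confined to the "cooking" slots. That is the only property the sketch is meant to show.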

Published in Frontiers in Psychology

ISSN: 1664-1078 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Philosophy. Psychology. Religion: Psychology
Website: https://www.frontiersin.org/journals/psychology
