Information (Mar 2025)
Attention Mechanism-Based Cognition-Level Scene Understanding
Abstract
Given a question–image input, a visual commonsense reasoning (VCR) model predicts an answer together with a corresponding rationale, which requires inference abilities grounded in real-world knowledge. The VCR task, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge, is a cognition-level scene-understanding challenge. It has attracted researchers' interest due to its wide range of applications, including visual question answering, automated vehicle systems, and clinical decision support. Previous approaches to the VCR task have generally relied on pre-training or on exploiting memory with models that encode long-term dependency relationships. However, these approaches suffer from limited generalizability and from information loss over long sequences. In this work, we propose a parallel attention-based cognitive VCR network, termed PAVCR, which fuses visual–textual information efficiently and encodes semantic information in parallel, enabling the model to capture rich information for cognition-level inference. Extensive experiments show that the proposed model yields significant improvements over existing methods on the benchmark VCR dataset. Moreover, it provides an intuitive interpretation of visual commonsense reasoning.
Keywords