CE-BART: Cause-and-Effect BART for Visual Commonsense Generation

Junyeong Kim; Ji Woo Hong; Sunjae Yoon; Chang D. Yoo

doi:10.3390/s22239399

Sensors (Dec 2022)

Affiliations

Junyeong Kim: Department of AI, Chung-Ang University, Seoul 06974, Republic of Korea
Ji Woo Hong: School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
Sunjae Yoon: School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
Chang D. Yoo: School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea

Journal volume & issue: Vol. 22, no. 23, p. 9399

Abstract

“A picture is worth a thousand words.” Given an image, humans can deduce various cause-and-effect captions of past, current, and future events beyond the image. The task of visual commonsense generation aims to generate three cause-and-effect captions for a given image: (1) what needed to happen before, (2) what the current intent is, and (3) what will happen after. However, this task remains challenging for machines, owing to two limitations of existing approaches: they (1) directly use conventional vision–language transformers to learn relationships between input modalities and (2) ignore relations among the target cause-and-effect captions, considering each caption independently. Herein, we propose Cause-and-Effect BART (CE-BART), which is based on (1) a structured graph reasoner that captures intra- and inter-modality relationships among visual and textual representations and (2) a cause-and-effect generator that generates cause-and-effect captions by considering the causal relations among inferences. We demonstrate the validity of CE-BART on the VisualCOMET and AVSD benchmarks. CE-BART achieved state-of-the-art (SOTA) performance on both benchmarks, while an extensive ablation study and qualitative analysis demonstrated the performance gain and improved interpretability.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
