Coreference resolution helps visual dialogs to focus

Tianwei Yue; Wenping Wang; Chen Liang; Dachi Chen; Congrui Hetang; Xuewei Wang

High-Confidence Computing (Jun 2024)

Affiliations

Tianwei Yue: Carnegie Mellon University, Pittsburgh 15213, USA
Wenping Wang (corresponding author): Carnegie Mellon University, Pittsburgh 15213, USA
Chen Liang: Carnegie Mellon University, Pittsburgh 15213, USA
Dachi Chen: Carnegie Mellon University, Pittsburgh 15213, USA
Congrui Hetang: Carnegie Mellon University, Pittsburgh 15213, USA
Xuewei Wang: Carnegie Mellon University, Pittsburgh 15213, USA

Journal volume & issue: Vol. 4, no. 2, p. 100184

Abstract

Visual Dialog is a multi-modal task involving both computer vision and dialog systems. The goal is to answer multiple questions in a conversational style, given an image as context. Neural networks with attention modules are widely used for this task because they are effective at reasoning about the relevance between text and images. In this work, we study how to further improve the quality of such reasoning, which remains an open challenge. Our baseline is the Recursive Visual Attention (RVA) model, which refines vision-text attention by iteratively visiting the dialog history. Building on it, we propose to improve the attention mechanism with contrastive learning. We train a Matching-Aware Attention Kernel (MAAK) by aligning the deep feature embeddings of an image and its caption, so that it provides better attention scores. Experiments show consistent improvements from MAAK. In addition, we study the effect of using Multimodal Compact Bilinear (MCB) pooling as a three-way feature fusion of the visual, textual, and dialog history embeddings. We analyze the performance of both methods in the discussion section and propose further ideas to address current limitations.
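
For readers unfamiliar with the three-way fusion mentioned above, the following is a minimal NumPy sketch of compact bilinear pooling in its usual formulation: each input is projected with a count sketch, and the sketches are combined by circular convolution, computed as an element-wise product in the FFT domain. It is an illustration under stated assumptions, not the authors' implementation. The function names (`make_sketch`, `count_sketch`, `mcb_pool`), the toy dimensions, and the random inputs standing in for the visual, textual, and dialog history embeddings are all hypothetical.

```python
import numpy as np


def make_sketch(in_dim, out_dim, rng):
    """Draw the fixed random hash indices and signs of one count sketch."""
    h = rng.integers(0, out_dim, size=in_dim)    # target bucket per input index
    s = rng.choice([-1.0, 1.0], size=in_dim)     # random sign per input index
    return h, s


def count_sketch(x, h, s, out_dim):
    """Project x to out_dim dimensions by signed accumulation into buckets."""
    y = np.zeros(out_dim)
    np.add.at(y, h, s * x)                       # unbuffered add handles collisions
    return y


def mcb_pool(vectors, sketches, out_dim):
    """Fuse several vectors: circular convolution of their count sketches."""
    prod = np.ones(out_dim // 2 + 1, dtype=complex)
    for x, (h, s) in zip(vectors, sketches):
        prod *= np.fft.rfft(count_sketch(x, h, s, out_dim))
    return np.fft.irfft(prod, n=out_dim)


# Toy sizes, chosen only for illustration (assumption, not from the paper).
rng = np.random.default_rng(0)
dims, out_dim = (6, 5, 4), 16
sketches = [make_sketch(d, out_dim, rng) for d in dims]
visual, question, history = (rng.standard_normal(d) for d in dims)

fused = mcb_pool([visual, question, history], sketches, out_dim)
```

The appeal of this design is that the fused vector approximates the full three-way outer product of the inputs without ever materializing it: the cost is dominated by three FFTs of length `out_dim` rather than by a tensor whose size is the product of the three input dimensions.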

Published in High-Confidence Computing

ISSN: 2667-2952 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/high-confidence-computing
