Image-Collection Summarization Using Scene-Graph Generation With External Knowledge

Itthisak Phueaksri; Marc A. Kastner; Yasutomo Kawanishi; Takahiro Komamizu; Ichiro Ide

doi:10.1109/ACCESS.2024.3360113

IEEE Access (Jan 2024)

Affiliations

Itthisak Phueaksri: Graduate School of Informatics, Nagoya University, Aichi, Nagoya, Japan
Marc A. Kastner: Graduate School of Informatics, Kyoto University, Kyoto, Japan
Yasutomo Kawanishi: Graduate School of Informatics, Nagoya University, Aichi, Nagoya, Japan
Takahiro Komamizu: Graduate School of Informatics, Nagoya University, Aichi, Nagoya, Japan
Ichiro Ide: Graduate School of Informatics, Nagoya University, Aichi, Nagoya, Japan

DOI: https://doi.org/10.1109/ACCESS.2024.3360113
Journal volume & issue: Vol. 12
pp. 17499 – 17512

Abstract

Summarization tasks aim to condense multiple pieces of information into a short description or representative information. A text summarization task summarizes textual information into a short description, whereas an image-collection summarization task summarizes an image collection into images or a textual representation; the challenge in the latter is to understand the relationships between images. In recent years, scene-graph generation has proven effective for describing the visual context of a single image, and incorporating external knowledge into the scene-graph generation model has also shown promise for generating scene graphs of unseen single images. Although external knowledge has been used in related work, it remains challenging to use this information efficiently for relationship estimation during summarization. Following this trend, in this paper, we propose a novel scene-graph-based image-collection summarization model that generates a summarized scene graph of an image collection. The key idea of the proposed method is to enhance the relation predictor so that it captures relationships between images in an image collection, incorporating knowledge graphs as external knowledge when training the model. With this approach, we build an end-to-end framework that can generate a summarized scene graph of an image collection. To evaluate the proposed method, we also build an extended annotated MS-COCO dataset for this task and introduce an evaluation process that estimates the similarity between a summarized scene graph and ground-truth scene graphs. Traditional evaluation calculates precision and recall scores, which involve true-positive predictions, without balancing the two. In contrast, the proposed evaluation process calculates the F-score of the similarity between a summarized scene graph and ground-truth scene graphs, which balances false positives and false negatives. Experimental results show that using external knowledge to enhance the relation predictor achieves better results than existing methods.
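
As a rough illustration of why an F-score balances false positives and false negatives, the following minimal sketch scores a summarized scene graph against ground-truth scene graphs. It is not the evaluation process of the paper: it assumes scene graphs are sets of (subject, predicate, object) triplets and uses exact triplet matching in place of the similarity measure the abstract refers to. The names `triplet_f_score` and `mean_f_score` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Assumption: a scene graph is represented as a set of
# (subject, predicate, object) triplets.
Triplet = Tuple[str, str, str]


def triplet_f_score(predicted: Iterable[Triplet],
                    ground_truth: Iterable[Triplet],
                    beta: float = 1.0) -> float:
    """F-beta score between two triplet sets, using exact matching.

    Exact matching is a simplification made for this sketch; the paper
    describes a similarity-based comparison instead.
    """
    pred: Set[Triplet] = set(predicted)
    gold: Set[Triplet] = set(ground_truth)
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # lowered by false positives
    recall = true_positives / len(gold)     # lowered by false negatives
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f_score(summary: Iterable[Triplet],
                 ground_truths: List[Iterable[Triplet]]) -> float:
    """Average the F-score of one summary over several ground-truth graphs."""
    summary_set = set(summary)
    scores = [triplet_f_score(summary_set, g) for g in ground_truths]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    summary = {("person", "riding", "horse"),
               ("horse", "on", "grass"),
               ("person", "wearing", "hat")}
    gold = {("person", "riding", "horse"),
            ("horse", "on", "grass")}
    print(triplet_f_score(summary, gold))
```

In this toy case recall is perfect, but the extra predicted triplet lowers precision, so the F-score falls below 1. Reporting recall alone would hide that false positive.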

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
