Complexity (Jan 2020)

Image-Text Joint Learning for Social Images with Spatial Relation Model

  • Jiangfan Feng,
  • Xuejun Fu,
  • Yao Zhou,
  • Yuling Zhu,
  • Xiaobo Luo

DOI
https://doi.org/10.1155/2020/1543947
Journal volume & issue
Vol. 2020

Abstract


Rapid developments in sensor technology and mobile devices have brought a flood of social images, and large-scale social images have attracted increasing attention from researchers. Existing approaches generally rely on recognizing object instances individually using geo-tags, visual patterns, etc. However, a social image represents a web of interconnected relations; these relations between entities carry semantic meaning and help a viewer differentiate between object instances. This article adopts the perspective of spatial relations to explore the joint learning of social images. Specifically, the model consists of three parts: (a) a module for deep semantic understanding of images based on a residual network (ResNet); (b) a deep semantic analysis module for text that goes beyond traditional bag-of-words methods; (c) a joint reasoning module in which text weights are obtained from image features via self-attention, combined with a novel tree-based clustering algorithm. Experimental results on the Flickr30k and Microsoft COCO datasets demonstrate the effectiveness of the approach. Moreover, our method takes spatial relations into account during matching.
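To illustrate the kind of joint reasoning the abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes a global ResNet-style image feature and word embeddings for the caption, uses the image feature as an attention query over the text tokens to produce image-conditioned text weights, and scores the image-text pair by cosine similarity. The class name ImageTextAttention, the dimensions, and the scoring choice are all illustrative assumptions; the paper's spatial-relation modeling and tree-based clustering are not reproduced here.

```python
# Hypothetical sketch (not the authors' code): image features attend over text
# token embeddings, yielding image-conditioned text weights, and the two
# modalities are matched with a cosine similarity score.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageTextAttention(nn.Module):
    """Image-guided attention over text tokens (illustrative only)."""

    def __init__(self, img_dim=2048, txt_dim=300, joint_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)   # project ResNet-style pooled feature
        self.txt_proj = nn.Linear(txt_dim, joint_dim)   # project word embeddings
        self.scale = joint_dim ** 0.5

    def forward(self, img_feat, txt_feats):
        # img_feat:  (B, img_dim) global image feature, e.g. ResNet pooled output
        # txt_feats: (B, T, txt_dim) word embeddings of the caption
        q = self.img_proj(img_feat).unsqueeze(1)         # (B, 1, D) image as query
        k = v = self.txt_proj(txt_feats)                 # (B, T, D) text as keys/values
        attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)  # (B, 1, T) text weights
        txt_summary = (attn @ v).squeeze(1)              # (B, D) image-conditioned text vector
        img_vec = q.squeeze(1)                           # (B, D)
        score = F.cosine_similarity(img_vec, txt_summary, dim=-1)  # matching score per pair
        return score, attn.squeeze(1)


# Toy usage with random tensors standing in for ResNet and word-embedding outputs.
model = ImageTextAttention()
img = torch.randn(4, 2048)
txt = torch.randn(4, 12, 300)
score, weights = model(img, txt)
print(score.shape, weights.shape)  # torch.Size([4]) torch.Size([4, 12])
```

The per-token weights returned alongside the score correspond to the "text weights obtained using image features on self-attention" mentioned in the abstract; in a full system these would feed downstream matching or clustering rather than being used directly.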