EDET: Entity Descriptor Encoder of Transformer for Multi-Modal Knowledge Graph in Scene Parsing

Sai Ma; Weibing Wan; Zedong Yu; Yuming Zhao

doi:10.3390/app13127115

Applied Sciences (Jun 2023)

Affiliations

Sai Ma: Department of Computer, Shanghai University of Engineering Science, Shanghai 201620, China
Weibing Wan: Department of Computer, Shanghai University of Engineering Science, Shanghai 201620, China
Zedong Yu: Department of Computer, Shanghai University of Engineering Science, Shanghai 201620, China
Yuming Zhao: Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China

DOI: https://doi.org/10.3390/app13127115
Journal volume & issue: Vol. 13, no. 12, p. 7115

Abstract

In scene parsing, a model must process complex multi-modal data, such as images and their context in real scenes, and discover the implicit connections among the objects present in the scene. As a storage structure that holds entity information and the relationships between entities, a knowledge graph can express well both the objects in a scene and the semantic relationships between them. In this paper, we propose a new multi-phase process for scene parsing tasks: first, a knowledge graph is used to align the multi-modal information, and then a graph-based model generates the results. We also designed a feature-engineering validation experiment for a deep-learning model to preliminarily verify the effectiveness of this method. On this basis, we propose a knowledge representation method named Entity Descriptor Encoder of Transformer (EDET), which uses both the entity itself and its internal attributes for knowledge representation. This method can be embedded into the transformer structure to solve multi-modal scene parsing tasks. EDET can aggregate the multi-modal attributes of entities, and the results on the scene graph generation and image captioning tasks show that EDET performs strongly in multi-modal settings. Finally, the proposed method was applied to an industrial scene, which confirmed its viability.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
