IET Image Processing (Oct 2023)
Multi‐modal object detection via transformer network
Abstract
Because single‐modal data usually contain limited information, considerable effort has been devoted to exploiting, in various ways, the complementary information contained in multi‐modal data. This paper therefore proposes an object detection method that can fully utilize multi‐modal data. First, the method introduces the transformer mechanism to fuse the intra‐modal and inter‐modal features of the different modalities. The aim is to exploit the complementarity of data between modalities, which helps to improve the performance of multi‐modal object detection. Second, a contrastive loss is applied, which enables the method to make effective use of label information. Extensive experiments on multiple object detection datasets demonstrate the effectiveness of the proposed method.
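The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the architecture or the exact loss, so the sketch assumes single-head scaled dot-product attention without learned projections for the fusion step and a standard supervised contrastive loss for the label-aware term. The function names (`fuse_modalities`, `supervised_contrastive_loss`), the additive fusion, and the temperature value are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    # Single-head scaled dot-product attention (no learned projections).
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values


def fuse_modalities(feat_a, feat_b):
    # Assumed fusion scheme: self-attention within each modality
    # (intra-modal), then cross-attention between modalities (inter-modal).
    intra_a = attention(feat_a, feat_a, feat_a)
    intra_b = attention(feat_b, feat_b, feat_b)
    inter_a = attention(intra_a, intra_b, intra_b)  # tokens of A attend to B
    inter_b = attention(intra_b, intra_a, intra_a)  # tokens of B attend to A
    # Additive residual fusion, then concatenation along the token axis.
    return np.concatenate([intra_a + inter_a, intra_b + inter_b], axis=0)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    # Assumed loss: samples sharing a label are pulled together,
    # all other samples are pushed apart.
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)

    # Log-softmax of each anchor's similarities over all other samples.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positives: same label as the anchor, excluding the anchor itself.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

In this sketch the fused tokens would feed a detection head, and the contrastive term would be added to the detection loss with a weighting factor; both of those steps are outside what the abstract states and are left out here.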
Keywords