Cross‐scale feature fusion connection for a YOLO detector

Zhongling Ruan; Hao Wang; Jianzhong Cao; Hongbo Zhang

doi:10.1049/cvi2.12069

IET Computer Vision (Mar 2022)

Affiliations

Zhongling Ruan: Xi'an Institute of Optics and Precision Mechanics of CAS, Xi'an, China
Hao Wang: Xi'an Institute of Optics and Precision Mechanics of CAS, Xi'an, China
Jianzhong Cao: Xi'an Institute of Optics and Precision Mechanics of CAS, Xi'an, China
Hongbo Zhang: China Astronaut Research and Training Center, Beijing, China

DOI: https://doi.org/10.1049/cvi2.12069
Journal volume & issue: Vol. 16, no. 2, pp. 99–110

Abstract

Multi‐scale feature fusion is often used to address scale variation in object detection. However, most proposed network architectures only combine the features of two adjacent levels sequentially, so the first fusion nodes in both the top‐down and bottom‐up pathways are necessarily blank nodes: they have a single input and perform no feature fusion. In this work, a cross‐scale feature fusion connection (CFFC) is proposed, which aims to enhance the entire feature hierarchy by propagating the features of each level more efficiently. The proposed method reuses and aggregates the features of all other scales at the blank nodes in both the top‐down and bottom‐up pathways. Furthermore, the authors remove the 1 × 1 convolutional layer and replace the shortcut with concatenation before fusing multiple features. These concatenated feature maps are then reweighted by a channel attention block at the fusion nodes. This modification allows the network to learn the relative importance of each level in the concatenated feature maps along the channel dimension. It is also observed that the proposed method alleviates the inconsistency in feature pyramids with fewer parameters. The performance of a YOLO object detector equipped with the proposed method is evaluated on COCO test‐dev 2017. The results show that the proposed method outperforms other architectures presented in the literature.
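The fusion step described in the abstract (concatenate features from several scales, then let channel attention weight them) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the attention block or the resampling method, so a squeeze-and-excitation style gate and nearest-neighbour resizing are assumed here, and all names (`resize_nearest`, `channel_attention`, `fuse`, the `p3`/`p4`/`p5` maps and the random weights) are hypothetical.

```python
import numpy as np


def resize_nearest(x, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size)."""
    _, h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[:, rows][:, :, cols]


def channel_attention(x, w1, w2):
    """Assumed SE-style gate: global average pool, two-layer MLP, sigmoid."""
    squeezed = x.mean(axis=(1, 2))               # one value per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # per-channel weight in (0, 1)
    return x * gate[:, None, None], gate


def fuse(features, target_size, w1, w2):
    """Resize every level to one resolution, concatenate, then reweight."""
    resized = [resize_nearest(f, target_size) for f in features]
    stacked = np.concatenate(resized, axis=0)    # concatenation, no 1 x 1 conv
    return channel_attention(stacked, w1, w2)


# Hypothetical three-level pyramid (channels, height, width).
rng = np.random.default_rng(0)
p3 = rng.standard_normal((8, 32, 32))
p4 = rng.standard_normal((16, 16, 16))
p5 = rng.standard_normal((32, 8, 8))

channels = 8 + 16 + 32
reduction = 4  # assumed bottleneck ratio
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1

# Fuse all levels at the coarsest resolution, standing in for a "blank node".
fused, gate = fuse([p3, p4, p5], 8, w1, w2)
print(fused.shape, gate.shape)
```

Because the levels are concatenated rather than summed, each channel keeps its origin, which is what lets a per-channel gate express how much each level contributes. In a trained network the weights `w1` and `w2` would be learned rather than sampled at random.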

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
