MSFA: Multi‐stage feature aggregation network for multi‐label image recognition

Jiale Chen; Feng Xu; Tao Zeng; Xin Li; Shangjing Chen; Jie Yu

doi:10.1049/ipr2.13068

IET Image Processing (May 2024)

Affiliations

Jiale Chen: College of Computer Science and Software Engineering, Hohai University, Nanjing, People's Republic of China
Feng Xu: College of Computer Science and Software Engineering, Hohai University, Nanjing, People's Republic of China
Tao Zeng: College of Computer Science and Software Engineering, Hohai University, Nanjing, People's Republic of China
Xin Li: College of Computer Science and Software Engineering, Hohai University, Nanjing, People's Republic of China
Shangjing Chen: College of Computer Science and Software Engineering, Hohai University, Nanjing, People's Republic of China
Jie Yu: College of Computer Science and Software Engineering, Hohai University, Nanjing, People's Republic of China

DOI: https://doi.org/10.1049/ipr2.13068
Journal volume & issue: Vol. 18, no. 7
pp. 1862 – 1877

Abstract

Multi‐label image recognition (MLR) is a significant branch of image classification that aims to assign multiple categorical labels to each input. Previous research has focused on enhancing the learning of category‐related regional features. However, the potential impact on MLR of multi‐scale distributions in intra‐ and inter‐category targets tends to be neglected. In addition, semantic consistency for categories is typically considered only on single‐scale features, resulting in suboptimal feature extraction. To address these limitations, a Multi‐stage Feature Aggregation (MSFA) network is proposed. In MSFA, a novel local feature extraction method progressively extracts category‐related high‐resolution local features in both the spatial and channel dimensions. Local and global features are then fused without additional up‐ or down‐sampling, enriching the scale diversity of the features while incorporating refined class‐specific information. Furthermore, a hierarchical prediction scheme for MLR is proposed, which generates classification confidences at different scales under hierarchical loss supervision. The final output of the network thus comes from the joint prediction of the classifiers on multi‐scale features, ensuring a stronger feature extraction capability. Extensive experiments on the VOC and MS‐COCO datasets verify the superiority of MSFA over existing mainstream methods.
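The hierarchical prediction idea in the abstract can be illustrated with a small sketch. The abstract does not say how the per-scale confidences are combined or how the hierarchical loss is weighted, so the choices below (averaging sigmoid confidences across scales, summing an unweighted binary cross-entropy per scale) are assumptions made only for illustration, as are all function names and the toy logits. This is not the authors' implementation.

```python
import math

def sigmoid(x):
    """Map a raw classifier score (logit) to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bce(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over the labels of one image."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)

def hierarchical_predict(logits_per_scale):
    """Return per-scale confidences and a joint prediction.

    Assumption: the joint prediction is the plain average of the
    per-scale confidences; the abstract does not specify the rule.
    """
    per_scale = [[sigmoid(z) for z in logits] for logits in logits_per_scale]
    n_scales = len(per_scale)
    joint = [sum(column) / n_scales for column in zip(*per_scale)]
    return per_scale, joint

def hierarchical_loss(per_scale, targets):
    """Assumption: supervise every scale with BCE and sum the terms."""
    return sum(bce(probs, targets) for probs in per_scale)

# Hypothetical logits from three classifiers (one per feature scale)
# for a single image and three candidate labels.
logits_per_scale = [
    [2.0, -1.0, 0.5],
    [1.5, -2.0, -0.5],
    [3.0, -0.5, 1.0],
]
targets = [1, 0, 1]

per_scale, joint = hierarchical_predict(logits_per_scale)
loss = hierarchical_loss(per_scale, targets)
predicted = [int(p >= 0.5) for p in joint]
```

In this toy case the second-scale classifier alone would reject the third label, while the averaged prediction accepts it, which is the kind of effect a joint multi-scale prediction is meant to provide.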

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
