CAT: Learning to collaborate channel and spatial attention from multi‐information fusion

Zizhang Wu; Man Wang; Weiwei Sun; Yuchen Li; Tianhao Xu; Fan Wang; Keke Huang

doi:10.1049/cvi2.12166

IET Computer Vision (Apr 2023)

Affiliations

Zizhang Wu: Zongmu Technology Shanghai China
Man Wang: Zongmu Technology Shanghai China
Weiwei Sun: Zongmu Technology Shanghai China
Yuchen Li: Zongmu Technology Shanghai China
Tianhao Xu: Zongmu Technology Shanghai China
Fan Wang: Zongmu Technology Shanghai China
Keke Huang: Central South University Changsha China

DOI: https://doi.org/10.1049/cvi2.12166
Journal volume & issue: Vol. 17, no. 3, pp. 309–318

Abstract

Channel and spatial attention mechanisms have been shown to noticeably improve the performance of deep convolutional neural networks. Most existing methods use only one of the two, or run them in parallel or in series, neglecting the collaboration between the two attentions. To better establish the feature interaction between the two types of attention, a plug‐and‐play attention module is proposed, termed ‘CAT’, which activates the Collaboration between spatial and channel Attentions based on learned Traits. Specifically, traits are represented as trainable coefficients (i.e. colla‐factors) that adaptively combine the contributions of different attention modules to better fit different image hierarchies and tasks. Moreover, global entropy pooling is proposed alongside the global average pooling and global maximum pooling (GMP) operators; by measuring the information disorder of feature maps, it is an effective component for suppressing noise signals. A three‐way pooling operation is introduced into the attention modules, and the adaptive mechanism is applied to fuse their outcomes. Extensive experiments on MS COCO, Pascal‐VOC, CIFAR‐100, and ImageNet show that CAT outperforms existing state‐of‐the‐art attention mechanisms in object detection, instance segmentation, and image classification. The model and code will be released soon.
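
The three‐way pooling and adaptive fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the entropy definition (Shannon entropy of softmax‐normalised spatial activations) and the softmax weighting of the trainable coefficients are assumptions, and every function name below is hypothetical.

```python
import math


def _flatten(channel):
    """Flatten one channel (a 2-D list of activations) into a 1-D list."""
    return [v for row in channel for v in row]


def global_average_pool(channel):
    """Mean of all spatial activations in the channel."""
    values = _flatten(channel)
    return sum(values) / len(values)


def global_max_pool(channel):
    """Largest spatial activation in the channel."""
    return max(_flatten(channel))


def global_entropy_pool(channel):
    """Shannon entropy of the channel's spatial activations.

    ASSUMPTION: activations are turned into a probability distribution
    with a softmax over spatial positions; the paper may normalise
    differently.
    """
    values = _flatten(channel)
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def softmax(weights):
    """Normalise raw coefficients so they are positive and sum to 1."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_pooled(channel, colla_factors):
    """Combine the three pooled descriptors with trainable coefficients.

    ASSUMPTION: `colla_factors` holds three raw scalars that would be
    learned during training; they are softmax-normalised here.
    """
    w_avg, w_max, w_ent = softmax(colla_factors)
    return (w_avg * global_average_pool(channel)
            + w_max * global_max_pool(channel)
            + w_ent * global_entropy_pool(channel))


uniform = [[1.0, 1.0], [1.0, 1.0]]   # maximally disordered response
peaked = [[10.0, 0.0], [0.0, 0.0]]   # one dominant activation
print(global_entropy_pool(uniform), global_entropy_pool(peaked))
print(fuse_pooled(uniform, [0.0, 0.0, 0.0]))
```

Under these assumptions, a uniform response yields the maximum entropy (the natural log of the number of positions), while a sharply peaked response yields an entropy near zero, so the entropy descriptor separates diffuse activations from concentrated ones in a way that average and maximum pooling do not.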

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
