SE‐Swin: An improved Swin‐Transfomer network of self‐ensemble feature extraction framework for image retrieval

Yixuan Xu; Xianbing Wang; Hua Zhang; Hai Lin

doi:10.1049/ipr2.12929

IET Image Processing (Jan 2024)

Affiliations

Yixuan Xu: School of Cyber Science and Engineering Wuhan University Wuhan China
Xianbing Wang: School of Cyber Science and Engineering Wuhan University Wuhan China
Hua Zhang: School of Computer Science Wuhan University Wuhan China
Hai Lin: School of Cyber Science and Engineering Wuhan University Wuhan China

DOI: https://doi.org/10.1049/ipr2.12929
Journal volume & issue: Vol. 18, no. 1
pp. 13 – 21

Abstract

The Swin‐Transformer is a variant of the Vision Transformer that builds a hierarchical Transformer, computing representations with shifted windows and window‐based multi‐head self‐attention. This method can handle the scale invariance problem and performs well in many computer vision tasks. In image retrieval, high‐quality feature descriptors are necessary to improve retrieval accuracy. This paper proposes a self‐ensemble Swin‐Transformer network structure that fuses the features of different layers of the Swin‐Transformer network, eliminating noise points present in a single layer and improving retrieval performance. Two experiments were conducted, one on the In‐shop Clothes Retrieval dataset and another on the Stanford Online Product dataset. The experiments showed that the proposed method significantly improved the retrieval performance of features extracted using the Vision Transformer, surpassing previous state‐of‐the‐art image retrieval methods. In the second experiment, the feature map of the trained model was visualized, revealing that the improved network pays significantly less attention to some noise points and more attention to image features than the original network.
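The abstract does not specify how the layer features are fused, so the following is only a minimal sketch of the general idea of combining per-stage features into one retrieval descriptor. Everything in it is an assumption made for illustration: the function names, the use of global average pooling, and fusion by concatenating L2-normalized stage vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def global_average_pool(tokens: Sequence[Sequence[float]]) -> Vector:
    """Average one stage's token features (num_tokens x channels) into a vector."""
    count = len(tokens)
    channels = len(tokens[0])
    return [sum(tok[c] for tok in tokens) / count for c in range(channels)]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_stage_features(stages: Sequence[Sequence[Sequence[float]]]) -> Vector:
    """Hypothetical fusion (an assumption, not the paper's method):
    pool each stage, normalize it, concatenate, then renormalize."""
    fused: Vector = []
    for tokens in stages:
        fused.extend(l2_normalize(global_average_pool(tokens)))
    return l2_normalize(fused)


def rank_gallery(query: Vector, gallery: Sequence[Vector]) -> List[int]:
    """Return gallery indices sorted by descending cosine similarity.
    Descriptors are assumed to be unit-length, so a dot product suffices."""
    scores = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for the token outputs of two network stages per image.
image_a = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0, 0.0]]]
image_b = [[[0.0, 1.0]], [[1.0, 0.0, 0.0]]]

descriptor_a = fuse_stage_features(image_a)
descriptor_b = fuse_stage_features(image_b)
ranking = rank_gallery(descriptor_a, [descriptor_b, descriptor_a])
```

Normalizing each stage before concatenation keeps a stage with large activations from dominating the fused descriptor. That is one plausible design choice among several, and the paper may weight or combine the layers differently.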

Published in IET Image Processing

ISSN: 1751-9659 (Print); 1751-9667 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Photography; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519667
