IEEE Access (Jan 2023)

Long-Tailed Visual Recognition via Improved Cross-Window Self-Attention and TrivialAugment

  • Ying Song,
  • Mengxing Li,
  • Bo Wang

DOI
https://doi.org/10.1109/ACCESS.2023.3277204
Journal volume & issue
Vol. 11
pp. 49601–49610

Abstract


In the real world, large-scale image datasets usually exhibit a long-tailed distribution. When traditional visual recognition methods are applied to long-tailed image datasets, problems such as model failure and sharp drops in recognition accuracy occur; deep learning models likewise tend to perform poorly on long-tailed data. To mitigate these problems, we propose CWTA (Long-tailed Visual Recognition via improved Cross-Window Self-Attention and TrivialAugment). CWTA uses a CNN to better capture local image features, uses the Cross-Window Self-Attention mechanism to dynamically adjust the receptive field and better handle image noise, and uses TrivialAugment to increase the diversity of minority-class samples, thereby improving recognition accuracy on long-tailed image distributions. Experimental results show that the proposed CWTA achieves the best classification accuracy across categories on different long-tailed datasets. We also compared CWTA with other long-tailed recognition algorithms (such as OLTR, LWS, ResLT, PaCo, and BALLAD), and CWTA performs best when ResNet-50 is used as the backbone. On the CIFAR100-LT, ImageNet-LT, and Places-LT datasets, the overall accuracy of CWTA is 12.9%, 0.4%, and 1.3% higher than that of BALLAD, respectively. For F1-score, CWTA is 6.6%, 2.2%, and 1.5% higher than BALLAD on CIFAR100-LT, ImageNet-LT, and Places-LT, respectively.
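The abstract describes a pipeline that combines a CNN backbone, a self-attention stage, and TrivialAugment. The sketch below is an illustrative approximation of that pipeline, not the authors' released code: it uses torchvision's TrivialAugmentWide transform and a ResNet-50 backbone, and stands in for the paper's improved cross-window self-attention with a plain multi-head self-attention block over the spatial feature map, since the abstract does not specify the exact windowing scheme. All class names and hyperparameters here (CNNWithSelfAttention, num_heads, etc.) are assumptions made for illustration.

    import torch
    import torch.nn as nn
    from torchvision import models, transforms

    # Training-time augmentation: TrivialAugment (TrivialAugmentWide in torchvision)
    # increases sample diversity, which the paper leverages for tail classes.
    # Normalization statistics are standard ImageNet values (an assumption).
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.TrivialAugmentWide(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    class CNNWithSelfAttention(nn.Module):
        """Hypothetical sketch: ResNet-50 extracts local features; a multi-head
        self-attention block then mixes information across spatial positions.
        This approximates, but is not identical to, the paper's improved
        cross-window self-attention."""
        def __init__(self, num_classes: int, num_heads: int = 8):
            super().__init__()
            backbone = models.resnet50(weights=None)
            # Drop the average-pool and fc layers; output is B x 2048 x 7 x 7 for 224 inputs.
            self.features = nn.Sequential(*list(backbone.children())[:-2])
            self.attn = nn.MultiheadAttention(embed_dim=2048, num_heads=num_heads,
                                              batch_first=True)
            self.norm = nn.LayerNorm(2048)
            self.head = nn.Linear(2048, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            f = self.features(x)                   # B x C x H x W
            tokens = f.flatten(2).transpose(1, 2)  # B x (H*W) x C
            attended, _ = self.attn(tokens, tokens, tokens)
            tokens = self.norm(tokens + attended)  # residual connection + layer norm
            pooled = tokens.mean(dim=1)            # global average over spatial tokens
            return self.head(pooled)

    if __name__ == "__main__":
        model = CNNWithSelfAttention(num_classes=100)   # e.g. CIFAR100-LT has 100 classes
        logits = model(torch.randn(2, 3, 224, 224))
        print(logits.shape)  # torch.Size([2, 100])

In this sketch the augmentation only affects the input pipeline, so it can be paired with any long-tailed training strategy; the attention block is inserted after the last convolutional stage so that the CNN's local features are globally re-weighted before classification.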

Keywords