IEEE Access (Jan 2023)
HAT: A Visual Transformer Model for Image Recognition Based on Hierarchical Attention Transformation
Abstract
In the field of image recognition, the Vision Transformer (ViT) achieves excellent performance. However, because ViT relies on fixed self-attention layers, it tends to incur computational redundancy and struggles to preserve the integrity of image convolutional feature sequences during training. We therefore propose a normalization-free hierarchical attention transfer network (HAT), which introduces a threshold attention mechanism and a multi-head attention mechanism after pooling in each layer. HAT shifts its focus between local and global contexts, flexibly controlling the attention range for image classification. Its lower computational complexity improves its scalability, enabling it to handle longer feature sequences while balancing efficiency and accuracy. HAT also removes layer normalization to increase the likelihood of converging to an optimum during training. To verify the effectiveness of the proposed model, we conducted experiments on image classification and segmentation tasks. The results show that, compared with classical pyramid-structured networks and various attention networks, HAT outperforms the benchmark networks on both the ImageNet and CIFAR-100 datasets.
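The pooling-then-thresholded-attention step described in the abstract can be sketched as follows. This is a minimal illustration only: the gating rule (zeroing attention weights below a threshold and renormalizing), the average-pooling choice, and all names and hyperparameters (`tau`, `pool`, `dim`, `heads`) are assumptions for exposition, not the authors' exact design.

```python
import torch
import torch.nn as nn


class ThresholdedHATBlock(nn.Module):
    """Illustrative block: pool the token sequence, apply multi-head
    self-attention with a threshold on the attention weights, and skip
    layer normalization. Hyperparameters and the gating rule are
    assumptions, not the paper's exact formulation."""

    def __init__(self, dim=64, heads=4, tau=0.01, pool=2):
        super().__init__()
        self.pool = nn.AvgPool1d(pool)      # shorten the sequence -> lower cost
        self.qkv = nn.Linear(dim, 3 * dim)  # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)
        self.heads, self.tau = heads, tau

    def forward(self, x):                   # x: (batch, tokens, channels)
        x = self.pool(x.transpose(1, 2)).transpose(1, 2)
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):                       # (B, N, C) -> (B, heads, N, C/heads)
            return t.view(B, N, self.heads, C // self.heads).transpose(1, 2)

        q, k, v = map(split, (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / (C // self.heads) ** 0.5
        attn = attn.softmax(dim=-1)
        # Threshold gate: drop weak attention weights, then renormalize
        attn = torch.where(attn >= self.tau, attn, torch.zeros_like(attn))
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return x + self.proj(out)           # residual connection; no LayerNorm
```

Under this sketch, each layer's pooling halves the sequence length before attention, so stacking such blocks yields the hierarchical, pyramid-like reduction in cost that the abstract attributes to HAT.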
Keywords