Jisuanji kexue (Aug 2025)
Hash Image Retrieval Based on Mixed Attention and Polarization Asymmetric Loss
Abstract
With the continuous development of the Internet, massive and complex image data is created every day, and today's mainstream social media is full of images and other complex media data. Processing this image data effectively can both increase its utilization and improve the user experience, so retrieving images quickly and accurately has become a meaningful and urgent problem. Current mainstream hash image retrieval models are built on convolutional neural networks (CNNs). However, the convolution operation captures only local features and cannot process global information, and because its receptive field size is fixed, it cannot adapt to input images of different scales. This paper therefore builds on the Swin Transformer, a Transformer model, to achieve effective image retrieval. The Transformer model addresses these CNN limitations through its self-attention mechanism and positional encoding. However, the window attention module of existing Swin Transformer hash image retrieval models gives the same weight to different channels when extracting image features. It thus ignores the differences and dependencies among the feature information of different channels, which reduces the usefulness of the extracted features and wastes computing resources. To solve these problems, this paper proposes a hash image retrieval model based on mixed attention and a polarization asymmetric loss. The model is built on the Swin Transformer feature extraction module: a channel attention block is added to the window self-attention module in HFST, yielding a hash feature extraction module based on mixed attention. This enables the model to assign different weights to the features of different channels of the input image, which increases the diversity of the extracted features and makes fuller use of computing resources. In addition, to minimize the intra-class Hamming distance, maximize the inter-class Hamming distance, make full use of the supervision information in the data, and improve retrieval accuracy, this paper proposes a polarization asymmetric loss function, which combines the polarization loss and the asymmetric loss in a weighted ratio and effectively improves retrieval precision. Experimental results demonstrate the validity and rationality of the proposed method. For example, with a hash code length of 16 bits, the proposed model reaches a highest mean average precision (mAP) of 98.73% on the single-label CIFAR-10 dataset, which is 1.51% higher than the VTS16-CSQ model. On the multi-label NUS-WIDE dataset, its highest mAP is 90.65%, which is 18.02% higher than TransHash and 5.92% higher than VTS16-CSQ.
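The retrieval step that the Hamming-distance objective serves can be illustrated with a minimal sketch. It is not the implementation of the proposed model: the function names (`binarize`, `hamming`, `rank_by_hamming`) and the sign-threshold binarization are assumptions made for illustration only. The sketch shows how real-valued network outputs might be turned into binary hash codes and how database images are then ranked by Hamming distance to a query, which is why small intra-class and large inter-class distances improve retrieval.

```python
def binarize(features):
    """Map real-valued outputs to a {0, 1} hash code by sign (assumed scheme)."""
    return [1 if x > 0 else 0 for x in features]


def hamming(a, b):
    """Count the bit positions in which two equal-length hash codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query, database):
    """Return database indices ordered by increasing Hamming distance to query."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Toy 4-bit example (real codes in the paper are longer, e.g. 16 bits).
query = binarize([0.9, -0.3, 0.2, -0.8])   # -> [1, 0, 1, 0]
database = [
    [1, 0, 1, 0],  # identical code, distance 0
    [0, 1, 0, 1],  # all bits differ, distance 4
    [1, 0, 0, 0],  # one bit differs, distance 1
]
ranking = rank_by_hamming(query, database)
print(ranking)  # [0, 2, 1]
```

Images of the same class should ideally land near the top of such a ranking, which is the behavior that minimizing intra-class and maximizing inter-class Hamming distance is meant to encourage.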
Keywords