Heuristic Attention Representation Learning for Self-Supervised Pretraining

Van Nhiem Tran; Shen-Hsuan Liu; Yung-Hui Li; Jia-Ching Wang

doi:10.3390/s22145169

Sensors (Jul 2022)


Affiliations

Van Nhiem Tran: Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan
Shen-Hsuan Liu: Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan
Yung-Hui Li: AI Research Center, Hon Hai Research Institute, Taipei 114699, Taiwan
Jia-Ching Wang: Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan

DOI: https://doi.org/10.3390/s22145169
Journal volume & issue: Vol. 22, no. 14, p. 5169

Abstract

Recently, self-supervised learning methods have proven powerful and efficient at learning robust representations by maximizing the similarity between different augmented views of an image in embedding space. The main challenge, however, arises when views are generated by random cropping: the semantic content can differ from one view to another, so the similarity objective may be maximized between views that do not depict the same content. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL), a self-supervised framework built on the joint embedding architecture, in which two neural networks are trained to produce similar embeddings for different augmented views of the same image. HARL incorporates prior visual object-level attention by generating a heuristic mask proposal for each training image, and it maximizes the similarity of abstract object-level embeddings in vector space rather than of whole-image representations as in previous works. As a result, HARL extracts high-quality semantic representations from each training sample and outperforms existing self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques, based on conventional computer vision and deep learning methods, for generating heuristic mask proposals on natural image datasets. HARL achieves a +1.3% improvement on the ImageNet semi-supervised learning benchmark and a +0.9% improvement in AP50 on the COCO object detection task over the previous state-of-the-art method, BYOL. Our implementation is available for both the TensorFlow and PyTorch frameworks.
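
To make the object-level idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an object-level embedding can be approximated by averaging a feature map over the pixels selected by a mask, and it uses the normalized mean squared error from BYOL (2 minus twice the cosine similarity) as the similarity objective. The function names `masked_average_pool` and `normalized_mse`, the toy feature maps, and the hand-made mask are all hypothetical; the encoder, projector, and predictor networks of the real framework are omitted.

```python
import numpy as np


def masked_average_pool(features, mask, eps=1e-8):
    """Average a (H, W, C) feature map over the pixels where mask (H, W) is set."""
    weights = mask.astype(features.dtype)[..., None]  # shape (H, W, 1)
    return (features * weights).sum(axis=(0, 1)) / (weights.sum() + eps)


def normalized_mse(p, z, eps=1e-8):
    """BYOL-style loss: 2 - 2 * cosine similarity (near 0 when p and z align)."""
    p = p / (np.linalg.norm(p) + eps)
    z = z / (np.linalg.norm(z) + eps)
    return 2.0 - 2.0 * float(np.dot(p, z))


# Toy stand-in for a heuristic mask proposal: the object fills the top-left 2x2 block.
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True

# Two toy "views": identical object features, different background features.
feat_a = np.zeros((4, 4, 3))
feat_b = np.zeros((4, 4, 3))
feat_a[mask] = [1.0, 0.0, 0.0]
feat_b[mask] = [1.0, 0.0, 0.0]
feat_a[~mask] = [0.0, 1.0, 0.0]
feat_b[~mask] = [0.0, 0.0, 1.0]

# Object-level objective: pool only inside the mask.
object_loss = normalized_mse(
    masked_average_pool(feat_a, mask), masked_average_pool(feat_b, mask)
)

# Whole-image objective: pool over every pixel.
full = np.ones_like(mask)
image_loss = normalized_mse(
    masked_average_pool(feat_a, full), masked_average_pool(feat_b, full)
)

print(f"object-level loss: {object_loss:.4f}")
print(f"whole-image loss:  {image_loss:.4f}")
```

In this toy setting the object-level loss is close to zero because both views agree on the masked region, while the whole-image loss stays well above zero because the backgrounds differ. This mirrors the motivation stated in the abstract, under the pooling assumption named above.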

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
