Masked Autoencoders in Computer Vision: A Comprehensive Survey

Zexian Zhou; Xiaojing Liu

doi:10.1109/ACCESS.2023.3323383

IEEE Access (Jan 2023)

Affiliations

Zexian Zhou: Department of Computer Technology and Application, Qinghai University, Xining, China
Xiaojing Liu: Department of Computer Technology and Application, Qinghai University, Xining, China

DOI: https://doi.org/10.1109/ACCESS.2023.3323383
Journal volume & issue: Vol. 11, pp. 113560–113579

Abstract

The masked autoencoder (MAE) is a Transformer-based deep learning method. Originally proposed for images, it has since been extended to video, audio, and other temporal prediction tasks. In computer vision, MAE performs well in classification, prediction, and object detection. In terms of applications, MAE has achieved notable results in medicine, geography, 3D point clouds, and machine fault diagnosis. Since its introduction at the end of 2021, more than 300 related preprints have appeared, and MAE has featured prominently at top-tier computer vision conferences in 2022 and 2023. Given MAE's current popularity and its prospects for future development, we present a relatively comprehensive survey of MAE, mainly covering articles officially published so far. We review and classify the improvements to MAE and present representative applications in computer vision. Finally, we discuss possible future research directions and application areas based on the characteristics of MAE, in the hope that this work can serve as a reference for future work on MAE.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639