IEEE Access (Jan 2023)

Expectation-Maximization via Pretext-Invariant Representations

  • Chingis Oinar,
  • Binh M. Le,
  • Simon S. Woo

DOI: https://doi.org/10.1109/ACCESS.2023.3289589
Journal volume & issue: Vol. 11, pp. 65266–65276

Abstract

Contrastive learning has been widely adopted in unsupervised and self-supervised visual representation learning. Such algorithms aim to maximize the cosine similarity between two positive samples while minimizing that of negative samples. Recently, Grill et al. propose BYOL, an algorithm that relies only on positive samples, dispensing with negative ones entirely, by introducing a Siamese-like asymmetric architecture. Although many recent state-of-the-art (SOTA) methods adopt this architecture, most simply add an additional neural network, the predictor, without further exploring the asymmetric design. In contrast, He et al. propose SimSiam, a simple Siamese architecture that relies on a stop-gradient operation instead of a momentum encoder, and describe the framework from the perspective of Expectation-Maximization. We argue that BYOL-like algorithms attain suboptimal performance due to representation inconsistency during training. In this work, we address this inconsistency and propose a novel self-supervised objective, Expectation-Maximization via Pretext-Invariant Representations (EMPIR), which enhances Expectation-Maximization-based optimization in BYOL-like algorithms by enforcing augmentation invariance within a local region of k nearest neighbors, resulting in consistent representation learning. In other words, we cast Expectation-Maximization as the core task of asymmetric architectures. We show that EMPIR consistently outperforms other SOTA algorithms by a decent margin. We also demonstrate its transfer learning capabilities on downstream image recognition tasks.
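
To make the asymmetric objective described in the abstract concrete, the following PyTorch-style sketch shows a generic BYOL/SimSiam-style loss: a predictor output is pulled toward a stop-gradient target projection via negative cosine similarity. The module names (encoder, projector, predictor) are illustrative assumptions; this is not the authors' EMPIR implementation, which additionally enforces augmentation invariance over a local region of k nearest neighbors.

    import torch.nn.functional as F

    def asymmetric_loss(p, z):
        # Negative cosine similarity between predictor output p and
        # target projection z; detaching z is the stop-gradient step
        # used in SimSiam in place of a momentum encoder.
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

    def siamese_step(encoder, projector, predictor, x1, x2):
        # x1, x2: two augmented views of the same image batch
        # (encoder, projector, predictor are assumed nn.Module objects).
        z1, z2 = projector(encoder(x1)), projector(encoder(x2))
        p1, p2 = predictor(z1), predictor(z2)
        # Symmetrize the loss over both views.
        return 0.5 * (asymmetric_loss(p1, z2) + asymmetric_loss(p2, z1))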

Keywords