IEEE Access (Jan 2025)

Comprehensive Analysis of Masking Techniques in Molecular Graph Representation Learning

  • Bonyou Koo,
  • Sunyoung Kwon

DOI
https://doi.org/10.1109/ACCESS.2025.3531302
Journal volume & issue
Vol. 13
pp. 14290 – 14303

Abstract


Molecular representation learning is a central focus in drug discovery and molecular property prediction. Previous studies have modeled molecules as graphs, enabling graph neural networks (GNNs) to capture essential structural information. Recent approaches have enhanced molecular representations by introducing advanced masking strategies, such as extending granularity from nodes to subgraphs, shifting masking locations, and applying masking during downstream tasks. However, comprehensive analyses of these strategies remain limited. In this study, we systematically evaluate masking techniques across various phases, granularities, locations, feature types, and ratios. Our findings reveal three points. First, node feature masking during pre-training achieves high performance. Second, richer node features may reduce the gains from masking. Third, the commonly used 25% masking ratio is not universally optimal, and alternative ratios perform better depending on the dataset. Our study provides deeper insight into the benefits of masking techniques in molecular graphs and highlights their potential to improve semantic understanding and predictive accuracy in graph-based learning.
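To make two of the evaluated dimensions concrete, namely masking granularity (single nodes versus a connected subgraph) and masking ratio, the following is a minimal Python sketch. It is not the authors' implementation. It assumes each atom is encoded as a single integer atom type and that a reserved index acts as the mask token. It also assumes that subgraph masking grows a connected region by breadth-first search from a random seed atom, which is one common choice. The names `MASK_TOKEN`, `mask_nodes`, and `mask_subgraph` are hypothetical.

```python
import random
from collections import deque

# Hypothetical reserved index used as the [MASK] atom type (assumption).
MASK_TOKEN = 119


def _num_to_mask(n, ratio):
    """Number of atoms to mask for a given ratio (at least one)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return max(1, round(n * ratio))


def mask_nodes(atom_types, ratio=0.25, rng=None):
    """Node-level masking: replace a random subset of atom features with
    MASK_TOKEN. Returns the masked features, the masked indices, and the
    original atom types at those indices (the reconstruction targets)."""
    rng = rng or random.Random(0)
    k = _num_to_mask(len(atom_types), ratio)
    idx = sorted(rng.sample(range(len(atom_types)), k))
    masked = list(atom_types)
    for i in idx:
        masked[i] = MASK_TOKEN
    return masked, idx, [atom_types[i] for i in idx]


def mask_subgraph(atom_types, edges, ratio=0.25, rng=None):
    """Subgraph-level masking: grow a connected set of atoms by BFS from a
    random seed until about ratio * n atoms are chosen, then mask them.
    If the seed's connected component is smaller than the target, fewer
    atoms are masked."""
    rng = rng or random.Random(0)
    n = len(atom_types)
    k = _num_to_mask(n, ratio)
    adj = {i: [] for i in range(n)}
    for u, v in edges:  # undirected bonds
        adj[u].append(v)
        adj[v].append(u)
    seed = rng.randrange(n)
    selected, queue = {seed}, deque([seed])
    while queue and len(selected) < k:
        u = queue.popleft()
        for v in adj[u]:
            if v not in selected and len(selected) < k:
                selected.add(v)
                queue.append(v)
    idx = sorted(selected)
    masked = list(atom_types)
    for i in idx:
        masked[i] = MASK_TOKEN
    return masked, idx, [atom_types[i] for i in idx]


if __name__ == "__main__":
    # Benzene ring: six carbons (atomic number 6) bonded in a cycle.
    benzene = [6] * 6
    ring = [(i, (i + 1) % 6) for i in range(6)]
    print(mask_nodes(benzene, ratio=0.25, rng=random.Random(1)))
    print(mask_subgraph(benzene, ring, ratio=0.5, rng=random.Random(2)))
```

Because the ratio is an explicit argument, the same pre-training loop can be rerun at ratios other than 25%. This mirrors the kind of comparison the abstract describes. A GNN would then be trained to predict the returned original atom types at the masked positions.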

Keywords