IEEE Access (Jan 2024)
stBERT: A Pretrained Model for Spatial Domain Identification of Spatial Transcriptomics
Abstract
Recent studies have shown that clustering latent variables, derived from autoencoders (AEs) through reconstruction tasks, is effective for identifying spatial domains in spatial transcriptomics (ST). Despite their utility, AEs are inherently limited by discontinuities in the latent space, where similar inputs may not map to nearby points. Variational autoencoders (VAEs) address this by aligning the latent distribution with a prior, which improves smoothness and continuity, yet the resulting interpolation can obscure necessary group distinctions. To overcome these limitations, we introduce stBERT, a BERT-based pre-training framework that transforms the traditional reconstruction task in ST into a masked language modeling task. stBERT leverages BERT’s hidden states to obtain refined and rich latent space mappings that preserve essential group distinctions in complex, high-dimensional biological data, facilitating more effective clustering and interpretation. Our experiments demonstrate that stBERT significantly outperforms current state-of-the-art models in tasks related to spatial domain identification, including clustering validation using ground truth labels, intrinsic evaluation of clustering performance, and biological validation of clustering outcomes. The source code of the stBERT model is available at https://github.com/azusakou/stBERT.
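To illustrate the general idea of recasting reconstruction as a masked modeling task, the following is a minimal sketch and not the stBERT implementation. It hides a random subset of entries in one spot's expression vector and scores a prediction only at the hidden positions. The function names, the sentinel `MASK_VALUE`, the 15% default mask rate (borrowed from the original BERT setup), and the toy expression values are all assumptions made for illustration.

```python
import random

# Hypothetical sentinel for a hidden entry; expression values are non-negative.
MASK_VALUE = -1.0


def mask_expression(values, mask_rate=0.15, seed=0):
    """Hide a random subset of entries in a non-empty expression vector.

    Returns a masked copy and the sorted indices that were hidden.
    The mask rate is an assumed default, not a value from the paper.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(values)))
    idx = sorted(rng.sample(range(len(values)), n_mask))
    masked = list(values)
    for i in idx:
        masked[i] = MASK_VALUE
    return masked, idx


def masked_mse(predicted, original, idx):
    """Mean squared error computed over the masked positions only."""
    return sum((predicted[i] - original[i]) ** 2 for i in idx) / len(idx)


# Toy expression vector for a single spot (made-up values).
spot = [0.0, 2.0, 5.0, 1.0, 0.0, 3.0, 7.0, 0.0, 4.0, 1.0]
masked, idx = mask_expression(spot, mask_rate=0.3)

# A perfect prediction has zero loss; the untouched masked input does not.
perfect_loss = masked_mse(spot, spot, idx)
naive_loss = masked_mse(masked, spot, idx)
```

In a full pipeline a model would receive `masked` and be trained to lower `masked_mse` at the hidden positions; its hidden states would then serve as the latent variables passed to clustering.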
Keywords