SGCL-LncLoc: An Interpretable Deep Learning Model for Improving lncRNA Subcellular Localization Prediction with Supervised Graph Contrastive Learning

Min Li; Baoying Zhao; Yiming Li; Pingjian Ding; Rui Yin; Shichao Kan; Min Zeng

doi:10.26599/bdma.2024.9020002

Big Data Mining and Analytics (Sep 2024)

Affiliations

Min Li: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Baoying Zhao: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Yiming Li: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Pingjian Ding: Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University, Cleveland, OH 44106, USA
Rui Yin: Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32603, USA
Shichao Kan: School of Computer Science and Engineering, Central South University, Changsha 410083, China
Min Zeng: School of Computer Science and Engineering, Central South University, Changsha 410083, China

DOI: https://doi.org/10.26599/bdma.2024.9020002
Journal volume & issue: Vol. 7, no. 3
pp. 765 – 780

Abstract

Understanding the subcellular localization of long non-coding RNAs (lncRNAs) is crucial for unraveling their functional mechanisms. Previous computational methods have made progress in predicting lncRNA subcellular localization, but most of them encode lncRNA sequences as k-mer frequency features and therefore ignore sequence order information. In this study, we develop SGCL-LncLoc, a novel interpretable deep learning model based on supervised graph contrastive learning. SGCL-LncLoc transforms lncRNA sequences into de Bruijn graphs and uses the Word2Vec technique to learn representations of the graph nodes. It then applies graph convolutional networks to learn a comprehensive graph-level representation. Additionally, we propose a computational method that maps the attention weights of the graph nodes to weights of the nucleotides in the lncRNA sequence, which makes SGCL-LncLoc an interpretable deep learning model. SGCL-LncLoc also employs a supervised contrastive learning strategy that leverages both the relationships between samples and their label information, guiding the model toward better representations of lncRNAs. Extensive experimental results demonstrate that SGCL-LncLoc outperforms both deep learning baseline models and existing predictors, indicating its capability for accurate lncRNA subcellular localization prediction. Finally, we conduct a motif analysis, which reveals that SGCL-LncLoc successfully captures known motifs associated with lncRNA subcellular localization. The SGCL-LncLoc web server is available at http://csuligroup.com:8000/SGCL-LncLoc. The source code can be obtained from https://github.com/CSUBioGroup/SGCL-LncLoc.
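
Three ideas from the abstract (sequence-to-graph conversion, mapping node weights back to nucleotides, and a supervised contrastive objective) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation; the repository linked in the abstract holds that. The sketch assumes that graph nodes are the distinct k-mers of a sequence and that edges link consecutive, overlapping k-mers (a de Bruijn-style construction). It also assumes, hypothetically, that a nucleotide's weight is the mean weight of the k-mer occurrences covering it, and that the contrastive objective takes the widely used SupCon form. The function names and the defaults for `k` and `temperature` are illustrative choices, not values from the paper.

```python
import math
from collections import defaultdict


def build_de_bruijn_graph(seq, k=3):
    """Assumed construction: nodes are the distinct k-mers of `seq`, and a directed
    edge joins each k-mer to the next one in the sequence (overlap of k - 1)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    nodes = sorted(set(kmers))
    index = {kmer: i for i, kmer in enumerate(nodes)}
    edges = defaultdict(int)  # (src, dst) -> how often that transition occurs
    for a, b in zip(kmers, kmers[1:]):
        edges[(index[a], index[b])] += 1
    return nodes, dict(edges), kmers


def node_to_nucleotide_weights(seq, k, nodes, node_weights):
    """Hypothetical mapping: each position gets the mean weight of the
    k-mer occurrences that cover it."""
    weight_of = dict(zip(nodes, node_weights))
    totals = [0.0] * len(seq)
    counts = [0] * len(seq)
    for i in range(len(seq) - k + 1):
        w = weight_of[seq[i:i + k]]
        for j in range(i, i + k):  # the k positions spanned by this occurrence
            totals[j] += w
            counts[j] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss: pull same-label embeddings together, push others apart."""
    # L2-normalise so that dot products are cosine similarities
    z = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        z.append([x / norm for x in v])
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:  # anchors without a same-label partner are skipped
            continue
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # numerically stable log-sum-exp over every non-anchor sample
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

For example, `build_de_bruijn_graph("ACGTACG", k=3)` yields four nodes (`ACG`, `CGT`, `GTA`, `TAC`) and four edges, including one back to `ACG` because that 3-mer occurs twice. Averaging over the covering k-mers keeps per-nucleotide weights on the same scale as the node weights; summing instead would favor interior positions, which are covered by more k-mers than the sequence ends.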

Published in Big Data Mining and Analytics

ISSN: 2096-0654 (Print); 2097-406X (Online)
Publisher: Tsinghua University Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8254253
