Predicting Sites of Epitranscriptome Modifications Using Unsupervised Representation Learning Based on Generative Adversarial Networks

Sirajul Salekin; Milad Mostavi; Yu-Chiao Chiu; Yidong Chen; Jianqiu Zhang; Yufei Huang

doi:10.3389/fphy.2020.00196

Frontiers in Physics (Jun 2020)

Affiliations

Sirajul Salekin: Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, United States
Milad Mostavi: Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, United States
Yu-Chiao Chiu: Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, United States
Yidong Chen: Greehey Children's Cancer Research Institute, University of Texas Health San Antonio, San Antonio, TX, United States; Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, United States
Jianqiu Zhang: Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, United States
Yufei Huang: Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, United States; Department of Population Health Sciences, University of Texas Health San Antonio, San Antonio, TX, United States

Journal volume & issue: Vol. 8

Abstract


Epitranscriptomics is an exciting area that studies the different types of modifications in transcripts, and predicting such modification sites from the transcript sequence is of significant interest. However, the scarcity of positive sites for most modifications poses a critical challenge for training robust algorithms. To circumvent this problem, we propose MR-GAN, a generative adversarial network (GAN)-based model that is trained in an unsupervised fashion on entire pre-mRNA sequences to learn a low-dimensional embedding of transcriptomic sequences. MR-GAN was then applied to extract embeddings of the sequences in a training dataset we created for nine epitranscriptome modifications, namely m6A, m1A, m1G, m2G, m5C, m5U, 2′-O-Me, pseudouridine (Ψ), and dihydrouridine (D), for which positive samples are very limited. Prediction models were trained on the embeddings extracted by MR-GAN. We compared their prediction performance against that obtained with one-hot encoding of the training sequences and against SRAMP, a state-of-the-art m6A site prediction algorithm, and demonstrated that the learned embeddings outperform one-hot encoding by a significant margin, with improvements of up to 15%. Using MR-GAN, we also investigated the sequence motifs for each modification type and uncovered known motifs as well as new motifs that could not be identified from the sequences directly. The results demonstrated that transcriptome features extracted using unsupervised learning could lead to high precision in predicting multiple types of epitranscriptome modifications, even when the data size is small and extremely imbalanced.
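The abstract uses one-hot encoding of the training sequences as the baseline representation against which the MR-GAN embeddings are compared. The following is a minimal sketch of such an encoding, assuming sequence windows over the alphabet A, C, G, U, with T treated as U and any other symbol (e.g. N) encoded as an all-zero row. The channel order, the function names `one_hot_encode` and `flatten`, and the example window are hypothetical illustrations, not details taken from the paper.

```python
from typing import List

# Hypothetical channel ordering; the paper does not state the order it used.
ALPHABET = "ACGU"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot_encode(seq: str) -> List[List[int]]:
    """Return a len(seq) x 4 one-hot matrix for an RNA (or DNA) sequence.

    T is mapped to U; unknown symbols such as N become all-zero rows.
    """
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in INDEX:
            row[INDEX[base]] = 1
        rows.append(row)
    return rows


def flatten(matrix: List[List[int]]) -> List[int]:
    """Concatenate the per-position rows into one feature vector for a classifier."""
    return [value for row in matrix for value in row]


if __name__ == "__main__":
    window = "GGACU"  # illustrative window around a candidate site (hypothetical)
    features = flatten(one_hot_encode(window))
    print(len(features))  # 5 positions x 4 channels = 20
```

Because this representation records only the identity of each nucleotide, its dimensionality grows linearly with window length and it carries no learned context, which is what makes it a natural baseline for comparison with a learned low-dimensional embedding.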

Published in Frontiers in Physics

ISSN: 2296-424X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Physics
Website: https://www.frontiersin.org/journals/physics
