Revealing Epigenetic Factors of circRNA Expression by Machine Learning in Various Cellular Contexts
Mengying Zhang,
Kang Xu,
Limei Fu,
Qi Wang,
Zhenghong Chang,
Haozhe Zou,
Yan Zhang,
Yongsheng Li
Affiliations
Mengying Zhang
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
Kang Xu
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
Limei Fu
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou 571199, China
Qi Wang
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
Zhenghong Chang
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
Haozhe Zou
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou 571199, China
Yan Zhang
School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China; Corresponding author
Yongsheng Li
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China; Key Laboratory of Tropical Translational Medicine of Ministry of Education, Hainan Medical University, Haikou 571199, China; Corresponding author
Summary: Circular RNAs (circRNAs) are naturally occurring RNAs that are highly represented in the eukaryotic transcriptome. Although a large number of circRNAs have been reported, the regulatory mechanisms underlying circRNA biogenesis remain largely unknown. Here, we integrated in-depth multi-omics data, including epigenome, transcriptome, and non-coding RNA data, and identified candidate circRNAs in six cellular contexts. The circRNAs were then divided into two classes (high versus low) according to their expression levels. We constructed machine learning models that predict the expression class of each circRNA from 11 histone modifications and host gene expression. The models predicted highly versus lowly expressed circRNAs with high accuracy. Furthermore, host gene expression, H3K36me3, H3K79me2, and H4K20me1 contributed strongly to the classification models in all six cellular contexts. Together, these results suggest that epigenetic modifications, particularly histone modifications, can effectively predict circRNA expression levels.
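The classification setup described in the summary (label each circRNA as high or low, fit a classifier on histone-mark and host-gene features, then ask which features matter) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the authors' method: the summary does not name the model type or the high/low cutoff, so the sketch uses a plain logistic regression and a median split; the data are synthetic; only the four features named in the summary are included; and the function names (`make_synthetic`, `label_high_low`, `train_logistic`, `accuracy`) are hypothetical.

```python
import math
import random

# Host gene expression plus the three marks named in the summary.
# The other eight histone marks are not listed there, so they are omitted.
FEATURES = ["host_gene_expr", "H3K36me3", "H3K79me2", "H4K20me1"]


def make_synthetic(n, seed=0):
    """Generate toy (features, expression) pairs. This is NOT real data."""
    rng = random.Random(seed)
    rows, expr = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        # Assumed toy relationship: expression driven mainly by feature 0.
        rows.append(x)
        expr.append(2.0 * x[0] + 0.5 * x[1] + rng.gauss(0.0, 0.5))
    return rows, expr


def label_high_low(expr):
    """Label 1 (high) if above the median, else 0 (low). Cutoff is an assumption."""
    median = sorted(expr)[len(expr) // 2]
    return [1 if v > median else 0 for v in expr]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(xi, w, b) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


X, expr = make_synthetic(400)
y = label_high_low(expr)
X_train, y_train = X[:300], y[:300]   # simple hold-out split
X_test, y_test = X[300:], y[300:]

w, b = train_logistic(X_train, y_train)
test_acc = accuracy(X_test, y_test, w, b)

# Rank features by absolute coefficient (inputs are on a common scale here).
ranking = sorted(zip(FEATURES, w), key=lambda t: abs(t[1]), reverse=True)
print(f"held-out accuracy: {test_acc:.2f}")
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

Ranking by coefficient magnitude is only meaningful because the toy features share one scale; with real ChIP-seq signal and expression values, features would need to be standardized first, and a model-specific importance measure may be more appropriate.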