iScience (Dec 2020)
Revealing Epigenetic Factors of circRNA Expression by Machine Learning in Various Cellular Contexts
Abstract
Summary: Circular RNAs (circRNAs) are naturally occurring RNAs that are highly represented in the eukaryotic transcriptome. Although a large number of circRNAs have been reported, the regulatory mechanism underlying circRNA biogenesis remains largely unknown. Here, we integrated in-depth multi-omics data, including epigenome, transcriptome, and non-coding RNA data, and identified candidate circRNAs in six cellular contexts. The circRNAs were then divided into two classes (high versus low) according to expression level. Machine learning models were constructed to predict the circRNA expression class from 11 different histone modifications and host gene expression. The models achieved high accuracy in distinguishing highly from lowly expressed circRNAs. Furthermore, host gene expression and the histone modifications H3K36me3, H3K79me2, and H4K20me1 contributed strongly to the classification models in all six cellular contexts. Together, these results suggest that epigenetic modifications, particularly histone modifications, can effectively predict the expression levels of circRNAs.
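The workflow summarized above (split circRNAs into high and low classes, train a classifier on 12 features, then inspect feature contributions) can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the abstract: the abstract does not name the model type, so plain logistic regression is used as a stand-in; the median split is one possible way to define the two classes; the `other_mark_*` feature names are hypothetical placeholders for the eight histone modifications the abstract does not list; and all data are synthetic, with the first four features made informative by construction. The resulting ranking therefore shows the mechanics only and does not reproduce the reported findings.

```python
import math
import random

# The three named marks come from the abstract; the "other_mark_*" names are
# hypothetical placeholders for the eight marks the abstract does not list.
FEATURES = ["host_gene_expr", "H3K36me3", "H3K79me2", "H4K20me1"] + [
    f"other_mark_{i}" for i in range(1, 9)
]
N_INFORMATIVE = 4  # property of this toy data set, not a result from the paper


def make_synthetic(n, rng):
    """Draw standard-normal feature signals and a toy expression value."""
    rows, expr = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        # Assumption: expression depends only on the first four features.
        expr.append(sum(row[:N_INFORMATIVE]) + rng.gauss(0.0, 0.5))
        rows.append(row)
    return rows, expr


def binarize_by_median(values):
    """Label values above the (upper-middle) median as 1 = high, else 0 = low."""
    median = sorted(values)[len(values) // 2]
    return [1 if v > median else 0 for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row))) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of rows whose predicted class (p >= 0.5) matches the label."""
    hits = 0
    for row, label in zip(X, y):
        p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))
        hits += int((p >= 0.5) == (label == 1))
    return hits / len(y)


rng = random.Random(0)
X, expr = make_synthetic(600, rng)
y = binarize_by_median(expr)

# Simple hold-out split: first 400 rows for training, last 200 for testing.
w, b = fit_logistic(X[:400], y[:400])
test_accuracy = accuracy(X[400:], y[400:], w, b)

# Features are on a common scale, so |weight| serves as a rough importance.
ranked = sorted(zip(FEATURES, w), key=lambda fw: abs(fw[1]), reverse=True)

print(f"held-out accuracy: {test_accuracy:.2f}")
for name, weight in ranked:
    print(f"{name:>16s}  {weight:+.3f}")
```

Ranking by absolute weight is only meaningful here because every synthetic feature has the same scale; with real histone modification signals, the features would need to be standardized first, or a model-specific importance measure used instead.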