A Linear Regression Predictor for Identifying N6-Methyladenosine Sites Using Frequent Gapped K-mer Pattern

Y.Y. Zhuang; H.J. Liu; X. Song; Y. Ju; H. Peng

doi:10.1016/j.omtn.2019.10.001

Molecular Therapy: Nucleic Acids (Dec 2019)

Affiliations

Y.Y. Zhuang: School of Informatics, Xiamen University, Xiamen 361005, China
H.J. Liu: College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines
X. Song: School of Computer and Information Technology, Nanyang Normal University, Nanyang 473000, China (corresponding author: Xiao Song)
Y. Ju: School of Informatics, Xiamen University, Xiamen 361005, China
H. Peng: School of Informatics, Xiamen University, Xiamen 361005, China

Journal volume & issue: Vol. 18, pp. 673–680

Abstract

N6-methyladenosine (m6A) is one of the most common and abundant RNA modifications and is involved in many biological processes in humans. Abnormal RNA modifications are often associated with a range of diseases, including tumors, neurogenic diseases, and embryonic retardation. Identifying m6A sites is therefore of paramount importance in the post-genomic era. Although many laboratory-based methods have been proposed to annotate m6A sites, they are time-consuming and costly. Given these drawbacks of experimental methods for RNA sequence recognition, computational methods have been proposed as a complementary way to identify m6A sites. In this study, we develop a novel feature extraction algorithm based on the frequent gapped k-mer pattern (FGKP) and apply linear regression to construct the prediction model. The new predictor is used to identify m6A sites in the Saccharomyces cerevisiae database. Results of 10-fold cross-validation show that its performance is better than that of recent methods. Comparative results indicate that our model has great potential to become a useful and effective tool for genome analysis and to provide further insight into the locations of m6A sites.

Keywords: N6-methyladenosine, RNA modifications, novel feature extraction algorithm, frequent gapped k-mer pattern, linear regression, Saccharomyces cerevisiae database, 10-fold cross-validation, genome analysis
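The abstract names a three-part pipeline (gapped k-mer pattern features, a linear regression predictor, 10-fold cross-validation) but does not define how FGKP patterns are built or mined. The Python below is therefore only a minimal sketch under loudly stated assumptions, not the authors' implementation: a "gapped pattern" is assumed to be the simplest gapped k-mer (k = 2), i.e. a nucleotide pair separated by g skipped positions; "frequent" is assumed to mean present in at least a `min_support` fraction of training sequences; the regression score is thresholded at 0.5 to call a site; and a tiny ridge term is added for numerical stability. All function names, parameters, and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical pattern form: (first nucleotide, gap length g, second nucleotide)
Pattern = Tuple[str, int, str]


def gapped_pairs(seq: str, max_gap: int) -> Dict[Pattern, int]:
    """Count each (x, g, y): nucleotide x, g skipped positions, then nucleotide y."""
    counts: Dict[Pattern, int] = {}
    for g in range(max_gap + 1):
        for i in range(len(seq) - g - 1):
            key = (seq[i], g, seq[i + g + 1])
            counts[key] = counts.get(key, 0) + 1
    return counts


def frequent_patterns(seqs: Sequence[str], max_gap: int, min_support: float) -> List[Pattern]:
    """Keep patterns present in at least a `min_support` fraction of the sequences."""
    support: Dict[Pattern, int] = {}
    for s in seqs:
        for key in gapped_pairs(s, max_gap):
            support[key] = support.get(key, 0) + 1
    return sorted(k for k, c in support.items() if c / len(seqs) >= min_support)


def featurize(seq: str, patterns: Sequence[Pattern], max_gap: int) -> List[float]:
    """Frequency of each selected pattern, normalised by the number of windows for its gap."""
    counts = gapped_pairs(seq, max_gap)
    return [counts.get(p, 0) / max(1, len(seq) - p[1] - 1) for p in patterns]


def fit_linear(X: List[List[float]], y: List[int], ridge: float = 1e-4) -> List[float]:
    """Least-squares weights (bias first) via the normal equations.

    The tiny ridge term is an added assumption: it keeps the system solvable
    when gapped-pattern frequencies are collinear.
    """
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    for col in range(d):  # Gaussian elimination with partial pivoting
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in range(d - 1, -1, -1):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def predict(w: List[float], x: List[float], threshold: float = 0.5) -> int:
    """Label a candidate site m6A (1) when the regression score reaches the threshold."""
    score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if score >= threshold else 0


def cross_validate(seqs: List[str], labels: List[int], k: int = 10, max_gap: int = 2,
                   min_support: float = 0.3, seed: int = 0) -> float:
    """k-fold accuracy; patterns are re-mined on each training split only."""
    idx = list(range(len(seqs)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test_idx = idx[f::k]
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        pats = frequent_patterns([seqs[i] for i in train], max_gap, min_support)
        w = fit_linear([featurize(seqs[i], pats, max_gap) for i in train],
                       [labels[i] for i in train])
        correct += sum(predict(w, featurize(seqs[i], pats, max_gap)) == labels[i]
                       for i in test_idx)
    return correct / len(seqs)


def make_toy_data(n_per_class: int = 20, length: int = 21, seed: int = 1):
    """Hypothetical, trivially separable data: positives use only A/C, negatives only G/U."""
    rng = random.Random(seed)
    pos = ["".join(rng.choice("AC") for _ in range(length)) for _ in range(n_per_class)]
    neg = ["".join(rng.choice("GU") for _ in range(length)) for _ in range(n_per_class)]
    return pos + neg, [1] * n_per_class + [0] * n_per_class


seqs, labels = make_toy_data()
print(f"10-fold accuracy on toy data: {cross_validate(seqs, labels):.2f}")
```

Mining the frequent patterns inside each training split, rather than once on the full dataset, keeps the held-out fold from influencing feature selection; the toy data only demonstrates that the pipeline runs and says nothing about performance on real m6A sites.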

Published in Molecular Therapy: Nucleic Acids

ISSN: 2162-2531 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Therapeutics. Pharmacology
Website: https://www.cell.com/molecular-therapy-family/nucleic-acids/latest-content

About the journal