Computational and Structural Biotechnology Journal (Jan 2023)

CNNSplice: Robust models for splice site prediction using convolutional neural networks

  • Victor Akpokiro,
  • H. M. A. Mohit Chowdhury,
  • Samuel Olowofila,
  • Raisa Nusrat,
  • Oluwatosin Oluwadare

Journal volume & issue
Vol. 21
pp. 3210 – 3223

Abstract

The identification of splice sites, the segments of an RNA gene where noncoding and coding sequences are connected in the 5′ and 3′ directions, is an essential post-transcriptional step for the annotation of functional genes and is required for the study and analysis of biological function in eukaryotic organisms through protein production and gene expression. Splice site detection tools have been proposed for this purpose; however, the models in these tools are built for specific use cases and are typically inefficient or not transferable between organisms. Here, we present CNNSplice, a set of deep convolutional neural network models for splice site prediction. Using five-fold cross-validation for model selection, we explore several models based on typical machine learning applications and propose five high-performing models that efficiently predict true and false splice sites (SS) in balanced and imbalanced datasets. Our evaluation results indicate that CNNSplice’s models outperform existing methods across datasets from five organisms. In addition, our generality test shows the ability of CNNSplice’s models to predict and annotate splice sites in new or poorly trained genome datasets, indicating a broad application spectrum. CNNSplice demonstrates improved prediction, interpretability, and generalizability on genomic datasets compared to existing splice site prediction tools. We have developed a web server for the CNNSplice algorithm, which is publicly accessible at http://www.cnnsplice.online
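
To make the five-fold cross-validation step concrete, the following is a minimal sketch, not the CNNSplice implementation. The abstract does not state how sequences are encoded or how folds are built, so the one-hot encoding, the helper names `one_hot` and `k_fold_indices`, and the toy sequences are all assumptions chosen for illustration.

```python
import random

# Assumption: nucleotides A, C, G, T map to 4-dimensional one-hot vectors;
# any other symbol (for example N) becomes an all-zero vector.
ALPHABET = "ACGT"


def one_hot(seq):
    """Return one 4-element list per nucleotide in seq."""
    return [[1 if base == letter else 0 for letter in ALPHABET]
            for base in seq.upper()]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# Hypothetical toy sequences, not taken from the CNNSplice datasets.
sequences = ["ACGTAG", "GGTAAG", "TTCAGG", "CAGGTA", "AGGTGA",
             "NNACGT", "GTAAGT", "CCAGGT", "TAGGTC", "ACAGGT"]
encoded = [one_hot(s) for s in sequences]
splits = list(k_fold_indices(len(encoded), k=5))
```

In a model selection setting, each candidate model would be trained on `train_idx` and scored on `val_idx` for every fold, and the mean validation score would then be compared across candidates.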

Keywords