IEEE Access (Jan 2018)
DeepSS: Exploring Splice Site Motif Through Convolutional Neural Network Directly From DNA Sequence
Abstract
Splice site prediction and interpretation are crucial to understanding the complicated mechanisms underlying gene transcriptional regulation. Although existing computational approaches can classify true/false splice sites, their performance mostly relies on a set of sequence- or structure-based features, and model interpretability is relatively weak. In view of these challenges, we report a deep learning-based framework (DeepSS), which consists of a DeepSS-C module to classify splice sites and a DeepSS-M module to detect splice site sequence patterns. Unlike previous pipelines that separate feature construction from model training, the DeepSS-C module learns features during model training itself. Experimental results show that the DeepSS-C module is more accurate than state-of-the-art algorithms on six publicly available donor/acceptor splice site data sets. In addition, the parameters of the trained DeepSS-M module are used for model interpretation and downstream analysis, including: 1) detection of genome factors (the truly relevant motifs that induce the related biological process) via filters, from a deep learning perspective; 2) analysis of the ability of CNN filters to detect motifs; and 3) co-analysis of filters and motifs on DNA sequence patterns. DeepSS is freely available at http://ailab.ahu.edu.cn:8087/DeepSS/index.html.
Keywords