RBPSpot: Learning on appropriate contextual information for RBP binding sites discovery
Nitesh Kumar Sharma,
Sagar Gupta,
Ashwani Kumar,
Prakash Kumar,
Upendra Kumar Pradhan,
Ravi Shankar
Affiliations
Nitesh Kumar Sharma
Studio of Computational Biology & Bioinformatics (Biotech Division), The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, HP 176061, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201 002, India
Sagar Gupta
Studio of Computational Biology & Bioinformatics (Biotech Division), The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, HP 176061, India
Ashwani Kumar
Studio of Computational Biology & Bioinformatics (Biotech Division), The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, HP 176061, India
Prakash Kumar
Studio of Computational Biology & Bioinformatics (Biotech Division), The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, HP 176061, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201 002, India; ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, Pusa, New Delhi, Delhi 110012, India
Upendra Kumar Pradhan
Studio of Computational Biology & Bioinformatics (Biotech Division), The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, HP 176061, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201 002, India; ICAR-Indian Agricultural Statistics Research Institute, Library Avenue, Pusa, New Delhi, Delhi 110012, India
Ravi Shankar
Studio of Computational Biology & Bioinformatics (Biotech Division), The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, A BIC Supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, HP 176061, India; Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201 002, India; Corresponding author
Summary: Identifying the factors that determine RBP-RNA interactions remains a major challenge. Binding depends on sparse motifs and on a suitable sequence context around them. The present work describes an approach to detect RBP binding sites in RNAs using an ultra-fast inexact k-mer search for statistically significant seeds. Each seed serves as an anchor for evaluating the context and binding potential from flanking-region information with a deep feed-forward neural network. The resulting models were also supported by molecular dynamics (MD) simulation studies. The implemented software, RBPSpot, scored consistently high on all performance metrics, with an average accuracy of ∼90% across a large number of validated datasets. In comprehensive benchmarking it outperformed the compared tools, including some with much more complex deep-learning models. RBPSpot can identify RBP binding sites in the human system and can also be used to develop new models, making it a valuable resource for studies of regulatory systems.
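The seed-and-context idea in the summary can be illustrated with a minimal sketch. It is not the RBPSpot implementation: the function names, the Hamming-distance mismatch rule, the example k-mer, and the flank width are all assumptions made for illustration, and the statistical-significance filtering and the neural-network scoring steps are omitted.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_seeds(rna: str, kmer: str, max_mismatch: int = 1) -> List[int]:
    """Start positions where `kmer` occurs in `rna` with at most
    `max_mismatch` substitutions (a simple inexact k-mer search;
    hypothetical helper, not the tool's actual search routine)."""
    k = len(kmer)
    return [
        i
        for i in range(len(rna) - k + 1)
        if hamming(rna[i:i + k], kmer) <= max_mismatch
    ]


def flanking_window(rna: str, start: int, k: int, flank: int) -> str:
    """Seed plus up to `flank` bases on each side, clipped at the
    sequence ends. This window is what a downstream model would score."""
    lo = max(0, start - flank)
    hi = min(len(rna), start + k + flank)
    return rna[lo:hi]


# Illustrative sequence and seed k-mer (both made up for this sketch).
rna = "GGAUGCAUGUAACCUGCAUGG"
seeds = find_seeds(rna, "UGCAUG", max_mismatch=1)
windows = [flanking_window(rna, s, 6, flank=4) for s in seeds]
```

In a full pipeline, each window would be encoded into features and passed to a classifier; here the windows are simply returned as strings so the anchoring step can be inspected on its own.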