Computational and Structural Biotechnology Journal (Jan 2019)

AtbPpred: A Robust Sequence-Based Prediction of Anti-Tubercular Peptides Using Extremely Randomized Trees

  • Balachandran Manavalan,
  • Shaherin Basith,
  • Tae Hwan Shin,
  • Leyi Wei,
  • Gwang Lee

Journal volume & issue
Vol. 17
pp. 972 – 981

Abstract

Read online

Mycobacterium tuberculosis is one of the most dangerous pathogens in humans. It acts as an etiological agent of tuberculosis (TB), infecting almost one-third of the world's population. Owing to the high incidence of multidrug-resistant TB and extensively drug-resistant TB, there is an urgent need for novel and effective alternative therapies. Peptide-based therapy has several advantages, such as diverse mechanisms of action, low immunogenicity, and selective affinity to bacterial cell envelopes. However, the identification of anti-tubercular peptides (AtbPs) via experimentation is laborious and expensive; hence, the development of an efficient computational method is necessary for the prediction of AtbPs prior to both in vitro and in vivo experiments. To this end, we developed a two-layer machine learning (ML)-based predictor called AtbPpred for the identification of AtbPs. In the first layer, we applied a two-step feature selection procedure and identified the optimal feature set individually for nine different feature encodings, whose corresponding models were developed using extremely randomized tree (ERT). In the second-layer, the predicted probability of AtbPs from the above nine models were considered as input features to ERT and developed the final predictor. AtbPpred respectively achieved average accuracies of 88.3% and 87.3% during cross-validation and an independent evaluation, which were ~8.7% and 10.0% higher than the state-of-the-art method. Furthermore, we established a user-friendly webserver which is currently available at http://thegleelab.org/AtbPpred. We anticipate that this predictor could be useful in the high-throughput prediction of AtbPs and also provide mechanistic insights into its functions. Keywords: Extremely randomized tree, Two-step feature selection, Antitubercular peptide, Tuberculosis, Cross-validation, Two-layer framework