International Journal of Information Management Data Insights (Nov 2021)

A multi class random forest (MCRF) model for classification of small plant peptides

  • Ankita Tripathi,
  • Tapas Goswami,
  • Shrawan Kumar Trivedi,
  • Ravi Datta Sharma

Journal volume & issue
Vol. 1, no. 2
p. 100029

Abstract

Classifying the different categories of small peptides is becoming a challenge for the bioinformatics domain, and machine learning models have shown potential for such applications. We propose a multi-class random forest (MCRF) classifier for small peptides and compare it with state-of-the-art classifiers, including a support vector machine with RBF kernel (SVM+RBF), naïve Bayes (NB), Decision Tree (C5.0), and Random Forest (RF). Small peptide sequences are selected from the ARA-PEPs repository (Hazarika, et al., 2017), where 13,748 small peptides are listed in six categories (i.e., secreted, sORF, stress-induced peptides (SIP), secreted-sORF, sORF-SIP, SIP-secreted). A total of 27 features are extracted for each small peptide sequence to prepare the data. The classifiers are compared using F-Value, Sensitivity, Specificity, ROC, and FP rate, with statistical validation by the Kappa statistic and the Wilcoxon signed-rank test. Results of this study show that the proposed classifier has the potential to accurately classify multi-level imbalanced data.
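
The per-class metrics named in the abstract (Sensitivity, Specificity, FP rate, F-Value) and the Kappa statistic can all be derived from a multi-class confusion matrix using a one-vs-rest view of each category. The following is a minimal sketch of those standard definitions, not the authors' implementation. The function names and the ten toy labels are hypothetical and chosen only for illustration; they are not data or results from the study.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix: List[List[int]],
                      labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest sensitivity, specificity, FP rate and F-value per class."""
    total = sum(sum(row) for row in matrix)
    result = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                    # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp     # predicted k, true class differs
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f_value = 2 * precision * sensitivity / denom if denom else 0.0
        result[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "fp_rate": fp_rate,
            "f_value": f_value,
        }
    return result


def cohen_kappa(matrix: List[List[int]]) -> float:
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / (total * total)
    if expected == 1.0:
        return 0.0  # degenerate case: all samples in a single class
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels using three of the six category names.
labels = ["secreted", "sORF", "SIP"]
y_true = ["secreted"] * 4 + ["sORF"] * 3 + ["SIP"] * 3
y_pred = ["secreted", "secreted", "secreted", "sORF",
          "sORF", "sORF", "SIP",
          "SIP", "SIP", "secreted"]

matrix = confusion_matrix(y_true, y_pred, labels)
metrics = per_class_metrics(matrix, labels)
kappa = cohen_kappa(matrix)

for label in labels:
    print(label, metrics[label])
print("kappa", kappa)
```

Reporting these values per class, rather than a single overall accuracy, matters for imbalanced data because a classifier can score well overall while missing most samples of a rare category.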

Keywords