Frontiers in Genetics (Mar 2020)

Predicting ATP-Binding Cassette Transporters Using the Random Forest Method

  • Ruiyan Hou,
  • Ruiyan Hou,
  • Lida Wang,
  • Yi-Jun Wu

DOI
https://doi.org/10.3389/fgene.2020.00156
Journal volume & issue
Vol. 11

Abstract

Read online

ATP-binding cassette (ABC) proteins play important roles in a wide variety of species. These proteins are involved in absorbing nutrients, exporting toxic substances, and regulating potassium channels, and they contribute to drug resistance in cancer cells. Therefore, the identification of ABC transporters is an urgent task. The present study used 188D as the feature extraction method, which is based on sequence information and physicochemical properties. We also visualized the feature extracted by t-Distributed Stochastic Neighbor Embedding (t-SNE). The sample based on the features extracted by 188D may be separated. Further, random forest (RF) is an efficient classifier to identify proteins. Under the 10-fold cross-validation of the model proposed here for a training set, the average accuracy rate of 10 training sets was 89.54%. We obtained values of 0.87 for specificity, 0.92 for sensitivity, and 0.79 for MCC. In the testing set, the accuracy achieved was 89%. These results suggest that the model combining 188D with RF is an optimal tool to identify ABC transporters.

Keywords