IEEE Access (Jan 2020)

Identification of Type VI Effector Proteins Using a Novel Ensemble Classifier

  • Chunyu Wang,
  • Jialin Li,
  • Ying Zhang,
  • Maozu Guo

DOI
https://doi.org/10.1109/ACCESS.2020.2985111
Journal volume & issue
Vol. 8
pp. 75085 – 75093

Abstract


The type VI secretion system (T6SS) delivers effector proteins (type VI secretion system effectors, termed T6SEs) into neighboring target cells. Many human pathogens express T6SEs, including Vibrio cholerae, Burkholderia spp., and Pseudomonas aeruginosa. T6SEs play vital roles in the competitive survival and pathogenesis of bacterial populations. Several machine-learning methods can already distinguish T6SEs from non-T6SEs, but we believe their predictive performance can be improved further. Herein, we therefore propose a more powerful ensemble predictor for identifying T6SEs. First, we construct a benchmark dataset from existing studies and databases. Then we use k-separated-bigrams-PSSM, a feature encoding derived from the position-specific scoring matrix (PSSM), to convert the protein sequences into numerical vectors. Next, the synthetic minority oversampling technique (SMOTE) is employed to address the class imbalance in the training data. Finally, we employ a soft voting strategy to build an ensemble model that combines six fine-tuned base classifiers. In independent testing, the proposed model achieves an accuracy (ACC) of 99.0%, a Matthews correlation coefficient (MCC) of 97.8%, a sensitivity (SN) of 97.1%, and a specificity (SP) of 100%.
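The k-separated-bigrams-PSSM encoding is commonly defined as follows. For a protein of length L whose PSSM P is an L × 20 matrix (one row per residue, one column per amino acid), each of the 400 features is $T_{m,n}(k) = \sum_{i=1}^{L-k} P_{i,m} P_{i+k,n}$ for $m, n \in \{1, \dots, 20\}$. This captures how often amino-acid profile m at one position co-occurs with profile n exactly k positions downstream. The sketch below implements that formula under stated assumptions: the PSSM is already computed and normalized, and the value of k and the normalization scheme (neither is given in this abstract) are left to the caller. The function name is hypothetical, and indices are 0-based in the code.

```python
from typing import List, Sequence

def k_separated_bigrams_pssm(pssm: Sequence[Sequence[float]], k: int = 1) -> List[float]:
    """Return the 400-dimensional k-separated-bigrams-PSSM feature vector.

    pssm: L x 20 matrix, assumed to be already normalized (hypothetical
    preprocessing; the abstract does not specify it).
    k:    separation between the paired residues, 1 <= k < L.
    Feature index m * 20 + n holds sum_i pssm[i][m] * pssm[i + k][n].
    """
    length = len(pssm)
    if length == 0 or any(len(row) != 20 for row in pssm):
        raise ValueError("PSSM must be a non-empty L x 20 matrix")
    if not 1 <= k < length:
        raise ValueError("k must satisfy 1 <= k < L")

    features = [0.0] * 400
    # Slide over every residue pair (i, i + k) and accumulate the outer product
    # of their PSSM rows into the flattened 20 x 20 feature grid.
    for i in range(length - k):
        row_i, row_ik = pssm[i], pssm[i + k]
        for m in range(20):
            for n in range(20):
                features[m * 20 + n] += row_i[m] * row_ik[n]
    return features

# Usage: a toy 3-residue "PSSM" whose rows are one-hot profiles.
toy = [[1.0 if j == r else 0.0 for j in range(20)] for r in range(3)]
vec = k_separated_bigrams_pssm(toy, k=1)
print(len(vec), vec[1], vec[22])  # residue pairs (0,1) and (1,2) light up
```

Because the output length is fixed at 400 regardless of L, proteins of different lengths map to vectors of the same size, which is what a downstream classifier needs.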
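SMOTE balances a training set by synthesizing new minority-class samples rather than duplicating existing ones. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal standard-library sketch of that core idea. The function name, the default of 5 neighbours, and the fixed random seed are illustrative assumptions. The abstract does not state which SMOTE variant or parameters were used.

```python
import math
import random
from typing import List, Sequence

def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k_neighbors: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic minority-class samples by SMOTE interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k_neighbors, len(minority) - 1)

    # Precompute each sample's k nearest minority neighbours (Euclidean distance).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic

# Usage: oversample a three-point minority class in 2-D.
new_points = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_synthetic=4, k_neighbors=2)
print(new_points)
```

Oversampling should be applied only to the training split, so the independent test set keeps its original class distribution.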
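Soft voting combines base classifiers by averaging their predicted class probabilities, optionally with weights, and then choosing the class with the highest mean probability. Hard voting, by contrast, takes a majority vote over predicted labels. The sketch below assumes each base classifier has already produced per-class probabilities. The abstract does not name the six base classifiers or any voting weights, so the example uses three hypothetical probability outputs and equal weights by default.

```python
from typing import List, Optional, Sequence, Tuple

def soft_vote(probas: Sequence[Sequence[Sequence[float]]],
              weights: Optional[Sequence[float]] = None
              ) -> Tuple[List[int], List[List[float]]]:
    """Combine classifier outputs by (weighted) probability averaging.

    probas[c][s][j] is the probability that classifier c assigns class j
    to sample s. Returns (predicted labels, averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probas)   # equal weights unless specified
    total = sum(weights)
    n_samples, n_classes = len(probas[0]), len(probas[0][0])

    labels, averaged = [], []
    for s in range(n_samples):
        mean = [sum(w * probas[c][s][j] for c, w in enumerate(weights)) / total
                for j in range(n_classes)]
        averaged.append(mean)
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels, averaged

# Usage: one sample, three hypothetical classifiers, classes (non-T6SE, T6SE).
outputs = [[[0.90, 0.10]], [[0.40, 0.60]], [[0.45, 0.55]]]
labels, avg = soft_vote(outputs)
print(labels, avg)
```

In this toy case two of the three classifiers lean toward class 1, so a hard majority vote would pick class 1. Soft voting instead picks class 0, because the first classifier's high confidence dominates the averaged probabilities. This sensitivity to confidence is the usual motivation for soft voting when base classifiers produce well-calibrated probabilities.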

Keywords