IEEE Access (Jan 2020)
Discrimination of Golgi Proteins Through Efficient Exploitation of Hybrid Feature Spaces Coupled With SMOTE and Ensemble of Support Vector Machine
Abstract
Many organelles inside and outside a living cell depend on the proper behavior of the Golgi apparatus to function smoothly and normally. Its malfunction may lead to many inheritable diseases such as diabetes and cancer. It is therefore crucial to detect any abnormal behavior of the Golgi apparatus in advance. Accurate discrimination of cis-Golgi from trans-Golgi proteins can help researchers identify the role of Golgi proteins in various diseases and assist pharmacists in drug development. In this work, various hybrid models of Bi-Profile Bayes, Bigram PSSM, Di-Peptide Composition, and Split Amphiphilic Pseudo Amino Acid Composition, combined with the SMOTE oversampling technique, are employed to discriminate Golgi protein types. Multiple linear Support Vector Machines are used to exploit the discrimination power of these models. The proposed prediction system, Golgi-predictor, achieved promising results compared to existing state-of-the-art techniques. Under 10-fold cross-validation, the proposed system achieved an accuracy of 97.6%, sensitivity of 98.8%, specificity of 96.5%, G-mean of 97.6%, MCC of 0.95, and F-score of 0.97. Similarly, under jackknife cross-validation, the accuracy, sensitivity, specificity, G-mean, MCC, and F-score were 96.5%, 97.8%, 95.2%, 96.4%, 0.93, and 0.96, respectively. Moreover, in independent dataset testing, Golgi-predictor demonstrated a significant improvement in performance over other techniques. The proposed methodology aims to support drug designers in the pharmaceutical industry and to assist researchers in bioinformatics and computational biology in predicting the behavior of Golgi proteins.
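The six performance measures reported in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper, and the paper's own implementation may differ in detail. The sketch assumes both classes are non-empty and that at least one positive prediction is made.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn: positive samples predicted correctly/incorrectly.
    tn/fp: negative samples predicted correctly/incorrectly.
    Assumes tp + fn > 0, tn + fp > 0, and tp + fp > 0.
    """
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
        "f_score": f_score,
    }

# Hypothetical counts for illustration only (not the paper's data)
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# G-mean implied by the 10-fold sensitivity and specificity quoted in the abstract
g_mean_10fold = math.sqrt(0.988 * 0.965)
print(f"G-mean from reported 10-fold values: {g_mean_10fold:.3f}")
```

As a consistency check, the geometric mean of the reported 10-fold sensitivity (98.8%) and specificity (96.5%) is about 97.6%, which agrees with the G-mean stated in the abstract.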
Keywords