IEEE Access (Jan 2019)
An Improved Process for Generating Uniform PSSMs and Its Application in Protein Subcellular Localization via Various Global Dimension Reduction Techniques
Abstract
This paper proposes an improved protein feature expression for subcellular localization prediction, called segmented amino acid composition in position-specific scoring matrix (PSSM-SAA). Because the high-dimensional PSSM-SAA vector already carries sufficient local information, four global dimension reduction algorithms are considered: linear discriminant analysis (LDA), median LDA (MDA), generalized Fisher discriminant analysis (GDA), and median-mean line-based discriminant analysis (MMLDA). PSSM-SAA is also compared with three important feature expressions: PSSM-S, DipCPSSM, and PsePSSM. Numerical experiments measured by the overall success rate (OSR) show that PSSM-SAA performs much better than PSSM-S and DipCPSSM, and slightly better than or on par with PsePSSM, regardless of which dimension reduction algorithm is used. A comparison of the four dimension reduction techniques leads to LDA being recommended for PSSM-SAA. Other popular evaluation indexes also confirm the effectiveness of PSSM-SAA with LDA. The suggested model is then compared with state-of-the-art predictors to further evaluate its validity. Finally, a new user-friendly local software tool for implementing PSSM-SAA is provided at https://www.github.com/caozicheng/PSSMSAA-Builder.
Keywords