IEEE Access (Jan 2019)

An Improved Process for Generating Uniform PSSMs and Its Application in Protein Subcellular Localization via Various Global Dimension Reduction Techniques

  • Shunfang Wang,
  • Wenjia Li,
  • Yu Fei,
  • Zicheng Cao,
  • Dongshu Xu,
  • Huanyu Guo

DOI
https://doi.org/10.1109/ACCESS.2019.2907642
Journal volume & issue
Vol. 7
pp. 42384 – 42395

Abstract

This paper proposes an improved protein feature expression, called segmented amino acid composition in the position-specific scoring matrix (PSSM-SAA), for subcellular localization prediction. Because the high-dimensional PSSM-SAA vector already contains sufficient local information, four global dimension reduction algorithms are applied to it: linear discriminant analysis (LDA), median LDA (MDA), generalized Fisher discriminant analysis (GDA), and median-mean line-based discriminant analysis (MMLDA). PSSM-SAA is also compared with three important expressions: PSSM-S, DipCPSSM, and PsePSSM. Numerical experiments measured by the overall success rate (OSR) show that PSSM-SAA performs much better than PSSM-S and DipCPSSM and slightly better than or equal to PsePSSM, regardless of which dimension reduction algorithm is used. A comparison of the four dimension reduction techniques leads to the recommendation of LDA for PSSM-SAA. Other popular evaluation indexes also confirm the effectiveness of PSSM-SAA with LDA. The suggested model is then compared with state-of-the-art predictors to further evaluate its validity. Finally, a new user-friendly local software tool for implementing PSSM-SAA is provided at https://www.github.com/caozicheng/PSSMSAA-Builder.

Keywords