BMC Bioinformatics (Dec 2019)

Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method

  • Yu-hua Yao,
  • Ya-ping Lv,
  • Ling Li,
  • Hui-min Xu,
  • Bin-bin Ji,
  • Jing Chen,
  • Chun Li,
  • Bo Liao,
  • Xu-ying Nan

DOI
https://doi.org/10.1186/s12859-019-3232-4
Journal volume & issue
Vol. 20, no. S22
pp. 1 – 8

Abstract

Background: Predicting the subcellular localization of proteins is an important task in bioinformatics, with applications in drug design and other areas. Many computational tools for predicting protein subcellular localization have been developed in recent decades; however, existing methods differ in the protein sequence representation techniques and classification algorithms they adopt. Results: In this paper, we first introduce two protein sequence encoding schemes: dipeptide information with space and gapped k-mer information. We then introduce a method for calculating gapped k-mers that is based on a quad-tree. Conclusions: The prediction results show that this method not only reduces the feature dimension but also improves the prediction precision of protein subcellular localization.
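As a rough illustration of the first encoding scheme named in the abstract, the sketch below counts pairs of residues separated by a fixed number of positions and normalizes the counts into a 400-dimensional vector. This is one common reading of "dipeptide information with space"; the function name `gapped_dipeptide_features`, the `gap` parameter, and the normalization are assumptions made for illustration, not the authors' exact definition, and the quad-tree based gapped k-mer calculation is not reproduced here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; any other symbol is skipped (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gapped_dipeptide_features(seq, gap):
    """Return normalized counts of residue pairs separated by `gap` positions.

    gap=0 gives ordinary dipeptide composition; gap=1 pairs residue i
    with residue i+2, and so on. The output has 20 * 20 = 400 entries,
    ordered as AA, AC, AD, ..., YY.
    """
    seq = seq.upper()
    pairs = Counter()
    for i in range(len(seq) - gap - 1):
        a, b = seq[i], seq[i + gap + 1]
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[a + b] += 1
    total = sum(pairs.values())
    return [
        pairs[a + b] / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


if __name__ == "__main__":
    features = gapped_dipeptide_features("MKTAYIAKQR", gap=1)
    print(len(features))
```

Concatenating the vectors for several values of `gap` would give a longer feature vector that a standard classifier could consume; how many gaps to use is a design choice not specified in the abstract.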

Keywords