BMC Bioinformatics (Dec 2019)
Protein sequence information extraction and subcellular localization prediction with gapped k-Mer method
Abstract
Abstract Background Subcellular localization prediction of protein is an important component of bioinformatics, which has great importance for drug design and other applications. A multitude of computational tools for proteins subcellular location have been developed in the recent decades, however, existing methods differ in the protein sequence representation techniques and classification algorithms adopted. Results In this paper, we firstly introduce two kinds of protein sequences encoding schemes: dipeptide information with space and Gapped k-mer information. Then, the Gapped k-mer calculation method which is based on quad-tree is also introduced. Conclusions >From the prediction results, this method not only reduces the dimension, but also improves the prediction precision of protein subcellular localization.
Keywords