BMC Bioinformatics (Jun 2017)

Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme

  • Jian Zhang,
  • Haiting Chai,
  • Guifu Yang,
  • Zhiqiang Ma

DOI
https://doi.org/10.1186/s12859-017-1709-6
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Background Bioluminescent proteins (BLPs) widely exist in many living organisms. As BLPs are featured by the capability of emitting lights, they can be served as biomarkers and easily detected in biomedical research, such as gene expression analysis and signal transduction pathways. Therefore, accurate identification of BLPs is important for disease diagnosis and biomedical engineering. In this paper, we propose a novel accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs. Results We collect a series of sequence-derived features, which have been proved to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further prove that the combination of four types of features outperforms any other combinations or individual features. To remove potential irrelevant or redundant features, we also introduce Fisher Markov Selector together with Sequential Backward Selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which is proved to be more effective than traditional universal approaches. Conclusion Experiment on benchmark datasets proves the robustness of PredBLP. We demonstrate that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP based on independent testing datasets as well as newly deposited BLPs in UniProt. PredBLP is proved to be able to exceed many state-of-art methods. A web server named PredBLP, which implements the proposed method, is free available for academic use.

Keywords