Global sequence properties for superfamily prediction: a machine learning approach

Dobson Richard Jb.; Munroe Patricia B; Caulfield Mark J; Saqi Mansoor

doi:10.2390/biecoll-jib-2009-109

Journal of Integrative Bioinformatics (Mar 2009)

Affiliations

Dobson Richard Jb.: The William Harvey Research Institute, Bart’s and the London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, United Kingdom of Great Britain and Northern Ireland
Munroe Patricia B: The William Harvey Research Institute, Bart’s and the London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, United Kingdom of Great Britain and Northern Ireland
Caulfield Mark J: The William Harvey Research Institute, Bart’s and the London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, United Kingdom of Great Britain and Northern Ireland
Saqi Mansoor: Institute of Cell and Molecular Science, Bart’s and the London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, United Kingdom of Great Britain and Northern Ireland

Journal volume & issue: Vol. 6, no. 1
pp. 25 – 49

Abstract

Functional annotation of a protein sequence is difficult in the absence of experimental data or clear similarity to a sequence of known function. In this study, a simple set of sequence attributes based on physicochemical and predicted structural characteristics was used as input to machine learning methods. To improve performance by increasing the data available for training, a technique of sequence enrichment was explored. These methods were used to predict membership of 24 and 49 large and diverse protein superfamilies from the SCOP database.
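
The abstract does not list the exact attributes used, so the following is only a minimal sketch, assuming a hypothetical feature set of sequence length, amino acid composition, mean Kyte-Doolittle hydropathy and net charge per residue. It illustrates how a protein sequence can be turned into a fixed-length vector of global properties that a machine learning method can consume; the function name `global_features` and the chosen features are illustrative assumptions, not the authors' method, and predicted structural attributes are omitted.

```python
# Hypothetical global-property featurizer (illustrative only; not the paper's feature set).

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def global_features(seq: str) -> dict:
    """Return a dict of global (whole-sequence) properties for one protein."""
    # Keep only the 20 standard residues; letters such as X or B are skipped.
    residues = [aa for aa in seq.upper() if aa in KD]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")

    feats = {"length": float(n)}
    # Composition: fraction of each standard amino acid (sums to 1).
    for aa in AMINO_ACIDS:
        feats[f"frac_{aa}"] = residues.count(aa) / n
    # Average hydropathy over the whole sequence.
    feats["mean_hydropathy"] = sum(KD[aa] for aa in residues) / n
    # Crude net charge: (K + R) minus (D + E), normalised by length.
    positive = sum(residues.count(aa) for aa in "KR")
    negative = sum(residues.count(aa) for aa in "DE")
    feats["net_charge_per_residue"] = (positive - negative) / n
    return feats


if __name__ == "__main__":
    features = global_features("MKTAYIAKQR")
    print(features["length"], features["mean_hydropathy"])
```

Because every sequence maps to the same set of keys regardless of its length, the resulting vectors can be stacked into a training matrix for a standard classifier, with one label per SCOP superfamily.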

Published in Journal of Integrative Bioinformatics

ISSN: 1613-4516 (Online)
Publisher: De Gruyter
Country of publisher: Germany
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.degruyter.com/view/j/jib