Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes

Maggi Giorgio P; Donvito Giacinto; Anselmo Anna; Mignone Flavio; Grillo Giorgio; Pesole Graziano

doi:10.1186/1471-2164-9-277

BMC Genomics (Jun 2008)

Journal volume & issue: Vol. 9, no. 1, p. 277

Abstract

Background: The accurate detection of genes and the identification of functional regions remain open problems in the annotation of genomic sequences. These problems affect not only new genomes but also those of very well-studied organisms such as human and mouse, where, despite great effort, the inventory of genes and regulatory regions is far from complete. Comparative genomics is an effective approach to address them. Unfortunately, it is limited by the computational requirements of genome-wide comparisons and by the difficulty of discriminating between conserved coding and non-coding sequences. This discrimination is often based on, and therefore dependent on, the availability of annotated proteins.

Results: In this paper we present the results of a comprehensive comparison of the human and mouse genomes, performed with a new high-throughput grid-based system that allows the rapid detection of conserved sequences and an accurate assessment of their coding potential. By detecting clusters of coding conserved sequences, the system can also accurately identify potential gene loci. Following this analysis, we created a collection of human-mouse conserved sequence tags and carefully compared our results with reliable annotations in order to benchmark the reliability of our classifications. Strikingly, we were able to detect several potential gene loci that are supported by EST sequences but do not correspond to any gene annotated so far.

Conclusion: Here we present a new system that allows the comprehensive comparison of genomes to detect conserved coding and non-coding sequences and to identify potential gene loci. Our system does not require the availability of any annotated sequence and is therefore suitable for the analysis of new or poorly annotated genomes.
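The abstract does not say how clusters of coding conserved sequence tags (CSTs) are turned into candidate gene loci. As a rough illustration of the general idea only, the sketch below groups coding tags that lie close together on the same chromosome. The `CST` record, the `cluster_coding_csts` function, and the `max_gap` and `min_tags` thresholds are all hypothetical names and values chosen for this example; they are not taken from the paper and may differ substantially from the authors' system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CST:
    """Hypothetical record for one conserved sequence tag (not the paper's format)."""
    chrom: str
    start: int
    end: int
    coding: bool  # assumed to come from an upstream coding-potential assessment


def cluster_coding_csts(csts: List[CST], max_gap: int = 10_000,
                        min_tags: int = 2) -> List[List[CST]]:
    """Group coding CSTs on the same chromosome separated by at most max_gap bases.

    max_gap and min_tags are illustrative assumptions, not values from the paper.
    """
    coding = sorted((c for c in csts if c.coding),
                    key=lambda c: (c.chrom, c.start))
    clusters: List[List[CST]] = []
    current: List[CST] = []
    current_end = 0  # rightmost coordinate covered by the current cluster
    for tag in coding:
        same_chrom = bool(current) and tag.chrom == current[-1].chrom
        if same_chrom and tag.start - current_end <= max_gap:
            current.append(tag)
            current_end = max(current_end, tag.end)
        else:
            # Close the previous cluster, keeping it only if it has enough tags.
            if len(current) >= min_tags:
                clusters.append(current)
            current = [tag]
            current_end = tag.end
    if len(current) >= min_tags:
        clusters.append(current)
    return clusters


# Toy coordinates, invented for illustration only.
tags = [
    CST("chr1", 100, 200, True),
    CST("chr1", 500, 650, True),
    CST("chr1", 50_000, 50_100, True),
    CST("chr1", 300, 400, False),
    CST("chr2", 120, 220, True),
]
candidate_loci = cluster_coding_csts(tags, max_gap=1_000)
```

With these toy inputs, only the two nearby coding tags on chr1 form a candidate locus; the isolated coding tags and the non-coding tag are left out. A gap-based grouping like this needs no existing gene annotation, which is in the spirit of the annotation-independent approach the abstract describes.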

Published in BMC Genomics

ISSN: 1471-2164 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Chemical technology: Biotechnology; Science: Biology (General): Genetics
Website: http://bmcgenomics.biomedcentral.com
