BMC Bioinformatics (Jun 2007)

Mining prokaryotic genomes for unknown amino acids: a stop-codon-based approach

  • Esaki Nobuyoshi,
  • Goto Susumu,
  • Mihara Hisaaki,
  • Fujita Masashi,
  • Kanehisa Minoru

DOI
https://doi.org/10.1186/1471-2105-8-225
Journal volume & issue
Vol. 8, no. 1
p. 225

Abstract

Background
Selenocysteine and pyrrolysine, the 21st and 22nd amino acids, are genetically encoded by stop codons. Since a number of microbial genomes have now been completely sequenced, it is tempting to ask whether a 23rd amino acid remains undiscovered in these genomes. Recently, a computational study addressed this question and reported that no tRNA gene for an unknown amino acid was found in the available genome sequences. However, the performance of the tRNA prediction program on an unknown tRNA family, which may have atypical sequence and structure, is unclear, which renders that result inconclusive. A protein-level study would provide independent insight into the novel amino acid.

Results
Assuming that the 23rd amino acid is also encoded by a stop codon, we systematically predicted proteins that contain stop-codon-encoded amino acids from 191 prokaryotic genomes. Since our prediction method relies only on the conservation patterns of primary sequences, it also provides an opportunity to search for novel selenoproteins and other readthrough proteins. It successfully recovered many of the currently known selenoproteins and pyrrolysine proteins. However, no promising candidate for the 23rd amino acid was detected, and only one novel selenoprotein was predicted.

Conclusion
Our result suggests either that an unknown amino acid encoded by stop codons does not exist or that its phylogenetic distribution is rather limited, which is in agreement with the previous tRNA study. The method described here can be used in future studies to explore novel readthrough events in complete genomes, whose number is growing rapidly.
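
The abstract does not spell out the prediction procedure, so the following is only a minimal illustrative sketch of the first step such a search might take, not the authors' pipeline. It flags in-frame stop codons that have open reading frame on both sides, which is the kind of locus where a stop-codon-encoded amino acid could occur. The function names, the `min_flank` threshold, and the toy sequence are all hypothetical. The conservation analysis that the method actually relies on (comparing the flanking regions against homologous sequences) is not shown.

```python
# Hypothetical sketch: locate in-frame stop codons flanked by open frame.
# This is NOT the published method; conservation scoring is omitted.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(dna, frame=0):
    """Split a DNA string into complete codons, starting at offset `frame`."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]


def readthrough_candidates(dna, frame=0, min_flank=2):
    """Return (upstream_start, stop_index, downstream_end, stop_codon) tuples.

    Indices are codon positions within the chosen frame. A stop codon is
    reported only if at least `min_flank` stop-free codons lie on each side
    of it (an assumed, arbitrary threshold).
    """
    codons = split_codons(dna, frame)
    stops = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    results = []
    for k, s in enumerate(stops):
        # Upstream segment begins just after the previous stop (or at 0).
        up_start = stops[k - 1] + 1 if k > 0 else 0
        # Downstream segment ends at the next stop (or at the sequence end).
        down_end = stops[k + 1] if k + 1 < len(stops) else len(codons)
        upstream_len = s - up_start
        downstream_len = down_end - s - 1
        if upstream_len >= min_flank and downstream_len >= min_flank:
            results.append((up_start, s, down_end, codons[s]))
    return results


if __name__ == "__main__":
    # Toy sequence: ATG GCT GCT TGA GCT GCT GCT TAA
    toy = "ATGGCTGCTTGAGCTGCTGCTTAA"
    for hit in readthrough_candidates(toy):
        print(hit)
```

In a real analysis, each reported locus would still need to be checked for sequence conservation on both sides of the stop codon across related genomes. An open frame alone is weak evidence, because it also arises from pseudogenes, sequencing errors, and adjacent genes.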