PLoS ONE (Jan 2011)
Optimal codon identities in bacteria: implications from the conflicting results of two different methods.
Abstract
A correlation method was recently used to identify selection-favored 'optimal' codons in 675 bacterial genomes. Surprisingly, the identities of these optimal codons were found to track bacterial GC content, leading to the conclusion that selection generally shapes codon usage in the same direction as the overall mutational bias. Raising several concerns about that conclusion, we report here a thorough comparative study of 203 well-selected bacterial species, which strongly suggests that the previous conclusion is likely an illusion. First, the previous study did not exclude species that experience weak or no selection pressure on their codon usage. For these species, as shown in this study, the inferred optimal codon identities are prone to be incorrect and to follow GC content. Second, the previous study relied on the correlation method alone, without using a second method to test the reliability of the inferred optimal codons. By definition, optimal codons can also be identified by simply comparing codon usage between high- and low-expression genes. When we used both methods to identify optimal codons for the selected species, we obtained highly conflicting results, suggesting that at least one method is misleading. Furthermore, we found a critical problem in the correlation method at the step of calculating the gene bias level. Because the background mutation is not defined accurately, this step yields wrong optimal codon identities. In other words, part of the mutational effect on codon choice is mistakenly regarded as selective influence, leading to incorrect and biased optimal codon identities. Finally, considering translational dynamics, the optimal codons identified by the comparison method are well explained by tRNA composition, whereas those identified by the correlation method are not.
For all of the above reasons, we conclude that real optimal codons do not track genomic GC content, and that the correlation method is misleading for identifying optimal codons and is better avoided.
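
The comparison method mentioned in the abstract (contrasting codon usage between high- and low-expression genes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two-amino-acid codon table, the toy sequences, and the function names (`codon_counts`, `relative_usage`, `candidate_optimal_codons`) are all assumptions made for illustration, and the sketch omits the statistical significance test that a real analysis would presumably require.

```python
from collections import Counter

# Toy fragment of the genetic code (assumption: two amino acids only, for brevity).
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}

def codon_counts(genes):
    """Count in-frame codons over a list of coding sequences."""
    counts = Counter()
    for seq in genes:
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts

def relative_usage(counts):
    """Frequency of each codon within its synonymous family."""
    usage = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            usage[c] = counts[c] / total if total else 0.0
    return usage

def candidate_optimal_codons(high_genes, low_genes):
    """Per amino acid, the codon most enriched in high-expression genes.

    Hypothetical helper: a plain frequency difference, with no
    significance test applied.
    """
    high = relative_usage(codon_counts(high_genes))
    low = relative_usage(codon_counts(low_genes))
    result = {}
    for aa, codons in SYNONYMS.items():
        best = max(codons, key=lambda c: high[c] - low[c])
        if high[best] - low[best] > 0:
            result[aa] = best
    return result

# Made-up sequences standing in for high- and low-expression gene sets.
high_genes = ["AAGAAGAAGTTCTTCTTT"]
low_genes = ["AAAAAGAAATTTTTCTTT"]
print(candidate_optimal_codons(high_genes, low_genes))
```

In this toy input, AAG and TTC are used more often within their synonymous families in the high-expression set than in the low-expression set, so they are reported as candidates. Because the contrast is drawn between two gene sets from the same genome, the genome-wide mutational background affects both sets alike, which is the property the abstract relies on when preferring this method.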