PLoS ONE (Mar 2010)

Variation in DNA substitution rates among lineages erroneously inferred from simulated clock-like data.

  • Rachel S Schwartz,
  • Rachel Lockridge Mueller

DOI
https://doi.org/10.1371/journal.pone.0009649
Journal volume & issue
Vol. 5, no. 3
p. e9649

Abstract

Background

The observation of variation in substitution rates among lineages has led to (1) a general rejection of the molecular clock model, and (2) the suggestion that a number of biological characteristics of organisms can cause rate variation. Accurate estimates of rate variation, and thus accurate inferences regarding the causes of rate variation, depend on accurate estimates of substitution rates. However, theory suggests that even when the substitution process is clock-like, variable numbers of substitutions can occur among lineages because the substitution process is stochastic. Furthermore, substitution rates along lineages can be misestimated, particularly when multiple substitutions occur at some sites. Although these potential causes of error in rate estimation are well understood in theory, such error has not been examined in detail; consequently, empirical studies that estimate rate variation among lineages have been unable to determine whether their results could be impacted by estimation error.

Methodology/principal findings

To evaluate the extent to which error in rate estimation could erroneously suggest rate variation among lineages, we examined rate variation estimated for datasets simulated under a molecular clock on trees with equal and variable branch lengths. Thus, any apparent rate variation in these datasets reflects error in rate estimation rather than true differences in the underlying substitution process. We observed substantial rate variation among lineages in our simulations; however, we did not observe rate variation when average substitution rates were compared between different clades.

Conclusions/significance

Our results confirm previous theoretical work suggesting that observations of among lineage rate variation in empirical data may be due to the stochastic substitution process and error in the estimation of substitution rates, rather than true differences in the underlying substitution process among lineages. However, conclusions regarding rate variation drawn from rates averaged across multiple branches are likely due to real, systematic variation in rates between groups.
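
Illustrative sketch

The two sources of error named in the abstract (a stochastic substitution process and multiple substitutions at the same site) can be illustrated with a small simulation. The sketch below is not the authors' simulation design. It assumes a Jukes-Cantor (JC69) model, and the number of lineages, number of sites, branch length, and random seed are all hypothetical values chosen for illustration. Every lineage evolves at exactly the same rate, yet the per-lineage distance estimates differ, and the uncorrected proportion of differing sites falls below the true number of substitutions per site.

```python
import math
import random


def jc69_distance(p):
    """JC69-corrected distance from the observed proportion of differing sites."""
    if p >= 0.75:
        return math.inf  # correction is undefined at or beyond saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def simulate_clock_lineages(n_lineages=20, n_sites=500, d=0.3, seed=1):
    """Simulate lineages that all share one true rate (a strict clock).

    d is the true expected number of substitutions per site on every lineage
    (an assumed value). Returns a list of (p_observed, d_estimated) pairs,
    one per lineage.
    """
    rng = random.Random(seed)
    # Under JC69, the chance that a site differs from its ancestral state
    # after d expected substitutions per site. Multiple hits keep this below d.
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    results = []
    for _ in range(n_lineages):
        n_diff = sum(1 for _ in range(n_sites) if rng.random() < p_diff)
        p_obs = n_diff / n_sites
        results.append((p_obs, jc69_distance(p_obs)))
    return results


if __name__ == "__main__":
    true_d = 0.3
    lineages = simulate_clock_lineages(d=true_d)
    estimates = [d_hat for _, d_hat in lineages]
    mean_p = sum(p for p, _ in lineages) / len(lineages)
    print("true substitutions per site:", true_d)
    print("mean uncorrected p-distance:", round(mean_p, 3))
    print("lowest per-lineage estimate:", round(min(estimates), 3))
    print("highest per-lineage estimate:", round(max(estimates), 3))
    print("mean across lineages:", round(sum(estimates) / len(estimates), 3))
```

The spread between the lowest and highest per-lineage estimates arises purely from sampling, while the mean across lineages sits close to the true value. This mirrors the abstract's contrast between single-lineage comparisons and rates averaged across multiple branches.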