Royal Society Open Science (Mar 2019)

How real are observed trends in small correlated datasets?

  • S. J. Salamon,
  • H. J. Hansen,
  • D. Abbott

DOI
https://doi.org/10.1098/rsos.181089
Journal volume & issue
Vol. 6, no. 3

Abstract

The eye may perceive a significant trend in plotted time-series data, but if the model errors of nearby data points are correlated, the trend may be an illusion. We examine generalized least-squares (GLS) estimation, finding that error correlation may be underestimated in highly correlated small datasets by conventional techniques. This risks indicating a significant trend when there is none. A new correlation estimate based on the Durbin–Watson statistic is developed, leading to an improved estimate of autoregression with highly correlated data, thus reducing this risk. These techniques are generalized to randomly located data points in space, through the new concept of the nearest new neighbour path. We describe tests on the validity of the GLS schemes, allowing verification of the models employed. Examples illustrating our method include a 40-year record of atmospheric carbon dioxide, and Antarctic ice core data. While more conservative than existing techniques, our new GLS estimate finds a statistically significant increase in background carbon dioxide concentration, with an accelerating trend. We conclude with an example of a worldwide empirical climate model for radio propagation studies, to illustrate dealing with spatial correlation in unevenly distributed data points over the surface of the Earth. The method is generally applicable, not only to climate-related data, but to many other kinds of problems (e.g. biological, medical and geological data), where there are unequally (or randomly) spaced observations in temporally or spatially distributed datasets.
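
As a rough illustration of the quantities the abstract refers to, the sketch below computes the Durbin–Watson statistic from the residuals of an ordinary least-squares trend fit and converts it to a lag-1 autocorrelation using the textbook approximation rho ≈ 1 - d/2. This is not the new estimate developed in the paper, whose form is not given in the abstract. The function names, the simulated AR(1) series, and the values n = 40 and rho_true = 0.8 are all assumptions chosen for illustration.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


def ar1_from_dw(d):
    """Textbook approximation of the lag-1 autocorrelation: rho ~ 1 - d/2.

    This is the conventional relation, not the paper's new estimator.
    """
    return 1.0 - d / 2.0


def ols_trend_residuals(t, y):
    """Fit y = slope * t + intercept by OLS; return (residuals, slope)."""
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept), slope


# Hypothetical small dataset: AR(1) noise with NO underlying trend.
rng = np.random.default_rng(0)
n, rho_true = 40, 0.8
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = rho_true * noise[i - 1] + rng.normal()

t = np.arange(n, dtype=float)
resid, slope = ols_trend_residuals(t, noise)

d = durbin_watson(resid)
rho_hat = ar1_from_dw(d)
print(f"fitted slope = {slope:.4f}, d = {d:.3f}, rho_hat = {rho_hat:.3f}")
```

A value of d near 2 suggests uncorrelated residuals, while values near 0 suggest strong positive correlation. In the second case an apparent slope in a short series deserves suspicion, which is the situation the abstract describes.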

Keywords