Genome Medicine (Feb 2023)

Refining epigenetic prediction of chronological and biological age

  • Elena Bernabeu,
  • Daniel L. McCartney,
  • Danni A. Gadd,
  • Robert F. Hillary,
  • Ake T. Lu,
  • Lee Murphy,
  • Nicola Wrobel,
  • Archie Campbell,
  • Sarah E. Harris,
  • David Liewald,
  • Caroline Hayward,
  • Cathie Sudlow,
  • Simon R. Cox,
  • Kathryn L. Evans,
  • Steve Horvath,
  • Andrew M. McIntosh,
  • Matthew R. Robinson,
  • Catalina A. Vallejos,
  • Riccardo E. Marioni

DOI
https://doi.org/10.1186/s13073-023-01161-y
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 15

Abstract

Read online

Abstract Background Epigenetic clocks can track both chronological age (cAge) and biological age (bAge). The latter is typically defined by physiological biomarkers and risk of adverse health outcomes, including all-cause mortality. As cohort sample sizes increase, estimates of cAge and bAge become more precise. Here, we aim to develop accurate epigenetic predictors of cAge and bAge, whilst improving our understanding of their epigenomic architecture. Methods First, we perform large-scale (N = 18,413) epigenome-wide association studies (EWAS) of chronological age and all-cause mortality. Next, to create a cAge predictor, we use methylation data from 24,674 participants from the Generation Scotland study, the Lothian Birth Cohorts (LBC) of 1921 and 1936, and 8 other cohorts with publicly available data. In addition, we train a predictor of time to all-cause mortality as a proxy for bAge using the Generation Scotland cohort (1214 observed deaths). For this purpose, we use epigenetic surrogates (EpiScores) for 109 plasma proteins and the 8 component parts of GrimAge, one of the current best epigenetic predictors of survival. We test this bAge predictor in four external cohorts (LBC1921, LBC1936, the Framingham Heart Study and the Women’s Health Initiative study). Results Through the inclusion of linear and non-linear age-CpG associations from the EWAS, feature pre-selection in advance of elastic net regression, and a leave-one-cohort-out (LOCO) cross-validation framework, we obtain cAge prediction with a median absolute error equal to 2.3 years. Our bAge predictor was found to slightly outperform GrimAge in terms of the strength of its association to survival (HRGrimAge = 1.47 [1.40, 1.54] with p = 1.08 × 10−52, and HRbAge = 1.52 [1.44, 1.59] with p = 2.20 × 10−60). Finally, we introduce MethylBrowsR, an online tool to visualise epigenome-wide CpG-age associations. Conclusions The integration of multiple large datasets, EpiScores, non-linear DNAm effects, and new approaches to feature selection has facilitated improvements to the blood-based epigenetic prediction of biological and chronological age.