Wellcome Open Research (Dec 2024)

Exome sequencing of UK birth cohorts [version 2; peer review: 2 approved, 1 approved with reservations]

  • Petr Danecek,
  • Amy Hough,
  • John Wright,
  • Karen Ho,
  • Nicholas J. Timpson,
  • Sarah J. Lindsay,
  • Davide Bonfanti,
  • Daniel S. Malawsky,
  • Rafaq Azad,
  • Deborah Plowman,
  • Sara Widaa,
  • Gemma Shireby,
  • Emla Fitzsimons,
  • David Bann,
  • Matthew E. Hurles,
  • Hilary C. Martin,
  • Susan M. Ring,
  • Dan Mason,
  • Michael A. Quail,
  • Wei Huang,
  • Vivek Iyer,
  • Mahmoud Koko,
  • Iaroslav Popov,
  • Laurie Fabian,
  • Gennadii Zakharov,
  • Ruth Y. Eberhardt,
  • Emma E. Wade,
  • Qin Qin Huang

Journal volume & issue
Vol. 9

Abstract


Birth cohort studies involve repeated surveys of large numbers of individuals from birth and throughout their lives. They collect information relevant to a wide range of life-course research domains, as well as biological samples from which data can be derived using a growing range of omic technologies. When combined with genomic data, this rich source of longitudinal data offers the scientific community valuable insights in areas ranging from population genetics to the social sciences. Here we present quality-controlled whole exome sequencing data from three UK birth cohorts: the Avon Longitudinal Study of Parents and Children (8,436 children and 3,215 parents), the Millennium Cohort Study (7,667 children and 6,925 parents) and Born in Bradford (8,784 children and 2,875 parents). The overall objective of this coordinated effort is to make the resulting high-quality data widely accessible to the global research community in a timely manner. We describe how the datasets were generated and how quality control was applied at the sample, variant and genotype level. We then present preliminary analyses that illustrate the quality of the datasets and probe potential sources of bias. We add measures of ultra-rare variant burden to the variables available to researchers working on these cohorts, and show that the exome-wide burden of deleterious protein-truncating variants (S_het burden) is associated with educational attainment and cognitive test scores. The whole exome sequence data from these birth cohorts (CRAM and VCF files) are available through the European Genome-Phenome Archive, and here we provide guidance on their use.

Keywords