International Journal of Population Data Science (Sep 2024)

Data resource profile: the ORIGINS project databank: a collaborative data resource for investigating the developmental origins of health and disease

  • Belinda Davey,
  • Wesley Billingham,
  • Jacqueline Davis,
  • Lisa Gibson,
  • Nina D'Vaz,
  • Susan Prescott,
  • Desiree Silva,
  • Sarah Whalan

DOI
https://doi.org/10.23889/ijpds.v8i6.2388
Journal volume & issue
Vol. 8, no. 6

Abstract

Introduction
The ORIGINS Project ("ORIGINS") is a longitudinal, population-level birth cohort with data and biosample collections that aim to facilitate research to reduce non-communicable diseases (NCDs) and encourage 'a healthy start to life'. ORIGINS has gathered millions of datapoints and over 400,000 biosamples over 15 timepoints, antenatally through to five years of age, from mothers, non-birthing partners and the child, across four health and wellness domains: 'Growth and development', 'Medical, biological and genetic', 'Biopsychosocial and cognitive', 'Lifestyle, environment and nutrition'.

Methods
Mothers, non-birthing partners and their offspring were recruited antenatally (between 18 and 38 weeks' gestation) from the Joondalup and Wanneroo communities of Perth, Western Australia from 2017 to 2024. Data come from several sources, including routine hospital antenatal and birthing data, ORIGINS clinical appointments, and online self-completed surveys comprising several standardised measures. Data are merged using the Medical Record Number (MRN), the ORIGINS Unique Identifier and the ORIGINS Pregnancy Number, as well as additional demographic data (e.g. name and date of birth) when necessary.

Results
The data are held on an integrated data platform that extracts, links, ingests, integrates and stores ORIGINS' data on an Amazon Web Services (AWS) cloud-based data warehouse. Data are linked, transformed for cleaning and coding, and catalogued, ready to provide to sub-projects (independent researchers that apply to use ORIGINS data) to prepare for their own analyses. ORIGINS maximises data quality by checking and replacing missing and erroneous data across the various data sources.

Conclusion
As a wide array of data across several different domains and timepoints has been collected, the options for future research and utilisation of the data and biosamples are broad. As ORIGINS aims to extend into middle childhood, researchers can examine which antenatal and early childhood factors predict middle childhood outcomes. ORIGINS also aims to link to State and Commonwealth data sets (e.g. Medicare, the National Assessment Program – Literacy and Numeracy, the Pharmaceutical Benefits Scheme) which will cater to a wide array of research questions.
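
The merging step described in the Methods (identifiers first, demographic data only when necessary) could look roughly like the Python sketch below. This is an illustration under stated assumptions, not the ORIGINS implementation: the `Record` fields, the `same_person` and `link` helpers, the matching priority and the sample values are all hypothetical, and the ORIGINS Pregnancy Number is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One row from a single data source (all field names are hypothetical)."""
    source: str
    mrn: Optional[str] = None            # hospital Medical Record Number
    origins_id: Optional[str] = None     # ORIGINS Unique Identifier
    name: Optional[str] = None
    date_of_birth: Optional[str] = None  # ISO date string, e.g. "1990-04-12"


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial spelling variants compare equal."""
    return " ".join(text.lower().split())


def same_person(a: Record, b: Record) -> bool:
    """Deterministic match: the first identifier present on both records decides."""
    if a.mrn and b.mrn:
        return a.mrn == b.mrn
    if a.origins_id and b.origins_id:
        return a.origins_id == b.origins_id
    # Assumed fallback: demographics are compared only when no identifier is shared.
    if a.name and b.name and a.date_of_birth and b.date_of_birth:
        return (normalise(a.name) == normalise(b.name)
                and a.date_of_birth == b.date_of_birth)
    return False


def link(records: List[Record]) -> List[List[Record]]:
    """Greedily group records that match any record already in a group."""
    groups: List[List[Record]] = []
    for rec in records:
        for group in groups:
            if any(same_person(rec, other) for other in group):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


# Made-up example rows from three hypothetical sources.
hospital = Record("hospital", mrn="M001", name="Jane Doe", date_of_birth="1990-04-12")
clinic = Record("clinic", mrn="M001", origins_id="OR-17")
survey = Record("survey", origins_id="OR-17", name="jane  doe", date_of_birth="1990-04-12")

for group in link([hospital, clinic, survey]):
    print([rec.source for rec in group])
```

In this sketch the clinic row joins the hospital row through the shared MRN, and the survey row joins through the name and date-of-birth fallback because it shares no identifier with the hospital row. A real pipeline would also need to handle conflicting identifiers and near-miss demographics, which this greedy grouping does not attempt.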

Keywords