BMC Medical Informatics and Decision Making (Oct 2021)

Assessing the suitability of general practice electronic health records for clinical prediction model development: a data quality assessment

  • Sharmala Thuraisingam,
  • Patty Chondros,
  • Michelle M. Dowsey,
  • Tim Spelman,
  • Stephanie Garies,
  • Peter F. Choong,
  • Jane Gunn,
  • Jo-Anne Manski-Nankervis

Journal volume & issue
Vol. 21, no. 1
pp. 1 – 11


Abstract

Background: The use of general practice electronic health records (EHRs) for research purposes is in its infancy in Australia. Given that these data were collected for clinical purposes, questions remain around data quality and whether these data are suitable for use in prediction model development. In this study we assess the quality of data recorded in 201,462 patient EHRs from 483 Australian general practices to determine their usefulness in the development of a clinical prediction model for total knee replacement (TKR) surgery in patients with osteoarthritis (OA).

Methods: Variables to be used in model development were assessed for completeness and plausibility. Accuracy of the outcome and competing risk was assessed through record-level linkage with two gold standard national registries, the Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR) and the National Death Index (NDI). The validity of the EHR data was tested using participant characteristics from the 2014–15 Australian National Health Survey (NHS).

Results: There were substantial missing data for body mass index and weight gain between early adulthood and middle age. TKR and death were recorded with good accuracy; however, year of TKR, year of death and side of TKR were poorly recorded. Patient characteristics recorded in the EHR were comparable to participant characteristics from the NHS, except for OA medication and metastatic solid tumour.

Conclusions: In this study, data relating to the outcome, competing risk and two predictors were unfit for prediction model development. This study highlights the need for more accurate and complete recording of patient data within EHRs if these data are to be used to develop clinical prediction models. In the meantime, data linkage with other gold standard data sets or registries may help overcome some of the current data quality challenges in general practice EHRs when developing prediction models.
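The completeness and accuracy checks described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say which accuracy measures were reported, so sensitivity, specificity and predictive values against the registry are assumed here. The function names (`completeness`, `accuracy_against_gold`) and the toy records are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Sequence


def completeness(values: Sequence[Optional[float]]) -> float:
    """Proportion of records in which the variable is recorded (not None)."""
    if not values:
        raise ValueError("no records supplied")
    return sum(v is not None for v in values) / len(values)


def accuracy_against_gold(ehr: Sequence[bool], gold: Sequence[bool]) -> Dict[str, float]:
    """Compare an EHR flag with a linked gold standard flag, record by record."""
    if len(ehr) != len(gold):
        raise ValueError("EHR and gold standard must cover the same linked records")
    tp = sum(e and g for e, g in zip(ehr, gold))          # recorded in both
    fp = sum(e and not g for e, g in zip(ehr, gold))      # EHR only
    fn = sum(g and not e for e, g in zip(ehr, gold))      # registry only
    tn = sum(not e and not g for e, g in zip(ehr, gold))  # recorded in neither

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data: five linked patients (not taken from the study).
bmi = [27.1, None, 31.4, None, 24.9]
ehr_tkr = [True, False, True, False, False]        # TKR recorded in the EHR
registry_tkr = [True, True, False, False, False]   # TKR recorded in the registry

bmi_completeness = completeness(bmi)
tkr_accuracy = accuracy_against_gold(ehr_tkr, registry_tkr)
```

On the toy data, three of five body mass index values are present, and the EHR flag agrees with the registry on one of the two registry-confirmed TKRs. Plausibility checks, such as range limits on recorded values, would be a separate step and are not sketched here.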