Identifying potential biases in code sequences in primary care electronic healthcare records: a retrospective cohort study of the determinants of code frequency

Azeem Majeed; Thomas Woodcock; Jonathan Clarke; Paul Aylin; Thomas Beaney; David Salman; Mauricio Barahona

doi:10.1136/bmjopen-2023-072884

BMJ Open (Sep 2023)

Affiliations

Azeem Majeed: Department of Primary Care & Public Health, Imperial College London, London W6 8RP, UK.
Thomas Woodcock: National Institute for Health Research Applied Research Collaboration Northwest London, Imperial College London, London, UK
Jonathan Clarke: Patient Safety Translational Research Centre, Institute of Global Health Innovation, Imperial College London, London, UK
Paul Aylin: Department of Primary Care and Public Health, Imperial College London, London, UK
Thomas Beaney: Department of Primary Care and Public Health, Imperial College London, London, UK
David Salman: Emergency Medicine—Sport and Exercise Medicine, Imperial College Healthcare NHS Trust, London, UK
Mauricio Barahona: Centre for Mathematics of Precision Healthcare, Imperial College London, London, UK

DOI: https://doi.org/10.1136/bmjopen-2023-072884
Journal volume & issue: Vol. 13, no. 9

Abstract

Objectives To determine whether the frequency of diagnostic codes for long-term conditions (LTCs) in primary care electronic healthcare records (EHRs) is associated with (1) disease coding incentives, (2) General Practice (GP), (3) patient sociodemographic characteristics and (4) calendar year of diagnosis.

Design Retrospective cohort study.

Setting GPs in England from 2015 to 2022 contributing to the Clinical Practice Research Datalink Aurum dataset.

Participants All patients registered to a GP with at least one incident LTC diagnosed between 1 January 2015 and 31 December 2019.

Primary and secondary outcome measures The number of diagnostic codes for an LTC in (1) the first and (2) the second year following diagnosis, stratified by inclusion in the Quality and Outcomes Framework (QOF) financial incentive programme.

Results 3 113 724 patients were included, with 7 723 365 incident LTCs. Conditions included in QOF had higher rates of annual coding than conditions not included in QOF (1.03 vs 0.32 per year, p<0.0001). There was significant variation in code frequency by GP which was not explained by patient sociodemographics. We found significant associations with patient sociodemographics, with a trend towards higher coding rates in people living in areas of higher deprivation for both QOF and non-QOF conditions. Code frequency was lower for conditions with follow-up time in 2020, associated with the onset of the COVID-19 pandemic.

Conclusions The frequency of diagnostic codes for newly diagnosed LTCs is influenced by factors including patient sociodemographics, disease inclusion in QOF, GP practice and the impact of the COVID-19 pandemic. Natural language processing or other methods using temporally ordered code sequences should account for these factors to minimise potential bias.

Published in BMJ Open

ISSN: 2044-6055 (Online)
Publisher: BMJ Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine
Website: https://bmjopen.bmj.com
