Consistency, completeness and external validity of ethnicity recording in NHS primary care records: a cohort study in 25 million patients’ records at source using OpenSAFELY
The OpenSAFELY Collaborative,
Colm D. Andrews,
Rohini Mathur,
Jon Massey,
Robin Park,
Helen J. Curtis,
Lisa Hopcroft,
Amir Mehrkar,
Seb Bacon,
George Hickman,
Rebecca Smith,
David Evans,
Tom Ward,
Simon Davy,
Peter Inglesby,
Iain Dillingham,
Steven Maude,
Thomas O’Dwyer,
Ben F. C. Butler-Cole,
Lucy Bridges,
Chris Bates,
John Parry,
Frank Hester,
Sam Harper,
Jonathan Cockburn,
Ben Goldacre,
Brian MacKenna,
Laurie A. Tomlinson,
Alex J. Walker,
William J. Hulme
Affiliations
The OpenSAFELY Collaborative
Colm D. Andrews
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Rohini Mathur
London School of Hygiene and Tropical Medicine
Jon Massey
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Robin Park
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Helen J. Curtis
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Lisa Hopcroft
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Amir Mehrkar
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Seb Bacon
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
George Hickman
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Rebecca Smith
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
David Evans
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Tom Ward
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Simon Davy
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Peter Inglesby
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Iain Dillingham
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Steven Maude
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Thomas O’Dwyer
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Ben F. C. Butler-Cole
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Lucy Bridges
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Chris Bates
TPP, TPP House
John Parry
TPP, TPP House
Frank Hester
TPP, TPP House
Sam Harper
TPP, TPP House
Jonathan Cockburn
TPP, TPP House
Ben Goldacre
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Brian MacKenna
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Laurie A. Tomlinson
London School of Hygiene and Tropical Medicine
Alex J. Walker
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
William J. Hulme
Nuffield Department of Primary Care Health Sciences, Bennett Institute for Applied Data Science, Oxford University
Abstract
Background Ethnicity is known to be an important correlate of health outcomes, particularly during the COVID-19 pandemic, when some ethnic groups were shown to be at higher risk of infection and adverse outcomes. The recording of patients’ ethnic groups in primary care can support research and efforts to achieve equity in service provision and outcomes; however, the coding of ethnicity is known to present complex challenges. We therefore set out to describe ethnicity coding in detail, with a view to supporting the use of this data in a wide range of settings, as part of wider efforts to robustly describe and define methods of using administrative data.
Methods We described the completeness and consistency of primary care ethnicity recording in the OpenSAFELY-TPP database, which contains linked primary care and hospital records for more than 25 million patients in England. We also compared the ethnic breakdown in OpenSAFELY-TPP with that of the 2021 UK census.
Results Of the patients registered in OpenSAFELY-TPP on 1 January 2022, 78.2% had their ethnicity recorded in primary care records, rising to 92.5% when supplemented with hospital data. The completeness of ethnicity recording was higher for women than for men. The rate of primary care ethnicity recording ranged from 77% in the South East of England to 82.2% in the West Midlands. Ethnicity recording rates were higher in patients with chronic or other serious health conditions. For each of the five broad ethnicity groups, primary care recorded ethnicity was within 2.9 percentage points of the population rate as recorded in the 2021 Census for England as a whole. Among patients with multiple ethnicity records, the latest recorded ethnicity matched the most frequently coded ethnicity in 98.7% of cases. Patients whose latest recorded ethnicity was categorised as Other were most likely to have a discordant ethnicity recording (32.2%).
Conclusions Primary care ethnicity data in OpenSAFELY is present for over three quarters of all patients and, when combined with data from other sources, can achieve a high level of completeness. The overall distribution of ethnicities across all English OpenSAFELY-TPP practices was similar to the 2021 Census, with some regional variation. This report identifies the best available codelist for use in OpenSAFELY and similar electronic health record data.
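The two headline measures in the Results, completeness and the agreement between a patient's latest and most frequently recorded ethnicity, can be illustrated with a minimal Python sketch. This is not the study's analysis code: the function names, the record layout (a list of date and group pairs per patient), the tie-breaking rule, and the toy patients are all assumptions made for illustration, and the toy figures bear no relation to the reported results.

```python
from collections import Counter
from datetime import date


def completeness(patients):
    """Share of patients with at least one ethnicity record."""
    recorded = sum(1 for records in patients.values() if records)
    return recorded / len(patients)


def latest_matches_modal(records):
    """True if the latest recorded group is (one of) the most frequent groups.

    Assumption: when several groups tie for most frequent, the latest
    record counts as concordant if it is any one of them.
    """
    latest_group = max(records, key=lambda rec: rec[0])[1]
    counts = Counter(group for _, group in records)
    top = max(counts.values())
    modal_groups = {group for group, n in counts.items() if n == top}
    return latest_group in modal_groups


def concordance(patients):
    """Share of patients with multiple records whose latest matches the modal group."""
    multi = [records for records in patients.values() if len(records) > 1]
    return sum(latest_matches_modal(records) for records in multi) / len(multi)


# Hypothetical toy data: patient id -> list of (record date, broad ethnic group).
toy_patients = {
    "p1": [(date(2015, 3, 1), "White"), (date(2019, 6, 1), "White")],
    "p2": [
        (date(2012, 1, 1), "Asian"),
        (date(2016, 1, 1), "Asian"),
        (date(2021, 1, 1), "Other"),
    ],
    "p3": [(date(2020, 5, 5), "Black")],
    "p4": [],  # no ethnicity recorded
}

print(completeness(toy_patients))  # 3 of 4 patients have a record
print(concordance(toy_patients))   # 1 of 2 multi-record patients is concordant
```

In this toy example, patient p2 is discordant because the latest record (Other) differs from the most frequent one (Asian), which is the kind of disagreement the Results report as most common among patients whose latest recorded ethnicity was Other.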