Data consistency in the English Hospital Episodes Statistics database

Adrian Hopper; William K Gray; Jamie Day; Tim W R Briggs; Flavien Hardy; Johannes Heyl; Katie Tucker; Maria J Marchã; Jeremy Yates; Andrew Wheeler; Sue Eve-Jones

doi:10.1136/bmjhci-2022-100633

BMJ Health & Care Informatics (Feb 2022)

Affiliations

Adrian Hopper: Ageing and Health, Guy's and St Thomas' NHS Foundation Trust, London, UK
William K Gray: Getting It Right First Time, NHS England and NHS Improvement London, London, UK
Jamie Day: Getting It Right First Time, NHS England and NHS Improvement London, London, UK
Tim W R Briggs: Getting It Right First Time, NHS England and NHS Improvement London, London, UK
Flavien Hardy: Department of Physics and Astronomy, University College London, London, UK
Johannes Heyl: Getting It Right First Time, NHS England and NHS Improvement London, London, UK
Katie Tucker: Innovation and Intelligent Automation Unit, Royal Free London NHS Foundation Trust, London, UK
Maria J Marchã: Science and Technology Facilities Council Distributed Research Utilising Advanced Computing High Performance Computing Facility, London, UK
Jeremy Yates: Science and Technology Facilities Council Distributed Research Utilising Advanced Computing High Performance Computing Facility, London, UK
Andrew Wheeler: Getting It Right First Time, NHS England and NHS Improvement London, London, UK
Sue Eve-Jones: Getting It Right First Time, NHS England and NHS Improvement London, London, UK

DOI: https://doi.org/10.1136/bmjhci-2022-100633
Journal volume & issue: Vol. 29, no. 1

Abstract

Background: To gain maximum insight from large administrative healthcare datasets it is important to understand their data quality. Although a gold standard against which to assess criterion validity rarely exists for such datasets, internal consistency can be evaluated. We aimed to identify inconsistencies in the recording of mandatory International Statistical Classification of Diseases and Related Health Problems, tenth revision (ICD-10) codes within the Hospital Episodes Statistics dataset in England.

Methods: Three exemplar medical conditions where recording is mandatory once diagnosed were chosen: autism, type II diabetes mellitus and Parkinson’s disease dementia. We identified the first occurrence of the condition ICD-10 code for a patient during the period April 2013 to March 2021 and in subsequent hospital spells. We designed and trained random forest classifiers to identify variables strongly associated with recording inconsistencies.

Results: For autism, diabetes and Parkinson’s disease dementia respectively, 43.7%, 8.6% and 31.2% of subsequent spells had inconsistencies. Coding inconsistencies were highly correlated with non-coding of an underlying condition, a change in hospital trust and greater time between the spell with the first coded diagnosis and the subsequent spell. For patients with diabetes or Parkinson’s disease dementia, spells without an overnight stay were found to have a higher rate of recording inconsistencies.

Conclusions: Data inconsistencies are relatively common for the three conditions considered. Where these mandatory diagnoses are not recorded in administrative datasets, and where clinical decisions are made based on such data, there is potential for this to impact patient care.
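The inconsistency measure described in the abstract (a subsequent spell that omits a code already recorded for the same patient) can be illustrated with a minimal sketch. This is not the authors' implementation: the table layout, the column names (`patient_id`, `spell_start`, `icd10_codes`), the toy records and the helper `inconsistency_rate` are all hypothetical, and matching on the prefix `E11` (the ICD-10 category for type 2 diabetes mellitus) is an assumption about how a condition might be identified.

```python
import pandas as pd

# Hypothetical toy data: one row per hospital spell. Column names are
# illustrative assumptions, not the actual HES field names.
spells = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3],
    "spell_start": pd.to_datetime([
        "2014-01-10", "2015-03-02", "2016-07-19",
        "2013-05-01", "2013-09-12", "2018-11-30",
    ]),
    "icd10_codes": [
        ["E11", "I10"], ["I10"], ["E11"],
        ["J18"], ["E11"], ["E11"],
    ],
})


def inconsistency_rate(spells: pd.DataFrame, code_prefix: str) -> float:
    """Share of spells after a patient's first coded spell that omit the code."""
    df = spells.sort_values(["patient_id", "spell_start"]).copy()

    # Does this spell carry the condition code?
    df["has_code"] = df["icd10_codes"].apply(
        lambda codes: any(c.startswith(code_prefix) for c in codes)
    )

    # Number of coded spells strictly before the current one, per patient.
    coded = df["has_code"].astype(int)
    prior_coded = coded.groupby(df["patient_id"]).cumsum() - coded

    # Subsequent spells: those with at least one earlier coded spell.
    subsequent = df[prior_coded > 0]
    if subsequent.empty:
        return float("nan")

    # An inconsistency is a subsequent spell where the code is missing.
    return float(1 - subsequent["has_code"].mean())


rate = inconsistency_rate(spells, "E11")
```

On the toy records above, patient 1 has two spells after the first `E11` spell and one of them omits the code, while patients 2 and 3 have no subsequent spells, so `rate` is 0.5. The variables the abstract reports as associated with inconsistencies (change of hospital trust, time since the first coded spell, overnight stay) would be natural extra columns on `subsequent` if one wanted to fit a classifier to the missing-code flag.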

Published in BMJ Health & Care Informatics

ISSN: 2632-1009 (Online)
Publisher: BMJ Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://informatics.bmj.com/