BMJ Open (Mar 2024)
Impact of inconsistent ethnicity recordings on estimates of inequality in child health and education data: a data linkage study of Child and Adolescent Mental Health Services in South London
Abstract
Objectives: Ethnicity data are critical for identifying inequalities, but previous studies suggest that ethnicity is not consistently recorded between different administrative datasets. With researchers increasingly leveraging cross-domain data linkages, we investigated the completeness and consistency of ethnicity data in two linked health and education datasets.
Design: Cohort study.
Setting: South London and Maudsley NHS Foundation Trust deidentified electronic health records, accessed via Clinical Record Interactive Search (CRIS) and the National Pupil Database (NPD) (2007–2013).
Participants: N=30 426 children and adolescents referred to local Child and Adolescent Mental Health Services.
Primary and secondary outcome measures: Ethnicity data were compared between CRIS and the NPD. Associations between ethnicity as recorded from each source and key educational and clinical outcomes were explored with risk ratios.
Results: Ethnicity data were available for 79.3% from the NPD, 87.0% from CRIS, 97.3% from either source and 69.0% from both sources. Among those who had ethnicity data from both, the two data sources agreed on 87.0% of aggregate ethnicity categorisations overall, but with high levels of disagreement in Mixed and Other ethnic groups. Strengths of associations between ethnicity, educational attainment and neurodevelopmental disorder varied according to which data source was used to code ethnicity. For example, as compared with White pupils, a significantly higher proportion of Asian pupils achieved expected educational attainment thresholds only if ethnicity was coded from the NPD (RR=1.46, 95% CI 1.29 to 1.64), not if ethnicity was coded from CRIS (RR=1.11, 0.98 to 1.26).
Conclusions: Data linkage has the potential to minimise missing ethnicity data, and overlap in ethnicity categorisations between CRIS and the NPD was generally high. However, choosing which data source to primarily code ethnicity from can have implications for analyses of ethnicity, mental health and educational outcomes. Users of linked data should exercise caution in combining and comparing ethnicity between different data sources.
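As an illustration of the two quantities reported above (agreement between sources and risk ratios with 95% confidence intervals), the following is a minimal sketch rather than the study's actual analysis code. The function names `agreement` and `risk_ratio` are hypothetical, and the confidence interval uses the standard log-scale (Wald) approximation, which is an assumption; the abstract does not state how the intervals were computed.

```python
import math


def agreement(source_a, source_b):
    """Share of records with the same ethnicity category in both sources.

    Records missing (None) in either source are excluded, mirroring the
    idea of comparing only those with ethnicity data from both sources.
    """
    pairs = [(a, b) for a, b in zip(source_a, source_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no records with ethnicity in both sources")
    return sum(a == b for a, b in pairs) / len(pairs)


def risk_ratio(events_group, n_group, events_ref, n_ref, z=1.96):
    """Risk ratio of a group versus a reference group, with a Wald CI.

    The interval is built on the log scale:
    SE(log RR) = sqrt(1/a - 1/n1 + 1/c - 1/n0).
    """
    rr = (events_group / n_group) / (events_ref / n_ref)
    se_log_rr = math.sqrt(
        1 / events_group - 1 / n_group + 1 / events_ref - 1 / n_ref
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical toy data, NOT taken from the study.
    cris = ["White", "Asian", None, "Mixed"]
    npd = ["White", "Asian", "Black", "Other"]
    print(f"agreement: {agreement(cris, npd):.2f}")

    # Hypothetical counts: 30/100 in the group vs 20/100 in the reference.
    rr, lo, hi = risk_ratio(30, 100, 20, 100)
    print(f"RR={rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Running `risk_ratio` once with ethnicity coded from each source, and comparing the resulting intervals, is one way to reproduce the kind of source-dependent contrast the abstract describes.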