Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis

Isabel Bilotta; Scott Tonidandel; Winston R Liaw; Eden King; Diana N Carvajal; Ayana Taylor; Julie Thamby; Yang Xiang; Cui Tao; Michael Hansen

doi:10.2196/50428

JMIR Medical Informatics (May 2024)

DOI: https://doi.org/10.2196/50428
Journal volume & issue: Vol. 12
p. e50428

Abstract

Background: Individuals from minoritized racial and ethnic backgrounds experience pernicious and pervasive health disparities that have emerged, in part, from clinician bias.

Objective: We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this methodological approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias.

Methods: In this cross-sectional study, we extracted EHR notes for patients who were aged 18 years or older; had more than 5 years of diabetes diagnosis codes; and received care between 2006 and 2014 from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics. The race and ethnicity of patients were defined as White non-Hispanic, Black non-Hispanic, or Hispanic or Latino.

Results: We examined EHR notes (n=12,905) of Black non-Hispanic, White non-Hispanic, and Hispanic or Latino patients (n=1562), who were seen by 281 physicians. A total of 27 clinicians participated in the validation study. In terms of bias, participants rated negative adjectives as 8.63 (SD 2.06), fear and disgust words as 8.11 (SD 2.15), and positive adjectives as 7.93 (SD 2.46) on a scale of 1 to 10, with 10 being extremely indicative of bias. Notes for Black non-Hispanic patients contained significantly more negative adjectives (coefficient 0.07, SE 0.02) and significantly more fear and disgust words (coefficient 0.007, SE 0.002) than those for White non-Hispanic patients. The notes for Hispanic or Latino patients included significantly fewer positive adjectives (coefficient −0.02, SE 0.007), trust verbs (coefficient −0.009, SE 0.004), and joy words (coefficient −0.03, SE 0.01) than those for White non-Hispanic patients.

Conclusions: This approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.
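
As a reading aid for the coefficients and standard errors reported in the results, the sketch below divides each coefficient by its SE and converts the ratio to an approximate two-sided p value under a normal approximation. The abstract does not state which regression model or significance test the authors used, so the Wald-style ratio, the normal approximation, and the names `reported`, `wald_ratio`, and `two_sided_p` are assumptions made for illustration only, not the study's method.

```python
import math

# Coefficients and standard errors copied from the abstract.
# The dictionary labels are descriptive names chosen for this sketch.
reported = {
    "Black non-Hispanic: negative adjectives": (0.07, 0.02),
    "Black non-Hispanic: fear and disgust words": (0.007, 0.002),
    "Hispanic or Latino: positive adjectives": (-0.02, 0.007),
    "Hispanic or Latino: trust verbs": (-0.009, 0.004),
    "Hispanic or Latino: joy words": (-0.03, 0.01),
}


def wald_ratio(coefficient, standard_error):
    """Return coefficient / SE (an assumed Wald-style statistic)."""
    return coefficient / standard_error


def two_sided_p(z):
    """Two-sided p value for z under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Ratio for every reported comparison against White non-Hispanic patients.
results = {label: wald_ratio(c, se) for label, (c, se) in reported.items()}

for label, z in results.items():
    print(f"{label}: z = {z:.2f}, approx. p = {two_sided_p(z):.4f}")
```

Under these assumptions, every ratio exceeds 1.96 in absolute value, which is consistent with the abstract describing each difference as significant. The exact p values in the published article may differ, depending on the model and test actually used.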

Published in JMIR Medical Informatics

ISSN: 2291-9694 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://medinform.jmir.org
