JMIR Medical Informatics (Jun 2022)
Prevalence of Sensitive Terms in Clinical Notes Using Natural Language Processing Techniques: Observational Study
Abstract
BackgroundWith the increased sharing of electronic health information as required by the US 21st Century Cures Act, there is an increased risk of breaching patient, parent, or guardian confidentiality. The prevalence of sensitive terms in clinical notes is not known. ObjectiveThe aim of this study is to define sensitive terms that represent the documentation of content that may be private and determine the prevalence and characteristics of provider notes that contain sensitive terms. MethodsUsing keyword expansion, we defined a list of 781 sensitive terms. We searched all provider history and physical, progress, consult, and discharge summary notes for patients aged 0-21 years written between January 1, 2019, and December 31, 2019, for a direct string match of sensitive terms. We calculated the prevalence of notes with sensitive terms and characterized clinical encounters and patient characteristics. ResultsSensitive terms were present in notes from every clinical context in all pediatric ages. Terms related to the mental health category were most used overall (254,975/1,338,297, 19.5%), but terms related to substance abuse and reproductive health were most common in patients aged 0-3 years. History and physical notes (19,854/34,771, 57.1%) and ambulatory progress notes (265,302/563,273, 47.1%) were most likely to include sensitive terms. The highest prevalence of notes with sensitive terms was found in pain management (950/1112, 85.4%) and child abuse (1092/1282, 85.2%) clinics. ConclusionsNotes containing sensitive terms are not limited to adolescent patients, specific note types, or certain specialties. Recognition of sensitive terms across all ages and clinical settings complicates efforts to protect patient and caregiver privacy in the era of information-blocking regulations.