BMC Medical Informatics and Decision Making (Dec 2019)

Use of natural language processing to improve predictive models for imaging utilization in children presenting to the emergency department

  • Xingyu Zhang,
  • M. Fernanda Bellolio,
  • Pau Medrano-Gracia,
  • Konrad Werys,
  • Sheng Yang,
  • Prashant Mahajan

Journal volume & issue
Vol. 19, no. 1
pp. 1 – 13



Abstract

Objective: To examine the association between medical imaging utilization and patients' socioeconomic, demographic and clinical factors recorded during emergency department (ED) visits, and to develop models that use these factors, including natural language elements, to predict medical imaging utilization in the pediatric ED.

Methods: Pediatric patients' data from the 2012–2016 United States National Hospital Ambulatory Medical Care Survey were used to build models predicting the use of imaging in children presenting to the ED. Multivariable logistic regression models were built with structured variables (such as temperature, heart rate, and age), with unstructured variables (such as reason for visit and free-text nursing notes), and with the combined data available at triage. Natural language processing (NLP) techniques were used to extract information from the unstructured data.

Results: Of the 27,665 pediatric ED visits included in the study, 8394 (30.3%) received medical imaging in the ED, including 6922 (25.0%) who had an X-ray and 1367 (4.9%) who had a computed tomography (CT) scan. In the predictive model including only structured variables, the c-statistic was 0.71 (95% CI: 0.70–0.71) for any imaging use, 0.69 (95% CI: 0.68–0.70) for X-ray, and 0.77 (95% CI: 0.76–0.78) for CT. Models including only unstructured information had c-statistics of 0.81 (95% CI: 0.81–0.82) for any imaging use, 0.82 (95% CI: 0.82–0.83) for X-ray, and 0.85 (95% CI: 0.83–0.86) for CT. When both structured and free-text variables were included, the c-statistics reached 0.82 (95% CI: 0.82–0.83) for any imaging use, 0.83 (95% CI: 0.83–0.84) for X-ray, and 0.87 (95% CI: 0.86–0.88) for CT.

Conclusions: Both CT and X-rays are commonly used in the pediatric ED, with roughly one third of visits receiving at least one. Socioeconomic, demographic and clinical factors available at ED triage were associated with medical imaging utilization. Predictive models combining structured and unstructured variables available at triage performed better than models using structured or unstructured variables alone, suggesting a potential role for NLP in determining resource utilization.
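
For readers less familiar with the c-statistic used to compare the models: for a binary outcome such as "imaging performed or not", it is the proportion of (imaged, not imaged) visit pairs in which the model assigns the higher predicted probability to the imaged visit, with ties counted as one half. This is equivalent to the area under the ROC curve. The sketch below is an illustration only and is not the authors' code; the function name `c_statistic` and the toy labels and scores are hypothetical and are not taken from the survey data.

```python
from itertools import product


def c_statistic(labels, scores):
    """Concordance statistic for a binary outcome.

    labels: 1 if the visit received imaging, 0 otherwise.
    scores: predicted probabilities from any model (e.g. logistic regression).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one visit in each outcome class")

    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0   # imaged visit ranked above non-imaged visit
        elif p == n:
            concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical toy example: six visits, three of which received imaging.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
toy_c = c_statistic(toy_labels, toy_scores)
print(round(toy_c, 3))  # 7 of 9 pairs are concordant -> 0.778
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which is the scale on which the reported values (for example 0.71 for structured variables alone versus 0.82 for the combined model for any imaging use) should be read. The pairwise loop is quadratic in the number of visits, so for a data set of the size described here a rank-based implementation from a statistics library would be the more practical choice.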