Informatics in Medicine Unlocked (Jan 2023)

Systematic review of natural language processing for recurrent cancer detection from electronic medical records

  • Ekapob Sangariyavanich,
  • Wanchana Ponthongmak,
  • Amarit Tansawet,
  • Nawanan Theera-Ampornpunt,
  • Pawin Numthavaj,
  • Gareth J. McKay,
  • John Attia,
  • Ammarin Thakkinstian

Journal volume & issue
Vol. 41
p. 101326

Abstract

This systematic review explored natural language processing (NLP) approaches, focusing on the text representation techniques and algorithms previously used to identify recurrent cancer diagnoses from electronic medical records (EMR), and assessed their detection performance. Relevant studies were identified from the PubMed, Scopus, ACM Digital Library, and IEEE databases from inception to August 18, 2022. Data, including text representation methods, model algorithms and performance, and type of clinical notes, were extracted from individual studies by two independent reviewers. Study risk of bias was assessed using the prediction model risk of bias assessment tool (PROBAST). Of the 412 studies identified, 17 were eligible for inclusion, 15 of which reported models that were not externally validated. Three types of text representation were used: statistical, context-free, and contextual (bidirectional encoder representations from transformers (BERT) and its variants), in 12, 6, and 3 studies, respectively. The corresponding median F1 scores (the harmonic mean of precision and recall) were 0.43, 0.87, and 0.72, respectively. The algorithms applied included rule-based, machine learning, and deep learning approaches, with median F1 scores of 0.71, 0.43, and 0.76, respectively. In conclusion, this systematic review suggests that deep learning models that use PubMedBERT as a text representation perform best. These findings can inform the selection of appropriate approaches for detecting recurrent cancer from EMR.
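
For readers unfamiliar with the summary metric, the following is a minimal sketch of how an F1 score is computed from precision and recall and how a median F1 could be taken across several models. The function name `f1_score` and the precision/recall pairs are hypothetical and purely illustrative; they are not data from the included studies, and this is not the review's own analysis code.

```python
from statistics import median


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) pairs for three models.
# These values are illustrative only and are NOT taken from the review.
illustrative_models = [(0.90, 0.60), (0.50, 0.50), (0.80, 0.80)]

# One F1 score per model, then the median across models.
f1_values = [f1_score(p, r) for p, r in illustrative_models]
median_f1 = median(f1_values)

print([round(v, 2) for v in f1_values])
print(round(median_f1, 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model with high precision but low recall (or the reverse) receives a lower F1 score than the arithmetic mean of the two would suggest.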

Keywords