Scientific Reports (Dec 2021)

Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology

  • Gian Maria Zaccaria
  • Vito Colella
  • Simona Colucci
  • Felice Clemente
  • Fabio Pavone
  • Maria Carmela Vegliante
  • Flavia Esposito
  • Giuseppina Opinto
  • Anna Scattone
  • Giacomo Loseto
  • Carla Minoia
  • Bernardo Rossini
  • Angela Maria Quinto
  • Vito Angiulli
  • Luigi Alfredo Grieco
  • Angelo Fama
  • Simone Ferrero
  • Riccardo Moia
  • Alice Di Rocco
  • Francesca Maria Quaglia
  • Valentina Tabanelli
  • Attilio Guarini
  • Sabino Ciavarella

DOI
https://doi.org/10.1038/s41598-021-03204-z
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 11

Abstract

The unstructured nature of real-world (RW) data from onco-hematological patients and the limited access to integrated systems restrict the use of RW information for research purposes. Natural Language Processing (NLP) might help transpose unstructured reports into standardized electronic health records. We used NLP to develop an automated tool, ARGO (Automatic Record Generator for Onco-hematology), that recognizes information in pathology reports and populates electronic case report forms (eCRFs) pre-implemented in REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on an internal (n = 239) and an external (n = 93) report series. Of the 332 reports, 326 (98.2%) were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) the report identification number (all metrics > 90%), (2) the biopsy date (all metrics > 90% in both series), (3) the specimen type (A of 86.6% and 91.4%, P of 98.5% and 100.0%, F of 92.5% and 95.5%, and R of 87.2% and 91.4% for the internal and external series, respectively), and (4) the diagnosis (P of 100%, with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
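
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of ARGO: it assumes the standard definition of the F1-score as the harmonic mean of precision and recall, and the helper name `f1_score` and the `specimen_type` dictionary are hypothetical. It recomputes the specimen-type F1-scores from the precision and recall values quoted in the abstract, and the overall share of reports converted into eCRFs.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Specimen-type precision (P) and recall (R) as quoted in the abstract.
specimen_type = {
    "internal": {"precision": 0.985, "recall": 0.872},
    "external": {"precision": 1.000, "recall": 0.914},
}

# Recompute F1 for each series, rounded to three decimals.
recomputed_f1 = {
    series: round(f1_score(m["precision"], m["recall"]), 3)
    for series, m in specimen_type.items()
}

# Share of reports converted into eCRFs across both series (239 + 93 = 332).
total_reports = 239 + 93
conversion_pct = round(100 * 326 / total_reports, 1)

for series, f1 in recomputed_f1.items():
    print(f"{series}: F1 = {f1:.3f}")
print(f"converted: {conversion_pct}% of {total_reports} reports")
```

Under these assumptions the recomputed values agree with the abstract: F of 92.5% and 95.5% for the internal and external series, and 98.2% of the 332 reports converted.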