BMJ Open (Sep 2021)

How to discriminate non-small cell lung cancer (NSCLC) cases from an Italian administrative database? A retrospective, secondary data use study for evaluating a novel algorithm performance

  • Ilaria Massa,
  • William Balzi,
  • Andrea Roncadori,
  • Valentina Danesi,
  • Silvia Manunta,
  • Nicola Gentili,
  • Angelo Delmonte,
  • Lucio Crinò,
  • Mattia Altini

DOI
https://doi.org/10.1136/bmjopen-2020-048188
Journal volume & issue
Vol. 11, no. 9

Abstract

Objectives To evaluate an algorithm developed to identify non-small cell lung cancer (NSCLC) candidates among patients with lung cancer who have an International Classification of Diseases, Ninth Revision (ICD-9) 162.x diagnosis code in administrative databases. The algorithm could then be applied to identify the NSCLC population in order to assess the appropriateness and quality of the NSCLC care pathway.

Design The algorithm's capacity to discriminate NSCLC from non-NSCLC cases was assessed on a sample for which an electronic health record (EHR) diagnosis was available. A bivariate frequency distribution and other measures were used to evaluate the algorithm's performance. Associations between algorithm accuracy and factors potentially affecting it were investigated.

Setting Administrative databases used in a specific geographical area of the Emilia-Romagna region, Italy.

Participants The algorithm was applied to patients aged >18 years, resident in the Emilia-Romagna region, with a lung cancer diagnosis from January to December 2017, who had been hospitalised at IRST or in one of the hospitals in the Forlì-Cesena area and for whom EHR diagnosis data were available.

Outcome measures Overall accuracy, positive predictive value (PPV) and negative predictive value (NPV), sensitivity and specificity, positive and negative likelihood ratios, and diagnostic odds ratio (OR) were calculated.

Results A total of 430 patients were identified as lung cancer cases based on ICD-9 diagnosis. Among the total incident cases (n=314), the algorithm had an overall accuracy of 82.8% and a sensitivity of 88.8%. The analysis confirmed a high PPV (90.2%) but lower specificity (53.7%) and NPV (50%). Longer length of stay seemed to be associated with correct classification. Hospitalisation regimen and a supply of antiblastic therapy seemed to increase the PPV.

Conclusion The algorithm demonstrated strong validity for identifying NSCLC among patients with lung cancer in hospital administrative databases and can be used to investigate the quality of cancer care for this population.

Trial registration number NCT04676321.
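
The outcome measures named in the abstract can all be derived from a 2×2 table that cross-classifies the algorithm's label against the EHR reference diagnosis. The following is a minimal Python sketch of the standard formulas, not the authors' code. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical, and the counts are illustrative values chosen for easy arithmetic, not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table: algorithm label vs. reference (EHR) diagnosis."""
    tp: int  # algorithm says NSCLC, reference says NSCLC
    fp: int  # algorithm says NSCLC, reference says non-NSCLC
    fn: int  # algorithm says non-NSCLC, reference says NSCLC
    tn: int  # algorithm says non-NSCLC, reference says non-NSCLC


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard diagnostic accuracy measures.

    Assumes every cell is non-zero; a zero cell would need a
    continuity correction, which this sketch does not apply.
    """
    total = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_or": (c.tp * c.tn) / (c.fp * c.fn),
    }


# Illustrative counts only (hypothetical, not taken from the paper).
example = ConfusionCounts(tp=90, fp=10, fn=10, tn=40)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The diagnostic OR equals the ratio of the positive to the negative likelihood ratio, which gives a quick consistency check on any reported set of values.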