Data in Brief (Dec 2021)

Administrative healthcare data to predict performance status in lung cancer patients

  • Anita Andreano,
  • Antonio Giampiero Russo

Journal volume & issue
Vol. 39
p. 107559

Abstract

The dataset includes 4,488 patients diagnosed with lung cancer (ICD-O 3[3], C33-C34) in 2010–2012 and 2016–2018 in the territory of the Agency for Health Protection (ATS) of Milan, Italy. Patients were selected from the ATS population cancer registry on the basis of availability of the following information: performance status (PS), age, sex, and stage at diagnosis. The dataset also includes the following variables, extracted from the health databases of the ATS and linked to the registry-derived variables through deterministic record linkage on a unique key (tax code): Charlson comorbidity index, presence of chronic obstructive pulmonary disease, number of hospitalizations, outpatient visits, emergency department visits and prescribed drugs in the previous year, and durable medical equipment dispensed in the previous three years. The dataset was used to develop a logistic prediction model for PS, dichotomized as ‘poor’ (ECOG, 3–5) and ‘good’ (ECOG, 0–2), using all other variables in the dataset as predictors. The prediction model was developed on a 50% random subsample of the described dataset (development dataset, n = 2,244) and validated on the remaining half. The area under the curve (AUC) of the model was 0.76 in the development sample and 0.73 in the validation sample. The developed model was then used to predict ‘good’ vs. ‘poor’ PS in a sample of patients with advanced lung cancer, from the same registry and years, for whom PS was not available. Researchers using registry data or electronic claims to study the effectiveness of oncologic therapy for lung cancer could use the reported coefficients to predict PS, dichotomized as ‘good’ or ‘poor’.
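
The workflow described above (random 50% split, logistic prediction of ‘poor’ PS, evaluation by AUC) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the coefficient values, the three predictors used, and all function names are hypothetical placeholders, and the published coefficients should be substituted before any real use.

```python
import math
import random

# Hypothetical coefficients for illustration only; NOT the published values.
HYPOTHETICAL_COEFS = {
    "intercept": -3.0,
    "age": 0.03,               # per year of age
    "charlson": 0.25,          # per point of Charlson comorbidity index
    "hospitalizations": 0.30,  # per hospitalization in the previous year
}


def predict_poor_ps_probability(patient, coefs=HYPOTHETICAL_COEFS):
    """Logistic model: P(poor PS) = 1 / (1 + exp(-linear predictor))."""
    z = coefs["intercept"]
    for name in ("age", "charlson", "hospitalizations"):
        z += coefs[name] * patient[name]
    return 1.0 / (1.0 + math.exp(-z))


def split_half(records, seed=0):
    """Randomly split records into development and validation halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def auc(scores, labels):
    """AUC as the fraction of (poor, good) pairs in which the patient with
    poor PS (label 1) receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one patient in each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In this sketch, a predicted probability would still need a cut-off to be turned into a ‘good’ or ‘poor’ label; the choice of threshold is left open here because the abstract does not state one.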

Keywords