The Lancet: Digital Health (Nov 2019)

Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method

  • Peng Huang, PhD,
  • Cheng T Lin, MD,
  • Yuliang Li, MS,
  • Martin C Tammemagi, Prof, PhD,
  • Malcolm V Brock, Prof, MD,
  • Sukhinder Atkar-Khattra, BSc,
  • Yanxun Xu, PhD,
  • Ping Hu, ScD,
  • John R Mayo, Prof, MD,
  • Heidi Schmidt, Prof, MD,
  • Michel Gingras, MD,
  • Sergio Pasian, MD,
  • Lori Stewart, MD,
  • Scott Tsai, MD,
  • Jean M Seely, MD,
  • Daria Manos, MD,
  • Paul Burrowes, MD,
  • Rick Bhatia, MD,
  • Ming-Sound Tsao, Prof, MD,
  • Stephen Lam, Prof, MD

Journal volume & issue
Vol. 1, no. 7
pp. e353–e362

Abstract

Summary

Background: Current lung cancer screening guidelines use either mean diameter, volume, or density of the largest lung nodule on the previous CT scan or appearance of a new nodule to ascertain the timing of the next CT scan. We aimed to develop an accurate screening protocol by estimating the 3-year lung cancer risk after two screening CT scans using deep learning of radiologists' CT readings and other universally available clinical information.

Methods: A deep learning algorithm (referred to as DeepLR) was developed using data from participants who had received at least two CT screening scans up to 2 years apart in the National Lung Screening Trial (NLST; training cohort). Double-blinded validation was done using data from participants in the Pan-Canadian Early Detection of Lung Cancer (PanCan) study (validation cohort). The primary analysis was to compare accuracy of DeepLR scores to predict lung cancer incidence at 1 year, 2 years, and 3 years with the Lung CT Screening Reporting & Data System (Lung-RADS) and volume doubling time, using time-dependent area under the receiver operating characteristic curve (AUC) analysis.

Findings: The training cohort consisted of 25 097 participants from NLST and the validation cohort comprised 2294 individuals from PanCan. In the validation cohort, DeepLR showed good discrimination, with 1-year, 2-year, and 3-year time-dependent AUC values for cancer diagnosis of 0·968 (SD 0·013), 0·946 (0·013), and 0·899 (0·017), respectively. Among individuals deemed high risk by DeepLR, 94%, 85%, and 71% of incident and interval lung cancers diagnosed within 1 year, 2 years, and 3 years, respectively, after the second screening CT scan were identified. Furthermore, individuals with high DeepLR scores had a significantly higher risk of mortality (hazard ratio 16·07, 95% CI 10·15–25·44; p<0·0001) among people with high scores on Lung-RADS.

Interpretation: DeepLR recognises patterns in both temporal and spatial changes and synergy among changes in nodule and non-nodule features. DeepLR scores could be used to accurately guide clinical management after the next scheduled repeat screening CT scan.

Funding: Allegheny Health Network, Johns Hopkins University, Terry Fox Research Institute, and British Columbia Cancer Foundation.
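
Illustrative note on the time-dependent AUC named in the Methods: the abstract does not say which estimator the authors used, so the following is only a minimal sketch of one common definition (cumulative cases versus dynamic controls), and not the study's method. It deliberately ignores censoring weights, which a real analysis would normally include. The function name, argument names, and toy data are all hypothetical and chosen for illustration.

```python
from typing import Sequence


def naive_time_dependent_auc(
    scores: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    horizon: float,
) -> float:
    """Naive cumulative/dynamic AUC at a given horizon (assumed definition).

    Cases: subjects diagnosed (event == 1) at or before the horizon.
    Controls: subjects still under follow-up and event-free after the horizon.
    Subjects censored before the horizon are simply dropped, which is a
    simplification; proper estimators reweight for censoring.
    """
    cases = [s for s, t, e in zip(scores, times, events) if e and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")

    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0  # case ranked above control
            elif case_score == control_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


# Made-up toy data: risk score, follow-up time in years, diagnosis indicator.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.85, 0.1]
toy_times = [0.5, 1.5, 3.0, 3.0, 2.5, 0.8]
toy_events = [1, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    for horizon in (1.0, 2.0):
        auc = naive_time_dependent_auc(toy_scores, toy_times, toy_events, horizon)
        print(f"horizon={horizon} years: AUC={auc:.3f}")
```

Evaluating the same scores at several horizons, as in the loop above, mirrors how the abstract reports separate 1-year, 2-year, and 3-year AUC values: the set of cases grows and the set of controls shrinks as the horizon moves out.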