Improving lung cancer risk stratification leveraging whole transcriptome RNA sequencing and machine learning across multiple cohorts

Yoonha Choi; Jianghan Qu; Shuyang Wu; Yangyang Hao; Jiarui Zhang; Jianchang Ning; Xinwu Yang; Lori Lofaro; Daniel G. Pankratz; Joshua Babiarz; P. Sean Walsh; Ehab Billatos; Marc E. Lenburg; Giulia C. Kennedy; Jon McAuliffe; Jing Huang

doi:10.1186/s12920-020-00782-1

BMC Medical Genomics (Oct 2020)

Affiliations

Yoonha Choi: Veracyte, Inc.
Jianghan Qu: Veracyte, Inc.
Shuyang Wu: Veracyte, Inc.
Yangyang Hao: Veracyte, Inc.
Jiarui Zhang: Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine
Jianchang Ning: Veracyte, Inc.
Xinwu Yang: Veracyte, Inc.
Lori Lofaro: Veracyte, Inc.
Daniel G. Pankratz: Veracyte, Inc.
Joshua Babiarz: Veracyte, Inc.
P. Sean Walsh: Veracyte, Inc.
Ehab Billatos: Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine
Marc E. Lenburg: Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine
Giulia C. Kennedy: Veracyte, Inc.
Jon McAuliffe: Department of Statistics, University of California, Berkeley
Jing Huang: Veracyte, Inc.

DOI: https://doi.org/10.1186/s12920-020-00782-1
Journal volume & issue: Vol. 13, no. S10
pp. 1 – 15

Abstract

Background

Bronchoscopy for suspected lung cancer has low diagnostic sensitivity, leaving many results inconclusive. The Bronchial Genomic Classifier (BGC) was developed to help with patient management by identifying those with low risk of lung cancer when bronchoscopy is inconclusive. The BGC was trained and validated on patients in the Airway Epithelial Gene Expression in the Diagnosis of Lung Cancer (AEGIS) trials. A modern patient cohort, the BGC Registry, showed differences in key clinical factors from the AEGIS cohorts, with less smoking history, smaller nodules and older age. Additionally, we discovered interfering factors (inhaled medication and sample collection timing) that affected gene expression and potentially disguised genomic cancer signals.

Methods

In this study, we leveraged multiple cohorts and next-generation sequencing technology to develop a robust Genomic Sequencing Classifier (GSC). To address demographic composition shift and interfering factors, we combined three algorithmic strategies: 1) an ensemble of clinical-dominant and genomic-dominant models; 2) hierarchical regression models in which the main effects of the clinical variables were regressed out before the genomic effects were fitted; and 3) targeted placement of genomic and clinical interaction terms to stabilize the effect of interfering factors. The final GSC model uses 1232 genes and four clinical covariates: age, pack-years, inhaled medication use, and specimen collection timing.

Results

In the validation set (N = 412), the GSC down-classified low and intermediate pre-test risk subjects to very low and low post-test risk with a specificity of 45% (95% CI 37–53%) and a sensitivity of 91% (95% CI 81–97%), resulting in a negative predictive value of 95% (95% CI 89–98%). Twelve percent of intermediate pre-test risk subjects were up-classified to high post-test risk with a positive predictive value of 65% (95% CI 44–82%), and 27% of high pre-test risk subjects were up-classified to very high post-test risk with a positive predictive value of 91% (95% CI 78–97%).

Conclusions

The GSC overcame the impact of interfering factors and achieved consistent performance across multiple cohorts. It demonstrated diagnostic accuracy in both down- and up-classification of cancer risk, providing physicians with actionable information for many patients with inconclusive bronchoscopy.
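
The second strategy named in the Methods (regressing out clinical main effects before fitting genomic effects) can be illustrated with a small two-stage logistic regression. The sketch below is one possible reading of that idea and not the authors' implementation: the data are synthetic, and every name in it (`fit_logistic`, `clinical`, `genes`, the two clinical columns, the 20 simulated genes, the penalty and step-size values) is a hypothetical choice made for illustration. Stage 1 fits the clinical covariates alone; stage 2 freezes the stage 1 linear predictor as an offset and fits ridge-penalized gene coefficients on top of it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, offset=None, l2=0.0, lr=0.1, n_iter=2000):
    """Logistic regression by gradient descent with an optional fixed offset.

    The offset is added to the linear predictor but is never updated, so the
    fitted coefficients only explain what the offset has not already captured.
    """
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        p_hat = sigmoid(offset + X @ w + b)
        w -= lr * (X.T @ (p_hat - y) / n + l2 * w)  # ridge-penalized gradient
        b -= lr * np.mean(p_hat - y)
    return w, b

def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic stand-in data (hypothetical; not from the study).
rng = np.random.default_rng(0)
n = 400
clinical = rng.normal(size=(n, 2))   # e.g. standardized age and pack-years
genes = rng.normal(size=(n, 20))     # stand-in for normalized expression
true_logit = (0.8 * clinical[:, 0] + 0.5 * clinical[:, 1]
              + genes[:, :3] @ np.array([1.0, -1.0, 0.5]))
y = (rng.random(n) < sigmoid(true_logit)).astype(float)

# Stage 1: clinical main effects only.
w_clin, b_clin = fit_logistic(clinical, y)
clinical_score = clinical @ w_clin + b_clin

# Stage 2: genomic effects fitted with the clinical score held fixed.
w_gene, b_gene = fit_logistic(genes, y, offset=clinical_score, l2=0.01)

loss_clinical = log_loss(y, sigmoid(clinical_score))
loss_combined = log_loss(y, sigmoid(clinical_score + genes @ w_gene + b_gene))
print(f"clinical-only log-loss: {loss_clinical:.3f}")
print(f"clinical + genomic log-loss: {loss_combined:.3f}")
```

Because the clinical score enters stage 2 as a fixed offset, the gene coefficients cannot absorb signal that the clinical covariates already explain, which is the intuition behind fitting the two layers in order. A real classifier would also need held-out evaluation and the ensemble and interaction terms described in the Methods, none of which this sketch attempts.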

Published in BMC Medical Genomics

ISSN: 1755-8794 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine; Science: Biology (General): Genetics
Website: https://bmcmedgenomics.biomedcentral.com
