An ensemble learning with active sampling to predict the prognosis of postoperative non-small cell lung cancer patients

Danqing Hu; Huanyao Zhang; Shaolei Li; Huilong Duan; Nan Wu; Xudong Lu

doi:10.1186/s12911-022-01960-0

BMC Medical Informatics and Decision Making (Sep 2022)

Affiliations

Danqing Hu: College of Biomedical Engineering and Instrument Science, Zhejiang University
Huanyao Zhang: College of Biomedical Engineering and Instrument Science, Zhejiang University
Shaolei Li: Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute
Huilong Duan: College of Biomedical Engineering and Instrument Science, Zhejiang University
Nan Wu: Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute
Xudong Lu: College of Biomedical Engineering and Instrument Science, Zhejiang University

DOI: https://doi.org/10.1186/s12911-022-01960-0
Journal volume & issue: Vol. 22, no. 1
pp. 1 – 12

Abstract

Background: Lung cancer is the leading cause of cancer death worldwide. Prognostic prediction plays a vital role in the decision-making process for postoperative non-small cell lung cancer (NSCLC) patients. However, the high imbalance ratio of prognostic data limits the development of effective prognostic prediction models.

Methods: In this study, we present a novel approach, namely ensemble learning with active sampling (ELAS), to tackle the imbalanced data problem in NSCLC prognostic prediction. ELAS first applies an active sampling mechanism that queries the most informative samples and uses them to update the base classifier, giving it a new perspective on the data. This training process is repeated until too few samples are queried. Next, an internal validation set is employed to evaluate the base classifiers, and the best-performing ones are integrated into the ensemble model. In addition, we set up multiple initial training data seeds and internal validation sets to ensure the stability and generalization of the model.

Results: We verified the effectiveness of ELAS on a real clinical dataset containing 1848 postoperative NSCLC patients. Experimental results showed that ELAS achieved the best average AUROC (0.736) and AUPRC (0.453) across 6 prognostic tasks and obtained significant improvements over SVM, AdaBoost, Bagging, SMOTE, and TomekLinks.

Conclusions: We conclude that ELAS can effectively alleviate the imbalanced data problem in NSCLC prognostic prediction and demonstrates good potential for future postoperative NSCLC prognostic prediction.
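The training loop described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how "most informative" is measured, so the sketch assumes uncertainty sampling (samples closest to the decision boundary); the nearest-centroid base classifier, the names `elas_sketch`, `margin`, `batch`, `min_queries`, and `top_m`, and the synthetic 2-D data are all hypothetical stand-ins. The seed set is assumed to contain both classes.

```python
import random

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(train):
    """Stand-in base classifier (assumption): nearest centroid per class."""
    neg = [x for x, y in train if y == 0]
    pos = [x for x, y in train if y == 1]
    return centroid(neg), centroid(pos)

def score(model, x):
    """Signed score: > 0 leans positive; |score| near 0 means uncertain."""
    neg_c, pos_c = model
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_c))
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_c))
    return d_neg - d_pos

def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def elas_sketch(seed, pool, val, margin=1.0, batch=5, min_queries=2, top_m=3):
    train, pool = list(seed), list(pool)
    models = [fit(train)]
    while True:
        # Active sampling (assumed criterion): rank the pool by uncertainty
        # under the latest classifier and query those inside the margin.
        current = models[-1]
        ranked = sorted(pool, key=lambda s: abs(score(current, s[0])))
        k = sum(1 for s in ranked[:batch] if abs(score(current, s[0])) < margin)
        if k < max(1, min_queries):
            break  # too few informative samples queried: stop training
        train += ranked[:k]
        pool = ranked[k:]
        models.append(fit(train))  # updated classifier, kept as a candidate
    # Internal validation: keep the top_m candidates by validation AUROC.
    labels = [y for _, y in val]
    def val_auc(m):
        return auroc([score(m, x) for x, _ in val], labels)
    best = sorted(models, key=val_auc, reverse=True)[:top_m]
    # Ensemble prediction: average the scores of the selected classifiers.
    return lambda x: sum(score(m, x) for m in best) / len(best)

def make_data(rng, n_neg, n_pos):
    """Synthetic imbalanced 2-D data (illustrative only, not clinical data)."""
    neg = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(n_neg)]
    pos = [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(n_pos)]
    return neg + pos

rng = random.Random(0)
seed = make_data(rng, 5, 5)       # small initial training seed
pool = make_data(rng, 200, 30)    # imbalanced pool to query from
val = make_data(rng, 50, 8)       # internal validation set
test = make_data(rng, 100, 15)    # held-out data for a quick check
predict = elas_sketch(seed, pool, val, margin=2.0)
test_auc = auroc([predict(x) for x, _ in test], [y for _, y in test])
print(f"held-out AUROC: {test_auc:.3f}")
```

The sketch runs a single seed and a single validation set. The multiple initial seeds and internal validation sets mentioned in the Methods would presumably wrap `elas_sketch` in an outer loop and combine the resulting ensembles, which is omitted here.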

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
