Zhongguo quanke yixue (Jan 2022)

Using Machine Learning to Build an Early Warning Model for the Risk of Severe Airflow Limitation in Patients with Chronic Obstructive Pulmonary Disease

  • ZHOU Lijuan, WEN Xianxiu, LYU Qin, JIANG Rong, WU Xingwei, ZHOU Huangyuan, XIANG Chao

DOI
https://doi.org/10.12114/j.issn.1007-9572.2021.01.313
Journal volume & issue
Vol. 25, no. 02
pp. 217 – 226

Abstract

Background: The degree of airflow limitation is a key indicator of disease progression in patients with chronic obstructive pulmonary disease (COPD). However, contraindications to testing and poor compliance make it difficult for some patients to undergo the relevant tests, so the severity of their disease cannot always be assessed.

Objective: To develop and evaluate a machine learning-based early warning model for the risk of severe airflow limitation in COPD patients.

Methods: A cross-sectional design was used to study COPD inpatients in a tertiary hospital in Sichuan Province from January 2019 to June 2020. General clinical indicators and pulmonary function test data were collected. The data were randomly split into training and test sets at a ratio of 8:2, and 216 risk warning models were built on the training set by combining four missing-value imputation methods, three feature selection methods, and 17 machine learning algorithms plus one ensemble learning algorithm. The area under the ROC curve (AUC), accuracy, precision, recall, and F1 score were used to evaluate predictive performance; ten-fold cross-validation and bootstrapping were used for internal and external validation, respectively. The test set was used for model testing and selection, and a post hoc method was used to verify the adequacy of the sample size.

Results: A total of 418 patients were included, of whom 212 (50.7%) were at risk of severe airflow limitation.
After the four missing-value treatments and three feature selection methods were applied, a total of 12 processed datasets and the importance ranking of 12 factors affecting airflow limitation were obtained. The modified Medical Research Council dyspnea scale (mMRC) grade, age, body mass index (BMI), smoking history (yes/no), COPD Assessment Test (CAT) score, and dyspnea (yes/no) ranked at the top of the feature importance rankings and were key indicators for building the model, playing an important role in predicting the outcome. With no imputation and Lasso screening, mMRC grade, smoking history (yes/no), and dyspnea (yes/no) were the top three predictors, with mMRC grade accounting for 54.15% of feature importance. With no imputation and Boruta screening, CAT score, age, and mMRC grade were the top three predictors, with CAT score accounting for 26.64% of feature importance. A total of 216 prediction models were obtained by applying the 17 machine learning algorithms and one ensemble learning algorithm to each of the 12 datasets. Ten-fold cross-validation of the 17 machine learning algorithms showed statistically significant differences in predictive performance between algorithms (P<0.05), with the stochastic gradient descent algorithm achieving the highest mean AUC (0.738±0.089). External validation on the test set using bootstrapping also showed statistically significant differences in predictive performance between the models obtained by different algorithms (P<0.05), with the ensemble learning algorithm achieving the highest mean AUC (0.757±0.057). Evaluation of the four missing-value treatments and three feature selection methods using bootstrapping showed that model performance improved when no imputation and Lasso screening were applied, with a statistically significant difference (P<0.05).
When the 216 machine learning models were evaluated on the test set, the best model had an AUC of 0.7909, accuracy of 75.90%, precision of 75.00%, recall of 78.57%, and F1 score of 0.7674. The sample size verification suggested that the study sample size met the modeling needs.

Conclusion: This study developed and evaluated a risk warning model for severe airflow limitation in COPD patients. mMRC grade, age, BMI, CAT score, smoking history, and dyspnea were the key indicators affecting airflow limitation. The model shows good predictive performance and has potential for clinical application.
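
The counts and the F1 value reported in the abstract can be checked with a short sketch. It is not the authors' code: the method labels are placeholders (the abstract names only Lasso, Boruta, "no imputation" and stochastic gradient descent), and only the counts (4, 3, 17 + 1) and the reported precision and recall are taken from the text.

```python
from itertools import product

# Placeholder labels; only the counts come from the abstract.
imputation_methods = [f"imputation_{i}" for i in range(1, 5)]   # 4 missing-value treatments
feature_selectors = [f"selector_{i}" for i in range(1, 4)]      # 3 feature selection methods
algorithms = [f"ml_algorithm_{i}" for i in range(1, 18)] + ["ensemble"]  # 17 + 1

# Each (imputation, selector) pair yields one processed dataset.
datasets = list(product(imputation_methods, feature_selectors))
# Each algorithm is fitted on each processed dataset.
models = list(product(datasets, algorithms))

n_datasets = len(datasets)
n_models = len(models)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the best model as reported in the abstract.
best_f1 = f1_score(0.7500, 0.7857)

print(n_datasets, n_models, round(best_f1, 4))
```

Under these assumptions the sketch gives 12 datasets and 216 models, and an F1 of about 0.7674, which agrees with the reported values.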

Keywords