Frontiers in Public Health (Sep 2022)

Machine learning for identifying benign and malignant of thyroid tumors: A retrospective study of 2,423 patients

  • Yuan-yuan Guo,
  • Zhi-jie Li,
  • Chao Du,
  • Jun Gong,
  • Pu Liao,
  • Jia-xing Zhang,
  • Cong Shao

DOI
https://doi.org/10.3389/fpubh.2022.960740
Journal volume & issue
Vol. 10

Abstract

Read online

Thyroid tumors, one of the common tumors in the endocrine system, while the discrimination between benign and malignant thyroid tumors remains insufficient. The aim of this study is to construct a diagnostic model of benign and malignant thyroid tumors, in order to provide an emerging auxiliary diagnostic method for patients with thyroid tumors. The patients were selected from the Chongqing General Hospital (Chongqing, China) from July 2020 to September 2021. And peripheral blood, BRAFV600E gene, and demographic indicators were selected, including sex, age, BRAFV600E gene, lymphocyte count (Lymph#), neutrophil count (Neu#), neutrophil/lymphocyte ratio (NLR), platelet/lymphocyte ratio (PLR), red blood cell distribution width (RDW), platelets count (PLT), red blood cell distribution width—coefficient of variation (RDW–CV), alkaline phosphatase (ALP), and parathyroid hormone (PTH). First, feature selection was executed by univariate analysis combined with least absolute shrinkage and selection operator (LASSO) analysis. Afterward, we used machine learning algorithms to establish three types of models. The first model contains all predictors, the second model contains indicators after feature selection, and the third model contains patient peripheral blood indicators. The four machine learning algorithms include extreme gradient boosting (XGBoost), random forest (RF), light gradient boosting machine (LightGBM), and adaptive boosting (AdaBoost) which were used to build predictive models. A grid search algorithm was used to find the optimal parameters of the machine learning algorithms. A series of indicators, such as the area under the curve (AUC), were intended to determine the model performance. A total of 2,042 patients met the criteria and were enrolled in this study, and 12 variables were included. Sex, age, Lymph#, PLR, RDW, and BRAFV600E were identified as statistically significant indicators by univariate and LASSO analysis. Among the model we constructed, RF, XGBoost, LightGBM and AdaBoost with the AUC of 0.874 (95% CI, 0.841–0.906), 0.868 (95% CI, 0.834–0.901), 0.861 (95% CI, 0.826–0.895), and 0.837 (95% CI, 0.802–0.873) in the first model. With the AUC of 0.853 (95% CI, 0.818–0.888), 0.853 (95% CI, 0.818–0.889), 0.837 (95% CI, 0.800–0.873), and 0.832 (95% CI, 0.797–0.867) in the second model. With the AUC of 0.698 (95% CI, 0.651–0.745), 0.688 (95% CI, 0.639–0.736), 0.693 (95% CI, 0.645–0.741), and 0.666 (95% CI, 0.618–0.714) in the third model. Compared with the existing models, our study proposes a model incorporating novel biomarkers which could be a powerful and promising tool for predicting benign and malignant thyroid tumors.

Keywords