Frontiers in Medicine (Nov 2022)

Predicting the risk of nodular thyroid disease in coal miners based on different machine learning models

  • Feng Zhao,
  • Hongzhen Zhang,
  • Danqing Cheng,
  • Wenping Wang,
  • Yongtian Li,
  • Yisong Wang,
  • Dekun Lu,
  • Chunhui Dong,
  • Dingfei Ren,
  • Lixin Yang

DOI
https://doi.org/10.3389/fmed.2022.1037944
Journal volume & issue
Vol. 9

Abstract

Background: Nodular thyroid disease is by far the most common thyroid disease and is closely associated with the development of thyroid cancer. Coal miners with chronic coal dust exposure are at higher risk of developing nodular thyroid disease, yet few studies have used machine learning models to predict its occurrence in this population. The aim of this study was to identify coal miners at high risk of nodular thyroid disease using five different machine learning (ML) models.

Methods: This retrospective clinical study included 1,708 coal miners who were examined at the Huaihe Energy Occupational Disease Control Hospital in Anhui Province in April 2021. Their clinical physical examination data, including general information, laboratory tests, and imaging findings, were collected. The synthetic minority oversampling technique (SMOTE) was used for sample balancing, and the dataset was randomly split into training and test sets in a ratio of 8:2. Lasso regression and a correlation heat map were used to screen the predictors for the models. Five ML models were compared for predictive efficacy: Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Gaussian Naive Bayes (GNB), a Multilayer Perceptron neural network (MLP), and Complement Naive Bayes (CNB). The model with the highest area under the receiver operating characteristic curve (AUC) was selected as the optimal model for predicting the occurrence of nodular thyroid disease in coal miners.

Results: Lasso regression analysis selected Age, H-DLC, HCT, MCH, PLT, and GGT as predictor variables for the ML models; in addition, the heat map showed no significant correlation among the six variables. In the prediction of nodular thyroid disease, the AUCs of the five ML models were XGBoost (0.892), LR (0.577), GNB (0.603), MLP (0.601), and CNB (0.543). The XGBoost model had the largest AUC and can be applied in clinical practice.

Conclusion: In this study, all five ML models were found to predict the risk of nodular thyroid disease in coal miners, with the XGBoost model having the best overall predictive performance. The model can assist clinicians in quickly and accurately predicting the occurrence of nodular thyroid disease in coal miners, and in adopting individualized clinical prevention and treatment strategies.
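The selection rule used in the abstract (compare models by AUC and keep the largest) can be illustrated with a minimal sketch. It is not the authors' code. The helper `auc_from_scores` is a hypothetical name that computes AUC from its pairwise definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The AUC values in `reported_auc` are the ones reported above, and the four-case toy example is invented for illustration only.

```python
def auc_from_scores(labels, scores):
    """AUC via its pairwise (Mann-Whitney) definition; ties count as 0.5.

    labels: iterable of 0/1 class labels (1 = nodular thyroid disease).
    scores: iterable of predicted risk scores, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# AUC values as reported in the abstract for the five models.
reported_auc = {
    "XGBoost": 0.892,
    "LR": 0.577,
    "GNB": 0.603,
    "MLP": 0.601,
    "CNB": 0.543,
}

# Rank the models from highest to lowest AUC and keep the top one.
ranking = sorted(reported_auc, key=reported_auc.get, reverse=True)
best_model = ranking[0]

# Toy data (invented): two negatives and two positives with risk scores.
toy_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy example, three of the four positive-negative pairs are ordered correctly, so `toy_auc` is 0.75. An AUC of 0.5 corresponds to chance-level ranking, which gives context for the gap between XGBoost and the other four models.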

Keywords