Scientific Reports (Sep 2024)

Efficacy of automated machine learning models and feature engineering for diagnosis of equivocal appendicitis using clinical and computed tomography findings

  • Juho An,
  • Il Seok Kim,
  • Kwang-Ju Kim,
  • Ji Hyun Park,
  • Hyuncheol Kang,
  • Hyuk Jung Kim,
  • Young Sik Kim,
  • Jung Hwan Ahn

DOI
https://doi.org/10.1038/s41598-024-72889-9
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 13

Abstract


This study evaluates the diagnostic efficacy of automated machine learning (AutoML; AutoGluon) combined with automated feature engineering and selection (autoFE; autofeat) in adult patients with equivocal computed tomography (CT) findings for acute appendicitis (AA). Two models were assessed: one based solely on clinical manifestations and one integrating clinical manifestations with CT findings. Their performance was compared with that of conventional single machine learning (ML) models, such as logistic regression (LR), and established scoring systems, such as the Adult Appendicitis Score (AAS), to address the gap in diagnostic approaches for uncertain AA cases. In this retrospective analysis of 303 adult patients with indeterminate CT findings, the cohort was divided into appendicitis (n = 115) and non-appendicitis (n = 188) groups. AutoGluon and autofeat were used to predict AA. The AutoGluon-clinical model relied solely on clinical data, whereas the AutoGluon-clinical-CT model included both clinical and CT data. The area under the receiver operating characteristic curve (AUROC) and other metrics on the test dataset (accuracy, sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV], and F1 score) were used to compare the AutoGluon models with the single ML models and the AAS. The single ML models were LR, LASSO regression, ridge regression, support vector machine, decision tree, random forest, and extreme gradient boosting. Feature importance values were extracted using AutoGluon's “feature_importance” method. The AutoGluon-clinical model achieved an AUROC of 0.785 (95% CI 0.691–0.890), compared with 0.755 (95% CI 0.649–0.861) for the ridge regression model using only clinical data. The AutoGluon-clinical-CT model (AUROC 0.886, 95% CI 0.820–0.951) outperformed the ridge regression model using clinical and CT data (AUROC 0.852, 95% CI 0.774–0.930; p = 0.029). A new engineered feature, exp(-(duration from pain to CT)³ + rebound tenderness), was identified (importance = 0.049, p = 0.001).
AutoML (AutoGluon) and autoFE (autofeat) enhanced the diagnosis of uncertain AA cases, particularly when CT and clinical findings were combined. These results suggest that integrating AutoML and autoFE into clinical settings could improve diagnostic strategies and patient outcomes while making more efficient use of healthcare resources, and they support further exploration of machine learning in diagnostic processes.
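The evaluation quantities named in the abstract (AUROC, accuracy, sensitivity, specificity, PPV, NPV, F1 score, and permutation-based feature importance) can be illustrated with a short, dependency-free Python sketch. This is not the authors' code: all function names are hypothetical, AUROC is computed with the Mann–Whitney pairwise formulation, and importance is assumed here to be the mean drop in AUROC when one feature column is shuffled. AutoGluon's `TabularPredictor.feature_importance()` also uses permutation importance, but measures the drop in the predictor's own evaluation metric and additionally reports a p-value from a one-sided t-test. The 95% confidence intervals and the AUROC comparison test used in the study are not reproduced here.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Rows = List[List[float]]


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for binary labels (1 = appendicitis, 0 = no appendicitis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (e.g. PPV with no predicted positives) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),      # true-positive rate (recall)
        "specificity": ratio(tn, tn + fp),      # true-negative rate
        "ppv": ratio(tp, tp + fp),              # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),              # negative predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(predict: Callable[[Rows], Sequence[float]], X: Rows,
                           y: Sequence[int], feature: int,
                           n_shuffles: int = 10, seed: int = 0) -> Dict[str, float]:
    """Mean and sample std of the AUROC drop when column `feature` is randomly shuffled."""
    if n_shuffles < 1:
        raise ValueError("n_shuffles must be at least 1")
    rng = random.Random(seed)
    baseline = auroc(y, predict(X))
    drops = []
    for _ in range(n_shuffles):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Build new rows so the caller's X is left unchanged.
        X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
        drops.append(baseline - auroc(y, predict(X_perm)))
    mean = sum(drops) / len(drops)
    var = sum((d - mean) ** 2 for d in drops) / (len(drops) - 1) if len(drops) > 1 else 0.0
    return {"importance": mean, "stddev": math.sqrt(var)}


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes perfectly, feature 1 is noise.
    X = [[0.9, 0.2], [0.8, 0.7], [0.7, 0.1], [0.3, 0.9], [0.2, 0.4], [0.1, 0.6]]
    y = [1, 1, 1, 0, 0, 0]

    def toy_model(rows: Rows) -> List[float]:
        # Hypothetical "model" that scores each patient by feature 0 only.
        return [row[0] for row in rows]

    scores = toy_model(X)
    print("AUROC:", auroc(y, scores))
    print(threshold_metrics(y, [int(s >= 0.5) for s in scores]))
    for j in (0, 1):
        print(f"feature {j}:", permutation_importance(toy_model, X, y, feature=j))
```

In the toy run, shuffling the unused feature 1 leaves every prediction unchanged, so its importance is exactly zero. This is the intuition behind reading a positive importance with a small p-value, such as the one reported for the engineered feature, as evidence that the model relies on that feature.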

Keywords