Scientific Reports (Jun 2024)

Applying an explainable machine learning model might reduce the number of negative appendectomies in pediatric patients with a high probability of acute appendicitis

  • Ivan Males,
  • Zvonimir Boban,
  • Marko Kumric,
  • Josip Vrdoljak,
  • Karlotta Berkovic,
  • Zenon Pogorelic,
  • Josko Bozic

DOI
https://doi.org/10.1038/s41598-024-63513-x
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 13

Abstract

Read online

Abstract The diagnosis of acute appendicitis and concurrent surgery referral is primarily based on clinical presentation, laboratory and radiological imaging. However, utilizing such an approach results in as much as 10–15% of negative appendectomies. Hence, in the present study, we aimed to develop a machine learning (ML) model designed to reduce the number of negative appendectomies in pediatric patients with a high clinical probability of acute appendicitis. The model was developed and validated on a registry of 551 pediatric patients with suspected acute appendicitis that underwent surgical treatment. Clinical, anthropometric, and laboratory features were included for model training and analysis. Three machine learning algorithms were tested (random forest, eXtreme Gradient Boosting, logistic regression) and model explainability was obtained. Random forest model provided the best predictions achieving mean specificity and sensitivity of 0.17 ± 0.01 and 0.997 ± 0.001 for detection of acute appendicitis, respectively. Furthermore, the model outperformed the appendicitis inflammatory response (AIR) score across most sensitivity–specificity combinations. Finally, the random forest model again provided the best predictions for discrimination between complicated appendicitis, and either uncomplicated acute appendicitis or no appendicitis at all, with a joint mean sensitivity of 0.994 ± 0.002 and specificity of 0.129 ± 0.009. In conclusion, the developed ML model might save as much as 17% of patients with a high clinical probability of acute appendicitis from unnecessary surgery, while missing the needed surgery in only 0.3% of cases. Additionally, it showed better diagnostic accuracy than the AIR score, as well as good accuracy in predicting complicated acute appendicitis over uncomplicated and negative cases bundled together. This may be useful in centers that advocate for the conservative treatment of uncomplicated appendicitis. Nevertheless, external validation is needed to support these findings.

Keywords