Trauma Surgery & Acute Care Open (Apr 2024)

Practical guide to building machine learning-based clinical prediction models using imbalanced datasets

  • Joseph D Forrester,
  • Jeff Choi,
  • Advait Patil,
  • Jacklyn Luu,
  • Evgenia Borisenko,
  • Valerie Przekop

DOI
https://doi.org/10.1136/tsaco-2023-001222
Journal volume & issue
Vol. 9, no. 1

Abstract

Clinical prediction models often aim to predict rare, high-risk events, but building such models requires a robust understanding of imbalanced datasets and their unique study design considerations. This practical guide highlights foundational prediction model principles for surgeon-data scientists and readers who encounter clinical prediction models, from feature engineering and algorithm selection strategies to model evaluation and design techniques specific to imbalanced datasets. We walk through a clinical example using readable code to highlight important considerations and common pitfalls in developing machine learning-based prediction models. We hope this practical guide facilitates the development and critical appraisal of robust clinical prediction models for the surgical community.
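
As an illustration of why model evaluation needs special care when events are rare, the following minimal sketch (not taken from the article's own code) scores a trivial classifier that never predicts the event. The cohort size, the 2% event rate, and the function names `confusion_counts` and `summarize` are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: the cohort below is hypothetical, not data from the article.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, sensitivity (recall), and precision.

    Ratios with a zero denominator are reported as 0.0 by convention here.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Hypothetical cohort: 1000 patients, 20 of whom (2%) experience the event.
y_true = [1] * 20 + [0] * 980

# A "model" that always predicts no event.
y_pred_naive = [0] * 1000

naive = summarize(y_true, y_pred_naive)
print(naive)
```

Under these assumptions the naive classifier reaches 98% accuracy while identifying none of the 20 events (sensitivity of 0), which is why accuracy alone can be misleading for rare outcomes and why metrics that focus on the minority class deserve attention.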