BMC Medical Informatics and Decision Making (Oct 2022)

Monte Carlo cross-validation for a study with binary outcome and limited sample size

  • Guogen Shan

DOI
https://doi.org/10.1186/s12911-022-02016-z
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 15

Abstract

Read online

Abstract Cross-validation (CV) is a resampling approach to evaluate machine learning models when sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, are often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can be performed with a flexible number of simulations when computational resources are feasible for a study with limited sample size. We conduct extensive simulation studies to compare accuracy between MCCV and CV with the same number of simulations for a study with binary outcome (e.g., disease progression or not). Accuracy of MCCV is generally higher than CV although the gain is small. They have similar performance when sample size is large. Meanwhile, MCCV is going to provide reliable performance metrics as the number of simulations increases. Two real examples are used to illustrate the comparison between MCCV and CV.

Keywords