Monte Carlo cross-validation for a study with binary outcome and limited sample size

Guogen Shan

doi:10.1186/s12911-022-02016-z

BMC Medical Informatics and Decision Making (Oct 2022)

Affiliations

Guogen Shan: Department of Biostatistics, University of Florida

Journal volume & issue: Vol. 22, no. 1, pp. 1–15

Abstract

Cross-validation (CV) is a resampling approach for evaluating machine learning models when the sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, is often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can be performed with a flexible number of simulations, computational resources permitting, for a study with limited sample size. We conduct extensive simulation studies to compare accuracy between MCCV and CV with the same number of simulations for a study with a binary outcome (e.g., disease progression or not). The accuracy of MCCV is generally higher than that of CV, although the gain is small. The two methods perform similarly when the sample size is large. Meanwhile, MCCV provides increasingly reliable performance metrics as the number of simulations increases. Two real examples are used to illustrate the comparison between MCCV and CV.
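
To make the distinction between the two resampling schemes concrete, the following is a minimal sketch in Python, not the author's implementation. It contrasts k-fold CV, where the folds partition the data so each observation is tested exactly once per run, with MCCV, where each simulation draws a fresh random test set and the number of simulations is chosen freely. The toy one-feature threshold classifier, the simulated data, and all function and parameter names (`fit_threshold`, `kfold_cv`, `mccv`, `n_sim`, `test_frac`) are assumptions made for illustration only.

```python
import random
from statistics import mean


def fit_threshold(xs, ys):
    """Toy classifier (illustrative assumption): cut at the midpoint of the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:
        # Degenerate training set: always predict the only class that was seen.
        only = 1 if pos else 0
        return lambda x: only
    mid = (mean(pos) + mean(neg)) / 2
    sign = 1 if mean(pos) >= mean(neg) else -1
    return lambda x: 1 if sign * (x - mid) > 0 else 0


def score_split(xs, ys, test_idx):
    """Train on every index outside test_idx and return accuracy on test_idx."""
    held_out = set(test_idx)
    train_idx = [i for i in range(len(xs)) if i not in held_out]
    model = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return mean(1 if model(xs[i]) == ys[i] else 0 for i in test_idx)


def kfold_cv(xs, ys, k, rng):
    """k-fold CV: shuffle once, then split into k disjoint folds covering all data."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return mean(score_split(xs, ys, fold) for fold in folds)


def mccv(xs, ys, n_sim, test_frac, rng):
    """MCCV: n_sim independent random train/test splits; test sets may overlap."""
    n = len(xs)
    n_test = max(1, round(test_frac * n))
    return mean(
        score_split(xs, ys, rng.sample(range(n), n_test)) for _ in range(n_sim)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated small study with a binary outcome (hypothetical data).
    ys = [i % 2 for i in range(40)]
    xs = [rng.gauss(1.5 * y, 1.0) for y in ys]
    print("5-fold CV accuracy:", kfold_cv(xs, ys, k=5, rng=rng))
    print("MCCV accuracy (200 simulations):", mccv(xs, ys, n_sim=200, test_frac=0.2, rng=rng))
```

In this sketch the number of k-fold rounds is tied to `k`, whereas `n_sim` in `mccv` can be raised as far as computing time allows, which is the flexibility the abstract refers to.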

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
