Scientific Reports (May 2023)
Embracing cohort heterogeneity in clinical machine learning development: a step toward generalizable models
Abstract
This study is a simple illustration of the benefit of averaging over cohorts rather than developing a prediction model from a single cohort. We show that models trained on data from multiple cohorts can perform significantly better in new settings than models trained on the same amount of data drawn from just a single cohort. Although this concept seems simple and obvious, no current prediction model development guidelines recommend such an approach.
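The comparison described above can be sketched as a small simulation. This is a hypothetical illustration, not the study's data, models, or protocol. It assumes a linear outcome model whose intercept and slope vary between cohorts (spread `tau`), fits ordinary least squares, and measures mean squared error on a new, unseen cohort. The total training size is held fixed, so the only difference between the two strategies is whether those samples come from one cohort or are split across several. All function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def make_cohort(rng, n, tau):
    """Simulate one hypothetical cohort: y = a_c + b_c * x + noise.

    The cohort-specific intercept a_c and slope b_c are drawn around
    shared values (0.0 and 1.0) with standard deviation tau, which
    stands in for between-cohort heterogeneity (an assumption).
    """
    a_c = rng.gauss(0.0, tau)
    b_c = rng.gauss(1.0, tau)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [a_c + b_c * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_ols(xs, ys):
    """Ordinary least squares for one predictor plus an intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared prediction error of (intercept, slope) on a dataset."""
    a, b = model
    return statistics.fmean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def compare(n_train=400, n_cohorts=8, tau=0.5, n_reps=200, seed=0):
    """Return mean test MSE on unseen cohorts as (single_cohort, multi_cohort)."""
    rng = random.Random(seed)
    single_err, multi_err = [], []
    per_cohort = n_train // n_cohorts
    for _ in range(n_reps):
        # Single-cohort model: all n_train samples come from one cohort.
        xs, ys = make_cohort(rng, n_train, tau)
        single = fit_ols(xs, ys)
        # Multi-cohort model: the same total size, split evenly across cohorts.
        pooled_x, pooled_y = [], []
        for _ in range(n_cohorts):
            cx, cy = make_cohort(rng, per_cohort, tau)
            pooled_x += cx
            pooled_y += cy
        multi = fit_ols(pooled_x, pooled_y)
        # Evaluate both models on a new cohort (the "new setting").
        tx, ty = make_cohort(rng, 500, tau)
        single_err.append(mse(single, tx, ty))
        multi_err.append(mse(multi, tx, ty))
    return statistics.fmean(single_err), statistics.fmean(multi_err)


if __name__ == "__main__":
    single, multi = compare()
    print(f"single-cohort test MSE: {single:.3f}")
    print(f"multi-cohort test MSE:  {multi:.3f}")
```

Under these assumptions, when `tau > 0` the pooled model tends to show lower error on the unseen cohort, because its coefficients approximate the average across cohorts rather than one cohort's idiosyncrasies. With `tau = 0` (no heterogeneity) the two strategies should perform about the same, both close to the noise floor.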