Comparing machine learning algorithms for multimorbidity prediction: An example from the Elsa-Brasil study.

Daniela Polessa Paula; Odaleia Barbosa Aguiar; Larissa Pruner Marques; Isabela Bensenor; Claudia Kimie Suemoto; Maria de Jesus Mendes da Fonseca; Rosane Härter Griep

doi:10.1371/journal.pone.0275619

PLoS ONE (Jan 2022)

DOI: https://doi.org/10.1371/journal.pone.0275619
Journal volume & issue: Vol. 17, no. 10
p. e0275619

Abstract

Background: Multimorbidity is a worldwide concern associated with greater disability, worse quality of life, and mortality. Early prediction is crucial for designing preventive strategies and for integrative medical practice. However, knowledge about how to predict multimorbidity is limited, possibly due to the complexity involved in predicting multiple chronic diseases.

Methods: In this study, we present a machine learning approach to building cost-effective multimorbidity prediction models. Based on predictors easily obtainable in clinical practice (sociodemographic, clinical, family disease history, and lifestyle), we built and compared the performance of seven multilabel classifiers (multivariate random forest, and classifier chain, binary relevance, and binary dependence, each with random forest and support vector machine as base classifiers), using a sample of 15,105 participants from the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil). We developed a web application for building and using prediction models.

Results: Classifier chain with random forest as the base classifier performed best (accuracy = 0.34, subset accuracy = 0.15, and Hamming loss = 0.16). Across different feature sets, random forest-based classifiers outperformed those based on support vector machines. BMI, blood pressure, sex, and age were the features most relevant to multimorbidity prediction.

Conclusions: Our results support the choice of random forest-based classifiers for multimorbidity prediction.
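The three metrics reported in the abstract can be illustrated with a small sketch. The abstract does not define them, so the code below assumes the common example-based multilabel definitions: Hamming loss as the fraction of individual label slots predicted incorrectly, subset accuracy as the fraction of participants whose full label set is predicted exactly, and accuracy as the mean per-participant ratio of correctly predicted conditions to the union of true and predicted conditions. The function names, the toy data, and the convention of scoring an empty true-and-predicted set as 1.0 are all illustrative assumptions, not taken from the study.

```python
from typing import List, Sequence

Labels = Sequence[Sequence[int]]  # one row per participant, one 0/1 column per condition


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of participants whose whole label vector is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


def example_based_accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Mean of |true AND pred| / |true OR pred| over participants."""
    scores: List[float] = []
    for true_row, pred_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(true_row, pred_row) if t == 1 and p == 1)
        union = sum(1 for t, p in zip(true_row, pred_row) if t == 1 or p == 1)
        # Assumption: no true and no predicted conditions counts as a perfect match.
        scores.append(1.0 if union == 0 else inter / union)
    return sum(scores) / len(scores)


# Hypothetical toy data: 4 participants, 3 chronic conditions.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 0]]

print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("Subset accuracy:", subset_accuracy(y_true, y_pred))
print("Accuracy:       ", round(example_based_accuracy(y_true, y_pred), 3))
```

Under these definitions, subset accuracy is the strictest of the three because a single wrong condition makes the whole prediction count as a miss, which is consistent with it being the lowest figure reported in the Results.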

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/
