Comparing conventional statistical models and machine learning in a small cohort of South African cardiac patients

Preesha Premsagar; Colleen Aldous; Tonya M. Esterhuizen; Byron J. Gomes; Jason William Gaskell; David L. Tabb

Informatics in Medicine Unlocked (Jan 2022)

Affiliations

Preesha Premsagar: Department of Internal Medicine, Nelson R Mandela School of Medicine, University of KwaZulu-Natal, South Africa; Corresponding author.
Colleen Aldous: Nelson R Mandela School of Medicine, University of KwaZulu-Natal, South Africa
Tonya M. Esterhuizen: Division of Epidemiology/Biostatistics, Department of Global Health, Faculty of Medical and Health Sciences, Stellenbosch University, South Africa
Byron J. Gomes: Financial Mathematics and BSc Actuarial Science and Statistics, University of Witwatersrand, South Africa
Jason William Gaskell: B.Bus.Sci Actuarial Science Specialising in Actuarial Science, University of Cape Town, South Africa
David L. Tabb: Institut Pasteur, Université Paris Cité, CNRS UAR 2024, Mass Spectrometry for Biology Unit, 28 rue du Dr Roux, 75724, PARIS, Cedex 15, France; Centre for Bioinformatics and Computational Biology, Stellenbosch University, Stellenbosch, South Africa

Journal volume & issue: Vol. 34, p. 101103

Abstract

Background: Machine learning is used to process large volumes of data with complex non-linear relationships between predictive variables and predictions. Research into the usefulness of machine learning on small datasets remains limited.

Aim: To compare conventional statistical methods and machine learning for predicting angiogram outcomes in a small cohort of South African cardiac patients.

Methods: This is a retrospective study of patients with cardiac risk factors at Inkosi Albert Luthuli Central Hospital, Durban, South Africa, from 2002 to 2008. Models were designed using predictive risk factors to forecast a binary angiogram outcome (normal or abnormal) by applying conventional statistical models (binary logistic and log binomial) and stacking ensemble machine learning.

Results: The outcome prevalence of abnormal angiograms was 99/173 (57%). Predictive data were used to model this outcome. The binary logistic regression model, which estimates odds ratios, was unsuitable. The log binomial model, which estimates relative risk, did not converge after various stepwise modelling attempts. Thereafter, machine learning models were used. These included logistic regression, k-nearest neighbour, decision tree, support vector machine, and naïve Bayes. The ensemble model amalgamated all algorithms and showed accuracy >70% and excellent performance at different thresholds, with an area under the curve (AUC) >80%.

Discussion: The logistic regression model was unsuitable because the outcome prevalence was >10%, so an odds ratio would have been unreliable and would have overestimated the true effect. A log binomial model with relative risk estimates did not converge, possibly owing to the multiple predictive variables. Overall, conventional statistical models were unsuccessful in this instance. Machine learning models had limitations arising from the small dataset. However, combined modelling with the stacking ensemble method produced good results in the small, homogeneous database by exploiting the strengths of each contributing algorithm.

Conclusions: Researchers may apply machine learning when conventional statistical models are inconclusive in small, homogeneous databases with multiple variables and a complex relationship to the outcome. Machine learning is a viable option even with relatively small cohorts if the number of predictive variables is also small.
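The point in the Discussion about odds ratios overstating the effect when the outcome is common can be illustrated with a small sketch. The 2x2 counts below are hypothetical: they were chosen only so that the totals match the reported 99 abnormal angiograms out of 173 patients, and the "exposed"/"unexposed" split does not come from the study. The function names are likewise illustrative. The conversion in `rr_from_or` is the standard identity RR = OR / (1 - p0 + p0 * OR), where p0 is the risk in the unexposed group.

```python
# Hypothetical illustration: odds ratio vs. relative risk for a common outcome.
# Counts are invented for the example; only the totals (99/173) follow the abstract.

def relative_risk(a, b, c, d):
    """a, b: exposed with/without outcome; c, d: unexposed with/without outcome."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the same 2x2 table."""
    return (a * d) / (b * c)

def rr_from_or(or_value, p0):
    """Convert an odds ratio to a relative risk given baseline risk p0."""
    return or_value / (1 - p0 + p0 * or_value)

# Common outcome: 60 + 39 = 99 abnormal, 25 + 49 = 74 normal, 173 patients in total.
common = (60, 25, 39, 49)
rr_common = relative_risk(*common)
or_common = odds_ratio(*common)

# Rare outcome in a table with the same group sizes (again hypothetical).
rare = (6, 79, 3, 85)
rr_rare = relative_risk(*rare)
or_rare = odds_ratio(*rare)

# Baseline risk in the unexposed group of the common-outcome table.
p0_common = common[2] / (common[2] + common[3])
rr_recovered = rr_from_or(or_common, p0_common)

print(f"common outcome: RR={rr_common:.2f}, OR={or_common:.2f}")
print(f"rare outcome:   RR={rr_rare:.2f}, OR={or_rare:.2f}")
print(f"RR recovered from OR: {rr_recovered:.2f}")
```

With these invented counts the odds ratio (about 3.02) is nearly double the relative risk (about 1.59) when the outcome is common, whereas the two nearly coincide in the rare-outcome table. This is only a crude, unadjusted illustration and not a reconstruction of the authors' models.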

Published in Informatics in Medicine Unlocked

ISSN: 2352-9148 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/informatics-in-medicine-unlocked/
