Prediction of heart failure 1 year before diagnosis in general practitioner patients using machine learning algorithms: a retrospective case–control study

Frank C Bennis; Mark Hoogendoorn; Claire Aussems; Joke C Korevaar

doi:10.1136/bmjopen-2021-060458

BMJ Open (Aug 2022)

Affiliations

Frank C Bennis: Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Mark Hoogendoorn: Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Claire Aussems: Netherlands Institute for Health Services Research (Nivel), Utrecht, The Netherlands
Joke C Korevaar: Netherlands Institute for Health Services Research (Nivel), Utrecht, The Netherlands

DOI: https://doi.org/10.1136/bmjopen-2021-060458
Journal volume & issue: Vol. 12, no. 8

Abstract

Objectives Heart failure (HF) is a common health problem with high mortality and morbidity. Detecting potential cases earlier may allow earlier intervention, which may slow progression in some patients. Preferably, screening of all persons in an age group would reuse data that have already been recorded, such as general practitioner (GP) data. Furthermore, it is essential to evaluate the number of people needed to screen to find one patient using true incidence rates, as this indicates how well the model generalises to the true population. Therefore, we aim to create a machine learning model for the prediction of HF using GP data and to evaluate the number needed to screen with true incidence rates.

Design, settings and participants GP data from 8543 patients (−2 to −1 year before diagnosis) and controls aged 70+ years were obtained retrospectively from 01 January 2012 to 31 December 2019 from the Nivel Primary Care Database. Codes for chronic illness, complaints, diagnostics and medication were obtained. Data were split into a train set and a test set. Datasets describing demographics, the presence of codes (non-sequential) and codes following one another (sequential) were created. Logistic regression, random forest and XGBoost models were trained. The predicted outcome was the presence of HF after 1 year. The case:control ratio in the test set matched true incidence rates (1:45).

Results Demographics alone performed moderately (area under the curve (AUC) 0.692, CI 0.677 to 0.706). Adding non-sequential information, combined with a logistic regression model, performed best and significantly improved performance (AUC 0.772, CI 0.759 to 0.785, p<0.001). Further adding sequential information did not alter performance significantly (AUC 0.767, CI 0.754 to 0.780, p=0.07). The number needed to screen dropped from 14.11 to 5.99 false positives per true positive.

Conclusion This study created a model able to identify patients with pending HF a year before diagnosis.
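The abstract reports the number needed to screen as false positives per true positive, evaluated at a case:control ratio of 1:45. The arithmetic behind such a figure can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the sensitivity and specificity values are hypothetical and are not taken from the study.

```python
def false_positives_per_true_positive(sensitivity, specificity, controls_per_case=45.0):
    """Expected false positives per true positive at a given case:control ratio.

    For every case there are `controls_per_case` controls (45 in the abstract's
    test set). A classifier flags a fraction `sensitivity` of the cases and a
    fraction `1 - specificity` of the controls.
    """
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be in [0, 1]")
    if controls_per_case < 0:
        raise ValueError("controls_per_case must be non-negative")

    expected_true_positives = sensitivity                # per single case
    expected_false_positives = controls_per_case * (1.0 - specificity)
    return expected_false_positives / expected_true_positives


if __name__ == "__main__":
    # Hypothetical operating point, chosen only to illustrate the calculation.
    print(false_positives_per_true_positive(sensitivity=0.5, specificity=0.875))
```

Because controls outnumber cases 45 to 1, even a small false positive rate among controls dominates the result. This is why evaluating on a balanced test set would understate the screening burden.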

Published in BMJ Open

ISSN: 2044-6055 (Online)
Publisher: BMJ Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine
Website: https://bmjopen.bmj.com
