Comparison of machine learning algorithms applied to symptoms to determine infectious causes of death in children: national survey of 18,000 verbal autopsies in the Million Death Study in India

Susan Idicula-Thomas; Ulka Gawde; Prabhat Jha

doi:10.1186/s12889-021-11829-y

BMC Public Health (Oct 2021)

Affiliations

Susan Idicula-Thomas: Biomedical Informatics Centre, Indian Council of Medical Research-National Institute for Research in Reproductive Health
Ulka Gawde: Biomedical Informatics Centre, Indian Council of Medical Research-National Institute for Research in Reproductive Health
Prabhat Jha: Centre for Global Health Research, St. Michael’s Hospital, Unity Health Toronto, and Dalla Lana School of Public Health, University of Toronto

DOI: https://doi.org/10.1186/s12889-021-11829-y
Journal volume & issue: Vol. 21, no. 1, pp. 1–11

Abstract

Background: Machine learning (ML) algorithms have been successfully employed to predict outcomes in clinical research. In this study, we explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS).

Methods: From the MDS, 18,826 unique childhood deaths at ages 1–59 months during 2004–13 were selected for generating the prediction models; over 70% of these deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML algorithms (support vector machine, gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, and classification and regression tree) were used to build the CoD prediction models.

Results: The support vector machine (SVM) algorithm was the best performer, with a prediction accuracy of over 0.8. The highest accuracy was found for diarrhoeal diseases (accuracy = 0.97) and the lowest for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classifying these CoDs were also extracted for each disease. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis.

Conclusions: Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.
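To make the classification task concrete, the following is a minimal sketch of one of the six algorithm families named above (k-nearest neighbor) applied to binary symptom vectors. It is not the study's pipeline: the four symptom columns, the toy records, the Hamming distance, and k = 3 are all assumptions chosen for illustration, and no MDS data are used.

```python
from collections import Counter

# Hypothetical toy records (NOT MDS data). Each record is a binary symptom
# vector (fever, cough, loose_stools, rash) paired with an assigned cause.
TRAIN = [
    ((1, 1, 0, 0), "pneumonia"),
    ((1, 1, 0, 0), "pneumonia"),
    ((0, 1, 0, 0), "pneumonia"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 1, 0), "diarrhoeal diseases"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 0, 1), "measles"),
    ((1, 1, 0, 1), "measles"),
    ((0, 0, 0, 1), "measles"),
]

TEST = [
    ((1, 1, 0, 0), "pneumonia"),
    ((0, 0, 1, 0), "diarrhoeal diseases"),
    ((1, 0, 0, 1), "measles"),
]


def hamming(a, b):
    """Count the symptom positions at which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Predict a cause by majority vote among the k closest training records."""
    nearest = sorted(train, key=lambda rec: hamming(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test records whose predicted cause matches the assigned one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    print(knn_predict(TRAIN, (1, 1, 0, 0)))
    print(accuracy(TRAIN, TEST))
```

A real comparison along the lines described in the abstract would train each algorithm on the same records and compare held-out accuracy; the toy set here is far too small to say anything about relative performance.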

Published in BMC Public Health

ISSN: 1471-2458 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Public aspects of medicine
Website: https://bmcpublichealth.biomedcentral.com
