Human Resources for Health (Dec 2018)

Application of machine learning models in predicting length of stay among healthcare workers in underserved communities in South Africa

  • Sangiwe Moyo,
  • Tuan Nguyen Doan,
  • Jessica Ann Yun,
  • Ndumiso Tshuma

DOI
https://doi.org/10.1186/s12960-018-0329-1
Journal volume & issue
Vol. 16, no. 1
pp. 1 – 9

Abstract

Background

Human resource planning in healthcare can employ machine learning to predict the length of stay of recruited health workers who are stationed in rural areas. While prior studies have identified a number of demographic factors related to general health practitioners’ decision to stay in public health practice, recruitment agencies have no validated methods to predict how long these health workers will commit to their placement. We aim to use machine learning methods to predict health professionals’ length of practice in the rural public healthcare sector based on their demographic information.

Methods

Recruitment and retention data from Africa Health Placements were used to develop machine learning models to predict health workers’ length of practice. Cross-validation was used to validate the models and to determine which model performed best, based on their respective aggregated prediction error rates. Length of stay was categorized into four groups for classification (less than 1 year, less than 2 years, less than 3 years, and more than 3 years). R, a statistical computing language, was used to train three machine learning models and to apply 10-fold cross-validation in order to obtain evaluation statistics.

Results

The three models attained almost identical results, with negligible differences in accuracy. The “best”-performing model (multinomial logistic classifier) achieved a classification accuracy of 47.34% [SD 1.63], while the decision tree model achieved a comparable 45.82% [SD 1.69]. The three models achieved an average AUC of approximately 0.66, suggesting sufficient predictive signal in the four categorical variables selected.

Conclusions

Machine learning models provide a demonstrably effective tool for predicting recruited health workers’ length of practice. These models can be adapted in future studies to incorporate information besides demographic details, such as placement location and income. Beyond predicting length of practice, this modelling technique will also allow strategic planning and optimization of public healthcare recruitment.
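
The evaluation procedure described in the Methods (binning length of stay into four classes, then reporting mean accuracy and SD over 10 folds) can be sketched as follows. This is a minimal illustration only, not the authors' code: the study used R, whereas this sketch uses Python's standard library; the data are synthetic; the bin boundaries are one reading of the four groups listed in the abstract; and the one-rule classifier is a hypothetical stand-in for the three models the study trained. All function names are invented for this example.

```python
import random
import statistics
from collections import Counter, defaultdict


def stay_category(years):
    """Map length of stay in years to one of four classes.
    Assumed boundaries: [0,1), [1,2), [2,3), and 3 or more."""
    if years < 1:
        return "<1y"
    if years < 2:
        return "1-2y"
    if years < 3:
        return "2-3y"
    return "3y+"


def make_folds(rows, k, seed=0):
    """Shuffle a copy of rows and split it into k roughly equal folds."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_one_rule(train):
    """Stand-in classifier: most common class per feature value,
    with the overall majority class as a fallback."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for value, label in train:
        by_value[value][label] += 1
        overall[label] += 1
    rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    return rule, overall.most_common(1)[0][0]


def predict(model, value):
    rule, default = model
    return rule.get(value, default)


def k_fold_accuracy(rows, k=10, seed=0):
    """Return (mean, SD) of held-out accuracy across k folds."""
    folds = make_folds(rows, k, seed)
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_one_rule(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return statistics.mean(scores), statistics.stdev(scores)


# Synthetic data: one categorical feature with a weak link to length of stay.
rng = random.Random(42)
rows = []
for _ in range(400):
    group = rng.choice(["A", "B", "C"])
    years = rng.uniform(0, 4) + (1.0 if group == "A" else 0.0)
    rows.append((group, stay_category(years)))

mean_acc, sd_acc = k_fold_accuracy(rows, k=10)
print(f"accuracy: {mean_acc:.2%} [SD {sd_acc:.2%}]")
```

Reporting the standard deviation across folds alongside the mean, as the abstract does, shows how much the accuracy estimate varies with the particular held-out split.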

Keywords