Revista de Saúde Pública ()

Prediction of absenteeism in public schools teachers with machine learning

  • Fernando Timoteo Fernandes,
  • Alexandre Dias Porto Chiavegatto Filho

DOI
https://doi.org/10.11606/s1518-8787.2021055002677

Abstract

Read online Read online

ABSTRACT OBJECTIVE To predict the risk of absence from work due to morbidities of teachers working in early childhood education in the municipal public schools, using machine learning algorithms. METHODS This is a cross-sectional study using secondary, public and anonymous data from the Relação Anual de Informações Sociais, selecting early childhood education teachers who worked in the municipal public schools of the state of São Paulo between 2014 and 2018 (n = 174,294). Data on the average number of students per class and number of inhabitants in the municipality were also linked. The data were separated into training and testing, using records from 2014 to 2016 (n = 103,357) to train five predictive models, and data from 2017 to 2018 (n = 70,937) to test their performance in new data. The predictive performance of the algorithms was evaluated using the value of the area under the ROC curve (AUROC). RESULTS All five algorithms tested showed an area under the curve above 0.76. The algorithm with the best predictive performance (artificial neural networks) achieved 0.79 of area under the curve, with accuracy of 71.52%, sensitivity of 72.86%, specificity of 70.52%, and kappa of 0.427 in the test data. CONCLUSION It is possible to predict cases of sickness absence in teachers of public schools with machine learning using public data. The best algorithm showed a better result of the area under the curve when compared with the reference model (logistic regression). The algorithms can contribute to more assertive predictions in the public health and worker health areas, allowing to monitor and help prevent the absence of these workers due to morbidity.

Keywords