SAGE Open (Apr 2025)
Predicting Absenteeism at Workplace Using Machine Learning and Network Analysis
Abstract
Absenteeism at work, which can lead to productivity loss in business, is related to various psychological, social, and economic factors. Because predicting absenteeism involves complex associations among these factors, the analysis requires appropriate use of machine learning algorithms. Statistical pre-processing and machine learning methods have advanced the comprehensive analysis of large-scale social data on absenteeism. The aim of this study is to develop a quantitative approach that identifies associations among factors and classifies absenteeism while accounting for the effects of factors in high-dimensional data. The approach combines association analysis, including odds ratio tests and network analysis, with supervised learning: imbalanced classification with random forest, principal component analysis, and penalized regression methods. The dataset includes records of various types of workplace absenteeism in Brazil from July 2007 to July 2010. Our study shows that some factors interact strongly with one another and that specific factors are strongly associated with absenteeism. The proposed method is validated on publicly available datasets using random forest and penalized regression with k-fold cross-validation to support generalizability. A major finding of this study is the elucidation of the associations among factors affecting absenteeism. Applying the approach to similarly structured social data can improve the understanding of the complex interplay between social factors and absenteeism, which is important for people analytics and can help organizations resolve management difficulties.
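As an illustration of the odds ratio test mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It computes an odds ratio and a Wald-type 95% confidence interval from a 2x2 table relating one binary factor to absenteeism. The function name `odds_ratio_ci` and the cell counts are hypothetical and are not taken from the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (log-scale) confidence interval for a 2x2 table.

    Hypothetical layout, assumed for illustration only:
        a = factor present, absent from work
        b = factor present, not absent
        c = factor not present, absent from work
        d = factor not present, not absent
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Made-up counts, not from the Brazilian dataset described above.
odds, low, high = odds_ratio_ci(30, 70, 15, 85)

# The association is conventionally called significant at the 5% level
# when the interval excludes 1.
significant = low > 1.0 or high < 1.0
print(f"OR = {odds:.2f}, 95% CI = ({low:.2f}, {high:.2f}), significant = {significant}")
```

An interval that excludes 1 suggests the factor is associated with absenteeism. Running such a test for every pair of factors would yield the kind of pairwise associations that a network analysis could then take as edges, although the abstract does not specify how the study constructs its network.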