Scientific Reports (Apr 2024)
Automatic identification of incidents involving potential serious injuries and fatalities (PSIF)
Abstract
Safety incidents have always been a major risk in workplaces, especially at industrial sites. Over the last few decades, significant effort has gone into incident control measures intended to reduce the rate of safety incidents. Despite these efforts, the rate of serious injuries and fatalities (SIFs) has declined considerably more slowly than the rate of non-critical incidents. This observation has led to a change in the risk reduction paradigm for safety incidents: the focus is now on reducing the rate of critical (SIF) incidents rather than the count of all incidents. One challenge in reducing the number of SIF incidents is identifying the risk before it materializes. This is difficult in part because companies usually focus reactively on incidents in which an SIF actually occurred, while incidents that did not cause an SIF but had the potential to do so go unnoticed. Identifying these incidents, referred to as potential serious injuries and fatalities (PSIF), would enable companies to identify critical risks and take preemptive steps to prevent them. However, flagging PSIF incidents requires experts to analyze every incident report individually, which demands a significant investment that is often not affordable, especially for small and medium-sized companies. This study aims to address this problem through automation powered by machine learning. We propose a novel approach based on binary classification for identifying incidents involving PSIF. This is the first work towards automatic risk identification from incident reports. Our approach combines a pre-trained transformer model with XGBoost.
We use natural language processing techniques to encode an incident record, which comprises heterogeneous fields, into a vector representation that is fed to XGBoost for classification. Moreover, because manually labeled incident records available for training are scarce, we leverage weak labeling to increase the label coverage of the training data. We tune hyperparameters with the Tree-structured Parzen Estimator using the F2 metric as the objective, which prioritizes detecting PSIF records over avoiding the misclassification of non-PSIF records as PSIF. The proposed methods outperform several baselines from other studies on a large test dataset.
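To illustrate why the F2 metric favors detection of PSIF records, the following minimal sketch computes the general F-beta score from confusion counts, F_beta = (1 + beta^2) TP / ((1 + beta^2) TP + beta^2 FN + FP), and compares two hypothetical classifiers. The function name, the label convention (1 = PSIF, 0 = non-PSIF), and the toy predictions are assumptions made for illustration; they are not taken from the study.

```python
def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta for the positive class (label 1).

    With beta > 1, a missed positive (false negative) costs more than
    a false alarm (false positive); beta = 2 gives the F2 metric.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no true positives, false positives, or
    # false negatives, which avoids a division by zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical toy data: 4 PSIF records (1) and 4 non-PSIF records (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# Classifier A catches every PSIF record but raises 2 false alarms.
pred_a = [1, 1, 1, 1, 1, 1, 0, 0]

# Classifier B raises no false alarms but misses 2 PSIF records.
pred_b = [1, 1, 0, 0, 0, 0, 0, 0]

f2_a = fbeta_score(y_true, pred_a, beta=2.0)
f2_b = fbeta_score(y_true, pred_b, beta=2.0)
f1_a = fbeta_score(y_true, pred_a, beta=1.0)
f1_b = fbeta_score(y_true, pred_b, beta=1.0)
```

On this toy data, classifier A scores higher than B under both metrics, but the gap is much wider under F2 than under F1, because F2 penalizes the two missed PSIF records more heavily than the two false alarms. In a tuning loop, a score of this kind, computed on validation data, would be the quantity the hyperparameter search maximizes.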
Keywords