Applied Sciences (Nov 2022)
Predicting Road Crash Severity Using Classifier Models and Crash Hotspots
Abstract
The rapid increase in traffic volume on urban roads, over time, has altered the global traffic scenario. Additionally, it has increased the number of road crashes, some of which are severe and fatal in nature. The identification of hazardous roadway sections using the spatial pattern analysis of crashes and recognition of the primary and contributing factors may assist in reducing the severity of road traffic crashes (R.T.C.s). For crash severity prediction, along with spatial patterns, various machine learning models are used, and the spatial relations of R.T.C.s with neighboring areas are evaluated. In this study, tree-based ensemble models (gradient boosting and random forest) and a logistic regression model are compared for the prediction of R.T.C. severity. Sample data of road crashes in Al-Ahsa, the eastern province of Saudi Arabia, were obtained from 2016 to 2018. Random forest (R.F.) identifies significant features strongly correlated with the severity of the R.T.C.s. The analysis findings showed that the cause of the crash and the type of collision are the most crucial elements affecting the severity of injuries in traffic crashes. Furthermore, the target-specific model interpretation results showed that distracted driving, speeding, and sudden lane changes significantly contributed to severe crashes. The random forest (R.F.) method surpassed other models in terms of injury severity, individual class accuracies, and collective prediction accuracy when using k-fold (k = 10) based on various performance metrics. In addition to taking into account the machine learning approach, this study also included spatial autocorrelation analysis based on G.I.S. for identifying crash hotspots, and Getis Ord Gi* statistics were devised to locate cluster zones with high- and low-severity crashes. The results demonstrated that the research area’s spatial dependence was very strong, and the spatial patterns were clustered with a distance threshold of 500 m. The analysis’s approaches, which included Getis Ord Gi*, the crash severity index, and the spatial autocorrelation of accident incidents according to Moran’s I, were found to be a successful way of locating and rating crash hotspots and crash severity. The techniques used in this study could be applied to large-scale crash data analysis while providing a useful tool for policymakers looking to improve roadway safety.
Keywords