Scientific Reports (Dec 2022)

Crash severity analysis and risk factors identification based on an alternate data source: a case study of developing country

  • Hanif Bhuiyan,
  • Jinat Ara,
  • Khan Md. Hasib,
  • Md Imran Hossain Sourav,
  • Faria Benta Karim,
  • Cecilia Sik-Lanyi,
  • Guido Governatori,
  • Andry Rakotonirainy,
  • Shamsunnahar Yasmin

DOI
https://doi.org/10.1038/s41598-022-25361-5
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 22

Abstract

Road traffic injuries are one of the primary causes of death, especially in developing countries such as Bangladesh. Safety in land transport is a major concern for road safety authorities and other policymakers. Identifying the contributory factors associated with crashes is therefore necessary for reducing road crashes and ensuring transportation safety. This paper presents an analytical approach to identifying significant contributing factors of road crashes in Bangladesh by evaluating road crash data across three severity levels (non-fatal, severe, and extremely severe). Official crash databases are generally compiled from police-reported crash records. Although the official datasets aim to compile a wide array of attributes, a number of unreported issues can be observed, which calls for an alternative source of crash data. The proposed approach therefore compiles crash data from newspapers in Bangladesh, which could be complementary to the official crash database. To conduct the analysis, we first filtered the useful features from the compiled crash data using three popular feature selection techniques: chi-square, two-way ANOVA, and regression analysis. We then applied three machine learning classifiers (Decision Tree, Random Forest, and Naïve Bayes) to the extracted features. A confusion matrix was used to evaluate the proposed model in terms of classification accuracy, sensitivity, and specificity. The Random Forest model using a label encoder with the chi-square and two-way ANOVA feature selection processes appears to be the best option for crash severity prediction, providing high prediction accuracy. The resulting model highlights nine out of fourteen independent features as responsible factors.
Significant features associated with crash severity include driver characteristics (gender, license type, seat belt use), vehicle characteristics (vehicle type), road characteristics (road surface type, road classification), environmental conditions (day of crash, time of crash), and injury localization. This outcome may contribute to improving traffic safety in Bangladesh.
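
The evaluation step mentioned in the abstract (a confusion matrix summarised as accuracy, sensitivity, and specificity) can be illustrated with a minimal sketch. It assumes a one-vs-rest reading of sensitivity and specificity for the three severity levels. The function names and the toy labels below are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, List, Sequence

# The three severity levels named in the abstract (used here as class labels).
LABELS = ["non-fatal", "severe", "extremely severe"]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Share of all records on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix: List[List[int]],
                      labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest sensitivity and specificity for each class (an assumption)."""
    total = sum(sum(row) for row in matrix)
    result = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # true class i, predicted other
        fp = sum(row[i] for row in matrix) - tp   # other class, predicted i
        tn = total - tp - fn - fp
        result[label] = {
            "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
            "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        }
    return result


# Toy labels for illustration only; these are NOT data from the study.
y_true = ["non-fatal"] * 3 + ["severe"] * 3 + ["extremely severe"] * 4
y_pred = ["non-fatal", "non-fatal", "severe",
          "severe", "severe", "extremely severe",
          "extremely severe", "extremely severe", "extremely severe", "non-fatal"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
metrics = per_class_metrics(matrix, LABELS)

print("accuracy:", round(accuracy(matrix), 3))
for label in LABELS:
    print(label, {k: round(v, 3) for k, v in metrics[label].items()})
```

Reporting sensitivity and specificity per class, rather than accuracy alone, shows whether a classifier misses the rarer and more severe crash categories even when its overall accuracy is high.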