IEEE Access (Jan 2018)

Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods

  • Jian Zhang,
  • Zhibin Li,
  • Ziyuan Pu,
  • Chengcheng Xu

DOI
https://doi.org/10.1109/ACCESS.2018.2874979
Journal volume & issue
Vol. 6
pp. 60079 – 60087

Abstract


Crash injury severity prediction is a promising research target in traffic safety. Traditionally, various statistical methods have been used to model crash injury severity. In recent years, machine learning-based methods have become popular because of their good predictive performance. However, machine learning-based models are usually criticized for behaving like black boxes. In this paper, we compare the predictive performance, including prediction accuracy and the estimation of variable importance, of various machine learning and statistical methods with distinct modeling logic for crash severity analysis. Crash severity, road geometry, and traffic flow data were collected at freeway diverge areas in Florida. We estimated the two most commonly used statistical methods, the ordered probit (OP) model and the multinomial logit model, and four popular machine learning methods: K-Nearest Neighbor, Decision Tree, Random Forest (RF), and Support Vector Machine. The correct prediction rate for each crash severity level and the overall correct prediction rate were calculated. The results showed that the machine learning methods had higher prediction accuracy than the statistical methods, although they suffered from overfitting. RF gave the best predictions both overall and for severe crashes, whereas OP performed worst. We compared the importance of variables for crash severity through perturbation-based sensitivity analyses. The results showed that inferences of variable importance from different methods were not always consistent and should be interpreted with care.
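The two evaluation quantities named in the abstract can be made concrete with a short sketch. The per-level and overall correct prediction rates follow directly from their definitions. The perturbation function shows one common one-at-a-time form of sensitivity analysis (shift a single input by a fixed amount for every observation and average the change in predicted class probabilities); this form is an assumption and not necessarily the exact procedure used in the paper. The function names, severity labels, and toy model are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def correct_prediction_rates(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Share of crashes at each severity level that were predicted correctly,
    plus the overall share of correct predictions under the key 'overall'."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals = Counter(y_true)  # observed crashes per severity level
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct ones per level
    rates = {level: hits[level] / n for level, n in totals.items()}
    rates["overall"] = sum(hits.values()) / len(y_true)
    return rates


def perturbation_sensitivity(
    predict_proba: Callable[[List[float]], List[float]],
    X: List[List[float]],
    feature: int,
    delta: float,
) -> List[float]:
    """Assumed one-at-a-time perturbation: shift column `feature` by `delta`
    in every row and return the mean change in each class probability."""
    if not X:
        raise ValueError("X must contain at least one observation")
    n_classes = len(predict_proba(X[0]))
    change = [0.0] * n_classes
    for row in X:
        base = predict_proba(row)
        shifted = list(row)
        shifted[feature] += delta
        new = predict_proba(shifted)
        for k in range(n_classes):
            change[k] += (new[k] - base[k]) / len(X)
    return change


if __name__ == "__main__":
    # Hypothetical labels: PDO = property damage only.
    y_true = ["PDO", "PDO", "injury", "severe", "injury", "PDO"]
    y_pred = ["PDO", "injury", "injury", "PDO", "injury", "PDO"]
    print(correct_prediction_rates(y_true, y_pred))

    # Hypothetical two-class model whose second-class probability is 0.1 * x.
    def toy_model(row: List[float]) -> List[float]:
        p = 0.1 * row[0]
        return [1.0 - p, p]

    print(perturbation_sensitivity(toy_model, [[1.0], [2.0], [3.0]], feature=0, delta=1.0))
```

Reporting the rate per severity level alongside the overall rate matters because severe crashes are usually rare: a model can achieve a high overall rate while predicting few severe crashes correctly.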

Keywords