Comparing Prediction Performance for Crash Injury Severity Among Various Machine Learning and Statistical Methods

Jian Zhang; Zhibin Li; Ziyuan Pu; Chengcheng Xu

doi:10.1109/ACCESS.2018.2874979

IEEE Access (Jan 2018)


Affiliations

Jian Zhang: School of Transportation, Southeast University, Nanjing, China
Zhibin Li: School of Transportation, Southeast University, Nanjing, China
Ziyuan Pu: Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, USA
Chengcheng Xu: School of Transportation, Southeast University, Nanjing, China

DOI: https://doi.org/10.1109/ACCESS.2018.2874979
Journal volume & issue: Vol. 6
pp. 60079 – 60087

Abstract

Crash injury severity prediction is a promising research target in traffic safety. Traditionally, various statistical methods have been used to model crash injury severity. In recent years, machine learning-based methods have become popular because of their good predictive performance. However, machine learning-based models are often criticized for behaving like a black box. In this paper, we compare the predictive performance, including prediction accuracy and estimation of variable importance, of various machine learning and statistical methods with distinct modeling logic for crash severity analysis. Crash severity, road geometry, and traffic flow data were collected at freeway diverge areas in Florida. We estimated the two most commonly used statistical methods, the ordered probit (OP) model and the multinomial logit model, and four popular machine learning methods: K-Nearest Neighbor, Decision Tree, Random Forest (RF), and Support Vector Machine. The correct prediction rate for each crash severity level and the overall correct prediction rate were calculated. The results showed that the machine learning methods had higher prediction accuracy than the statistical methods, though they suffered from overfitting. The RF method gave the best predictions both overall and for severe crashes, while OP was the weakest. We compared the importance of variables for crash severity via perturbation-based sensitivity analyses. The results showed that the inferences of variable importance from different methods were not always consistent and should be interpreted with care.
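
The two evaluation ideas named in the abstract, the correct prediction rate per severity level and a perturbation-based sensitivity analysis, can be illustrated with a minimal sketch. The abstract does not describe the authors' exact procedure, so everything below is an assumption made for illustration: the function names, the `speed` feature, the threshold rule standing in for a fitted model, and the toy data are all hypothetical.

```python
from collections import Counter


def correct_prediction_rates(y_true, y_pred):
    """Return (rate per severity level, overall rate) of correct predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    per_level = {level: hits[level] / totals[level] for level in totals}
    overall = sum(hits.values()) / len(y_true)
    return per_level, overall


def perturbation_sensitivity(predict, rows, feature, delta):
    """Change in the predicted share of each severity level after
    shifting one numeric feature by `delta` in every row."""
    base = Counter(predict(row) for row in rows)
    shifted = Counter(
        predict({**row, feature: row[feature] + delta}) for row in rows
    )
    levels = set(base) | set(shifted)
    return {level: (shifted[level] - base[level]) / len(rows) for level in levels}


# Toy labels (hypothetical, not from the paper).
y_true = ["minor", "minor", "severe", "severe", "minor"]
y_pred = ["minor", "severe", "severe", "minor", "minor"]
per_level, overall = correct_prediction_rates(y_true, y_pred)


# Hypothetical stand-in for any fitted classifier.
def toy_predict(row):
    return "severe" if row["speed"] > 60 else "minor"


rows = [{"speed": 50}, {"speed": 58}, {"speed": 65}, {"speed": 70}]
effect = perturbation_sensitivity(toy_predict, rows, "speed", 5)
```

Because `perturbation_sensitivity` only calls `predict`, the same comparison can in principle be run against a statistical model and a machine learning model alike, which is the kind of side-by-side reading of variable importance the abstract describes.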

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
