International Journal of Transportation Science and Technology (Jun 2024)
Application of machine learning models and SHAP to examine crashes involving young drivers in New Jersey
Abstract
Motor vehicle crashes are the leading cause of death among teenagers in the United States. Young drivers have a higher propensity to be involved in crashes because of cellphone use while driving, speeding, and reckless driving. This study analyzed motor vehicle crashes involving young drivers using four years of New Jersey crash data (2016–2019). Several machine learning (ML) methods, including Random Forest, LightGBM, CatBoost, and XGBoost, were used to predict injury severity. Model performance was evaluated using accuracy, precision, and recall scores. In addition, interpretable ML techniques, namely sensitivity analysis and Shapley values, were applied to assess the impact of the most influential factors on young driver-related crashes. The results revealed that XGBoost outperformed the Random Forest, CatBoost, and LightGBM models in crash severity prediction. The sensitivity analysis showed that multi-vehicle crashes, angular crashes, crashes at intersections, and dark-not-lit conditions were associated with increased crash severity. A partial dependence plot of SHAP values revealed that speeding in clear weather was associated with a higher likelihood of injury crashes, and that multi-vehicle crashes at intersections were associated with more injury crashes. We expect that the results of this study will help policymakers and practitioners take appropriate countermeasures to improve the safety of young drivers in New Jersey.
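The three evaluation scores named in the abstract can be illustrated with a minimal sketch. It assumes a binary outcome (1 = injury crash, 0 = no-injury crash), which may not match the study's actual severity coding; the function name `classification_scores` and the label lists are hypothetical and are not taken from the study's data.

```python
from typing import Dict, Sequence


def classification_scores(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for eight crashes (assumption: 1 = injury, 0 = no injury).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = classification_scores(y_true, y_pred)
print(scores)
```

Reporting precision and recall alongside accuracy matters when injury crashes are much rarer than non-injury crashes, because a model can then reach high accuracy while missing most injury cases.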