Analysis of Severe Injuries in Crashes Involving Large Trucks Using K-Prototypes Clustering-Based GBDT Model

Syed As-Sadeq Tahfim; Chen Yan

doi:10.3390/safety7020032

Safety (Apr 2021)

Affiliations

Syed As-Sadeq Tahfim: Department of Maritime Economics and Management, Dalian Maritime University, Linghai Road, Dalian 116026, China
Chen Yan: Department of Maritime Economics and Management, Dalian Maritime University, Linghai Road, Dalian 116026, China

DOI: https://doi.org/10.3390/safety7020032
Journal volume & issue: Vol. 7, no. 2
p. 32

Abstract

Unobserved heterogeneity in traffic crash data hides certain relationships between contributory factors and injury severity. The literature has explored only a limited range of clustering methods for analyzing injury severity in crashes involving large trucks. Additionally, the mix of data types in traffic crash data has rarely been addressed. This study explored the application of the k-prototypes clustering method to address the unobserved heterogeneity in large truck-involved crashes that occurred in the United States between 2016 and 2019. The study segmented the entire dataset (EDS) into three homogeneous clusters. Four gradient boosted decision trees (GBDT) models were developed, one on the EDS and one on each cluster, to predict injury severity in crashes involving large trucks. The input features included crash characteristics, truck characteristics, roadway attributes, time and location of the crash, and environmental factors. Each cluster-based GBDT model was compared with the EDS-based model. Two of the three cluster-based models showed a significant improvement in predictive performance. Additionally, feature analysis using the SHAP (Shapley additive explanations) method identified a few new important features in each cluster and showed that some features affect severe injuries to different degrees in the individual clusters. The study concluded that the k-prototypes clustering-based GBDT model is a promising approach for revealing hidden insights, which can be used to improve safety measures, roadway conditions, and policies for the prevention of severe injuries in crashes involving large trucks.
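To illustrate the clustering step named in the abstract, the following is a minimal sketch of k-prototypes on mixed numeric and categorical records. It assumes the standard formulation: squared Euclidean distance on numeric features plus a weight gamma times the number of categorical mismatches, with prototypes updated as the numeric mean and categorical mode. The toy records, feature names, gamma value, and initial prototypes are hypothetical and are not taken from the study; the GBDT and SHAP stages are not shown.

```python
from collections import Counter

def mixed_distance(a, b, num_idx, cat_idx, gamma):
    """Squared Euclidean distance on numeric features plus
    gamma times the count of categorical mismatches."""
    numeric = sum((a[i] - b[i]) ** 2 for i in num_idx)
    categorical = sum(a[i] != b[i] for i in cat_idx)
    return numeric + gamma * categorical

def update_prototype(members, num_idx, cat_idx):
    """Prototype = mean of numeric features, mode of categorical features."""
    proto = list(members[0])
    for i in num_idx:
        proto[i] = sum(r[i] for r in members) / len(members)
    for i in cat_idx:
        proto[i] = Counter(r[i] for r in members).most_common(1)[0][0]
    return tuple(proto)

def k_prototypes(rows, init, num_idx, cat_idx, gamma=1.0, max_iter=20):
    """Alternate assignment and prototype updates until labels stop changing."""
    protos = list(init)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(protos)),
                key=lambda k: mixed_distance(r, protos[k], num_idx, cat_idx, gamma))
            for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(protos)):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:  # keep the old prototype if a cluster is empty
                protos[k] = update_prototype(members, num_idx, cat_idx)
    return labels, protos

# Hypothetical crash records: (scaled speed limit, weather, light condition).
rows = [
    (0.20, "clear", "daylight"),
    (0.30, "clear", "daylight"),
    (0.90, "rain", "dark"),
    (1.00, "rain", "dark"),
    (0.25, "clear", "dark"),
]
labels, protos = k_prototypes(
    rows, init=[rows[0], rows[2]], num_idx=[0], cat_idx=[1, 2], gamma=0.5
)
print(labels)
print(protos)
```

In a pipeline like the one the abstract describes, the resulting labels would be used to split the dataset so that a separate model can be trained on each cluster and compared against a model trained on the full data. The choice of gamma controls how much categorical mismatches count relative to numeric distance, so numeric features are typically scaled before clustering.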

Published in Safety

ISSN: 2313-576X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial directories: Industrial safety. Industrial accident prevention; Medicine: Medicine (General)
Website: http://www.mdpi.com/journal/safety
