IEEE Open Journal of Intelligent Transportation Systems (Jan 2023)
Text Classification Modeling Approach on Imbalanced-Unstructured Traffic Accident Descriptions Data
Abstract
Unstructured textual crash descriptions recorded by police officers are rarely utilized, despite containing detailed information on traffic situations. This underuse stems mainly from the difficulty of analyzing text data, as there is currently no established methodology for extracting meaningful information from it. To address these limitations, this study developed a methodology that classifies the significant words describing traffic crash scenarios in unstructured text into standardized data. Specifically, a natural language processing technique, Bidirectional Encoder Representations from Transformers (BERT), was used to extract meaningful information from crash descriptions. The BERT-based model effectively extracts the exact collision point and the pre-crash vehicle maneuver from crash descriptions. The approach is practical for interpreting traffic crash descriptions and outperforms other natural language processing models. Extracting crash scene information in this way can improve understanding of the unique characteristics of traffic crashes, which in turn can support the development of appropriate countermeasures and the prevention of future crashes.
Keywords