Applied Sciences (Dec 2022)

Entity Recognition for Chinese Hazardous Chemical Accident Data Based on Rules and a Pre-Trained Model

  • Hui Dai,
  • Mu Zhu,
  • Guan Yuan,
  • Yaowei Niu,
  • Hongxing Shi,
  • Boxuan Chen

DOI
https://doi.org/10.3390/app13010375
Journal volume & issue
Vol. 13, no. 1
p. 375

Abstract


Due to the unstable physicochemical properties of hazardous chemicals, the risk of leakage and explosion during production, transportation, and storage is high. In recent years, hazardous chemical accidents have occurred frequently, posing a serious threat to people's lives and property. It is therefore crucial to analyze these accidents and to establish corresponding early-warning mechanisms and safeguard measures. At present, most hazardous chemical accident data exist as text. However, named entity recognition (NER), a method for extracting useful information from text, has not been fully exploited in the field of Chinese hazardous chemical handling. Chinese NER is more difficult than English NER because word boundaries in Chinese text are fuzzy. In addition, descriptions of hazardous chemical accidents are colloquial, and labeled data are scarce. Furthermore, most current models do not attempt to identify entities related to accident scenarios, losses, and causes. To tackle these issues, we propose a model based on a rule template and Bert-BiLSTM-CRF (RT-BBC) to recognize named entities in unstructured Chinese hazardous chemical accident reports. Comprehensive experiments on real-world datasets demonstrate the effectiveness of the proposed method. Specifically, RT-BBC outperformed the most competitive method by 6.6% in accuracy and 3.6% in F1 score.
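Because Chinese text has no explicit word boundaries, Chinese NER models are commonly trained to emit one tag per character (for example, in the BIO scheme), and entity spans are then recovered from the predicted tag sequence. The abstract does not state which tagging scheme RT-BBC uses, so the sketch below assumes character-level BIO tags. The labels SCENE, CAUSE, and LOSS and the function `bio_to_spans` are hypothetical illustrations of the scenario, cause, and loss entities mentioned above. The sketch covers only this decoding step. It does not implement the rule template or the Bert-BiLSTM-CRF model.

```python
from typing import List, Tuple

def bio_to_spans(chars: str, tags: List[str]) -> List[Tuple[str, str, int, int]]:
    """Recover (label, text, start, end) entity spans from character-level BIO tags.

    Hypothetical helper: `end` is exclusive. An I- tag that does not continue an
    open entity of the same label is leniently treated as the start of a new entity.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    spans = []
    start, etype = None, None  # start index and label of the currently open entity
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")  # "B-LOSS" -> ("B", "-", "LOSS"); "O" -> ("O", "", "")
        continues = prefix == "I" and label == etype
        # Close the open entity unless this character extends it.
        if etype is not None and not continues:
            spans.append((etype, chars[start:i], start, i))
            start, etype = None, None
        # Open a new entity on B-, or on an I- that does not continue the previous one.
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, label
    # Flush an entity that runs to the end of the sentence.
    if etype is not None:
        spans.append((etype, chars[start:], start, len(chars)))
    return spans

# Hypothetical example: "储罐泄漏导致2人死亡" (a storage tank leak caused 2 deaths).
sentence = "储罐泄漏导致2人死亡"
tags = ["B-SCENE", "I-SCENE", "B-CAUSE", "I-CAUSE", "O", "O",
        "B-LOSS", "I-LOSS", "I-LOSS", "I-LOSS"]
print(bio_to_spans(sentence, tags))
```

Working at the character level sidesteps word segmentation errors, which is one reason character-based tagging is a common choice for Chinese NER. Span-level outputs like these are also what entity-level precision, recall, and F1 are usually computed on.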

Keywords