IEEE Access (Jan 2021)
Entity Extraction of Electrical Equipment Malfunction Text by a Hybrid Natural Language Processing Algorithm
Abstract
Many electrical equipment malfunction text messages are collected during power system operation and maintenance procedures. These texts usually contain crucial information for maintenance and condition monitoring. Because these power system malfunction texts are characterized by multidomain vocabularies, complex-syntactic structures, and long sentences, it is challenging to for automated systems to capture their semantic meaning and essential information. To address this issue, we propose a hybrid natural language processing (hybrid-NLP) algorithm to extract entities that represent electrical equipment. This algorithm is composed of a dictionary-based method, a language technology platform (LTP) tool, and the bidirectional encoder representations from a transformers-conditional random field (BERT-CRF) model. Significantly, the softmax output layer of the bidirectional encoder representations from the transformers (BERT) model is replaced by the conditional random field (CRF) algorithm to strengthen the contextual relationships between words and thus solve the local optimization of the preferred word label. The effectiveness of the proposed hybrid-NLP method is verified on a realistic dataset. Moreover, a statistical analysis is conducted to provide a reference for the operation and maintenance of power systems.
Keywords