Applied Sciences (Feb 2025)

Construction of a Geological Fault Corpus and Named Entity Recognition

  • Huainuo Wang,
  • Ruiqing Niu,
  • Yongyao Han,
  • Qinglu Deng

DOI
https://doi.org/10.3390/app15052465
Journal volume & issue
Vol. 15, no. 5
p. 2465

Abstract

The rapid and effective extraction of fault entities is a fundamental step in constructing a fault knowledge graph. As a key means of recording and preserving fault data, fault investigation reports hold significant potential as a source of valuable information. This paper proposes a fault knowledge annotation system covering six categories: geographic information, fault attributes, fault structure, fault activity, fault geomorphology, and fault hazards. The system is based on a comprehensive analysis of the textual characteristics of fault investigation reports. We also build a fine-grained corpus tailored to this task and apply a combination of BERT and BiLSTM-CRF to named entity recognition in the fault domain, comparing its performance with a baseline model that uses no pre-training. The experimental results demonstrate that (1) the F1 score of entity recognition on the fault corpus exceeds 80%, which validates the efficacy of the corpus; (2) the BERT model can effectively use the information available in the corpus to adapt to the downstream task, thus improving the model output; and (3) the proposed BERT-BiLSTM-CRF and ALBERT-BiLSTM-CRF models achieve better extraction performance than the model without pre-training. This study provides a basis for the effectiveness of the BERT-BiLSTM-CRF model in fault entity recognition and establishes a solid data foundation for the subsequent construction of a fault knowledge graph. It also offers technical support for practical applications such as geological surveys, disaster early warning, and urban planning, thereby promoting data-driven research in geology.
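
As an illustration of how an entity-level F1 score like the one reported above is typically computed, the following is a minimal sketch, not the authors' evaluation code. It assumes a BIO tagging scheme and uses hypothetical label names (`GEO`, `ATTR`, `HAZ`) standing in for the annotation categories; the abstract does not state the tagging scheme or the label set. An entity counts as correct only if its type and both boundaries match the gold annotation.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans.

    Assumption: an I- tag that does not continue an open entity of the
    same type is ignored rather than treated as the start of an entity.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    # The trailing "O" is a sentinel that closes an entity ending the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((etype, start, i))
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        else:
            start, etype = None, None
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Entity-level F1: exact match on type and boundaries."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the predicted hazard entity is cut one token short,
# so it does not count as a match.
gold = ["B-GEO", "I-GEO", "O", "B-ATTR", "O", "B-HAZ", "I-HAZ"]
pred = ["B-GEO", "I-GEO", "O", "B-ATTR", "O", "B-HAZ", "O"]
print(round(entity_f1(gold, pred), 3))
```

In this example two of the three gold entities are recovered exactly, so precision and recall are both 2/3. A partially correct boundary earns no credit under this strict matching, which is why fine-grained annotation guidelines matter for the reported score.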

Keywords