Geo-spatial Information Science (May 2024)

Beyond extraction accuracy: addressing the quality of geographical named entity through advanced recognition and correction models using a modified BERT framework

  • Liuchang Xu,
  • Jiajun Zhang,
  • Chengkun Zhang,
  • Xinyu Zheng,
  • Zhenhong Du,
  • Xingyu Xue

DOI
https://doi.org/10.1080/10095020.2024.2354229

Abstract


In geospatial services and applications, accurate address information is essential. Because traditional data collection methods are labor-intensive and costly, researchers have turned to Volunteered Geographic Information (VGI) to extract Geographical Named Entities (GNEs). However, prior studies have concentrated mainly on improving extraction accuracy and have often overlooked the quality of the extracted GNEs. This study addresses this gap with a multi-stage approach. First, a Geographical Named Entity Semantic Model (GNESM) was constructed by modifying the BERT framework, and ablation experiments on multiple influencing factors were conducted to verify its feasibility. Based on GNESM, a Geographical Named Entity Recognition Model (GNERM) was then constructed through incremental pre-training on social media text and fine-tuning, achieving a recognition accuracy of 90.9%. Next, a Geographical Named Entity Error Correction Model (GNEECM) was constructed by training GNESM on standard GNE data and adding error detection and correction modules, achieving an accuracy of 96.6% on error detection and correction tasks. The experimental results show that the proposed recognition and correction methods outperform all compared methods. Through this recognition and correction process, the study obtained high-quality GNE data, providing a reference for expanding standard address libraries and for further research on geographical named entities.
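The abstract does not say how GNERM's per-token outputs are turned into entities. BERT-based token-classification NER models commonly emit one BIO tag per token (B = beginning of an entity, I = inside, O = outside). Assuming GNERM follows that convention, with a single hypothetical label `GNE`, the sketch below shows how such a tag sequence can be decoded into entity spans. It illustrates the generic technique only and is not the authors' implementation; the function name `decode_bio` and the example sentence are hypothetical.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, int, int]]:
    """Group BIO-tagged tokens into (entity_text, start, end) spans, end exclusive.

    Assumes a single hypothetical entity label "GNE". Use sep="" for
    character-level Chinese input, where tokens are joined without spaces.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    start = None  # index where the currently open entity began
    for i, tag in enumerate(tags):
        if tag == "B-GNE" or (tag == "I-GNE" and start is None):
            # A new entity begins (a stray I- tag is treated as B-); close any open one first.
            if start is not None:
                spans.append((sep.join(tokens[start:i]), start, i))
            start = i
        elif tag == "I-GNE":
            continue  # extend the currently open entity
        else:
            # "O" (or any unknown tag) closes the open entity, if there is one.
            if start is not None:
                spans.append((sep.join(tokens[start:i]), start, i))
                start = None
    if start is not None:  # an entity running to the end of the sequence
        spans.append((sep.join(tokens[start:]), start, len(tokens)))
    return spans


if __name__ == "__main__":
    tokens = ["Meet", "me", "at", "West", "Lake", "Road", "near", "Hangzhou", "Station"]
    tags = ["O", "O", "O", "B-GNE", "I-GNE", "I-GNE", "O", "B-GNE", "I-GNE"]
    print(decode_bio(tokens, tags))
    # → [('West Lake Road', 3, 6), ('Hangzhou Station', 7, 9)]
```

Treating a stray `I-GNE` as the start of a new entity is one common, lenient choice; a stricter decoder could instead discard such tokens.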

Keywords