A deep neural network model for Chinese toponym matching with geographic pre-training model

Qinjun Qiu; Shiyu Zheng; Miao Tian; Jiali Li; Kai Ma; Liufeng Tao; Zhong Xie

doi:10.1080/17538947.2024.2353111

International Journal of Digital Earth (Dec 2024)

Affiliations

Qinjun Qiu: Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan, People’s Republic of China
Shiyu Zheng: School of Computer Science, China University of Geosciences, Wuhan, People’s Republic of China
Miao Tian: Key Laboratory of Geological Survey and Evaluation of Ministry of Education, China University of Geosciences, Wuhan, People’s Republic of China
Jiali Li: School of Computer Science, China University of Geosciences, Wuhan, People’s Republic of China
Kai Ma: Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University, Yichang, People’s Republic of China
Liufeng Tao: School of Computer Science, China University of Geosciences, Wuhan, People’s Republic of China
Zhong Xie: School of Computer Science, China University of Geosciences, Wuhan, People’s Republic of China

DOI: https://doi.org/10.1080/17538947.2024.2353111
Journal volume & issue: Vol. 17, no. 1

Abstract

Many tasks in geographical information retrieval and geographical information science require toponym matching, that is, aligning toponyms that share a common referent. String similarity approaches struggle with unofficial and/or historical variants of the same toponym. Current state-of-the-art supervised machine learning approaches rely on labeled samples, and they do not adequately handle character replacements that arise from transliteration or from historical shifts in linguistic and cultural norms. To address these issues, this paper proposes a matching approach based on a deep neural network built on a geographic language representation model, GeoBERT (geographic Bidirectional Encoder Representations from Transformers, BERT). The model extends a generalized Enhanced Sequential Inference Model architecture with GeoBERT and integrates multiple features to improve the accuracy and robustness of toponym matching. We evaluate the proposed method on three large datasets. The results show that our approach outperforms the individual similarity metrics used in previous studies.
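To illustrate why plain string similarity can fall short on variant toponyms, the following minimal sketch computes a normalized Levenshtein similarity. It is not the paper's method or one of its baselines as implemented by the authors; the function names `levenshtein` and `similarity` and the example pairs are assumptions chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Illustrative pairs (assumed examples, not taken from the paper's datasets).
pairs = [
    ("北京", "北平"),  # Beijing and its historical name Beiping: same referent
    ("北京", "南京"),  # Beijing and Nanjing: different cities
    ("广州", "羊城"),  # Guangzhou and its nickname: same referent
]
for left, right in pairs:
    print(left, right, similarity(left, right))
```

In this sketch the historical variant pair and the pair of different cities receive the same score, and the nickname pair shares no characters at all, so a character-level metric alone cannot separate matches from non-matches in these cases.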

Published in International Journal of Digital Earth

ISSN: 1753-8947 (Print); 1753-8955 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Geography. Anthropology. Recreation: Mathematical geography. Cartography
Website: https://www.tandfonline.com/journals/tjde
