Enhancing SPARQL Query Generation for Knowledge Base Question Answering Systems by Learning to Correct Triplets

Jiexing Qi; Chang Su; Zhixin Guo; Lyuwen Wu; Zanwei Shen; Luoyi Fu; Xinbing Wang; Chenghu Zhou

doi:10.3390/app14041521

Applied Sciences (Feb 2024)

Affiliations

Jiexing Qi: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Chang Su: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Zhixin Guo: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Lyuwen Wu: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Zanwei Shen: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Luoyi Fu: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Xinbing Wang: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
Chenghu Zhou: School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

DOI: https://doi.org/10.3390/app14041521
Journal volume & issue: Vol. 14, no. 4, p. 1521

Abstract

Generating SPARQL queries from natural language questions is challenging in Knowledge Base Question Answering (KBQA) systems. Current state-of-the-art models rely heavily on fine-tuning pretrained models such as T5. However, these methods still encounter critical issues such as triple-flip errors (e.g., (subject, relation, object) is predicted as (object, relation, subject)). To address this limitation, we introduce TSET (Triplet Structure Enhanced T5), a model with a novel pretraining stage positioned between the initial T5 pretraining and the fine-tuning for the Text-to-SPARQL task. In this intermediary stage, we introduce a new objective called Triplet Structure Correction (TSC) to train the model on a SPARQL corpus derived from Wikidata. This objective aims to deepen the model’s understanding of the order of triplets. After this specialized pretraining, the model is fine-tuned for SPARQL query generation, which improves its query-generation capabilities. We also propose a method named “semantic transformation” to strengthen the model’s grasp of SPARQL syntax and semantics without compromising the pretrained weights of T5. Experimental results demonstrate that the proposed TSET outperforms existing methods on three well-established KBQA datasets, LC-QuAD 2.0, QALD-9 plus, and QALD-10, establishing new state-of-the-art performance (95.0% F1 and 93.1% QM on LC-QuAD 2.0, 75.85% F1 and 61.76% QM on QALD-9 plus, 51.37% F1 and 40.05% QM on QALD-10).
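
To make the triple-flip error concrete, the sketch below shows one way a (corrupted, original) training pair for a correction-style objective could be built. It is an illustration only: the abstract does not specify how TSC corrupts triplets, so the flipping strategy, the function names (flip_triple, make_correction_pair, render_query), the flip_prob parameter, and the example Wikidata identifiers are all assumptions rather than the authors' implementation.

```python
import random

# A triple is modelled as a plain (subject, relation, object) tuple of strings.
# This representation is an assumption made for the example.

def flip_triple(triple):
    """Swap subject and object, reproducing a triple-flip error."""
    subject, relation, obj = triple
    return (obj, relation, subject)

def make_correction_pair(triples, flip_prob=0.5, rng=None):
    """Return (corrupted, original) triple lists.

    Each triple is flipped independently with probability flip_prob.
    A seq2seq model could then be trained to map the corrupted query
    back to the original one (hypothetical training setup).
    """
    rng = rng or random.Random()
    corrupted = [
        flip_triple(t) if rng.random() < flip_prob else t
        for t in triples
    ]
    return corrupted, list(triples)

def render_query(triples):
    """Render triples as a minimal SELECT query string."""
    body = " ".join(f"{s} {p} {o} ." for s, p, o in triples)
    return f"SELECT ?x WHERE {{ {body} }}"

# Example triple: "?x is the place of birth (P19) of Q42" in Wikidata notation.
original = [("wd:Q42", "wdt:P19", "?x")]
corrupted, target = make_correction_pair(original, flip_prob=1.0)
print("input :", render_query(corrupted))
print("target:", render_query(target))
```

With flip_prob set to 1.0 every triple is flipped, which makes the example deterministic; a lower value would leave some triples intact so that a model also learns when no correction is needed.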

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
