Combining computational linguistics with sentence embedding to create a zero-shot NLIDB
Yuriy Perezhohin,
Fernando Peres,
Mauro Castelli
Affiliations
Yuriy Perezhohin
MyNorth AI Research, Alameda Bonifácio Lázaro Lozano n°15- 1°C, 2780-125, Oeiras, Lisboa, Portugal; NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Campus de Campolide, 1070-312, Lisboa, Portugal
Fernando Peres
MyNorth AI Research, Alameda Bonifácio Lázaro Lozano n°15- 1°C, 2780-125, Oeiras, Lisboa, Portugal; NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Campus de Campolide, 1070-312, Lisboa, Portugal
Mauro Castelli
NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Campus de Campolide, 1070-312, Lisboa, Portugal; Corresponding author.
Accessing relational databases through natural language is challenging, and existing methods often suffer from poor domain generalization and high computational cost. In this study, we propose an approach that requires no training phase and adapts readily across domains. The method combines structured linguistic rules, a curated vocabulary, and pre-trained embedding models to translate natural language queries into SQL. On the SPIDER benchmark, it achieves an execution accuracy of 72.03% on the training set and 70.83% on the development set while maintaining domain flexibility. It also outperforms two extensively trained models by up to 28.33% on the development set. These results position the approach as a resource-efficient, zero-shot Natural Language Interface for Databases (NLIDB) that generates accurate SQL queries from plain-language input.
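
The execution accuracy figures reported above compare the result of running a generated query with the result of running the reference query. The following is a minimal sketch of that idea, not the benchmark's official evaluator: it assumes an SQLite database, treats result sets as unordered multisets of rows, and counts any query that fails to execute as incorrect. The names `run_query`, `execution_accuracy`, `build_demo_db`, and `DEMO_PAIRS`, as well as the toy `singer` table, are hypothetical and introduced only for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute sql and return its rows as a multiset, or None if execution fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match.

    Simplifying assumption: row order is ignored, and a predicted query
    that raises an error is counted as a miss.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


def build_demo_db():
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Rui', 45), (3, 'Eva', 27);
        """
    )
    return conn


# (predicted, gold) pairs: two matches, one execution error, one wrong result.
DEMO_PAIRS = [
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    ("SELECT COUNT(*) FROM singer",
     "SELECT COUNT(*) FROM singer"),
    ("SELECT name FROM singers",  # misspelled table name, fails to execute
     "SELECT name FROM singer"),
    ("SELECT name FROM singer WHERE age < 30",
     "SELECT name FROM singer WHERE age > 40"),
]
```

Calling `execution_accuracy(build_demo_db(), DEMO_PAIRS)` on this toy data returns `0.5`, since two of the four predicted queries reproduce the reference result. Comparing results rather than query strings means that syntactically different but equivalent queries, such as the first pair, are counted as correct.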