Redefining text-to-SQL metrics by incorporating semantic and structural similarity

Giovanni Pinna; Yuriy Perezhohin; Luca Manzoni; Mauro Castelli; Andrea De Lorenzo

doi:10.1038/s41598-025-04890-9

Scientific Reports (Jul 2025)

Affiliations

Giovanni Pinna: University of Trieste
Yuriy Perezhohin: NOVA Information Management School (NOVA IMS), Universidade NOVA de Lisboa
Luca Manzoni: University of Trieste
Mauro Castelli: NOVA Information Management School (NOVA IMS), Universidade NOVA de Lisboa
Andrea De Lorenzo: University of Trieste

Journal volume & issue: Vol. 15, no. 1, pp. 1–17

Abstract

The rapid advancements in text-to-SQL systems have driven the scientific community to create increasingly complex benchmarks for this task. However, evaluation metrics often rely on simplistic or binary approaches that fail to capture the similarities and differences between equivalent SQL queries. Current metrics overlook critical aspects such as partial correctness, structural differences, and semantic equivalence. To address these limitations, we propose a novel metric for SQL query comparison, designed to offer a more precise assessment of the similarity between SQL queries at both the semantic (string) and execution result (resultant table) levels. This new metric allows for a granular evaluation of SQL query similarity, supporting a more accurate assessment and ranking of text-to-SQL tools and models. The proposed approach could have a meaningful impact on text-to-SQL research and development. It might improve evaluation by distinguishing between models that handle simple queries and those capable of tackling more complex ones. The metric could also help to identify where the differences between two queries lie. Additionally, it may support the development of more accurate language models by offering precise training signals to help the model recognize query similarities. The experimental results highlight the metric’s effectiveness over existing evaluation methodologies, allowing us to identify the current best text-to-SQL models through distribution analysis. In some cases, the metric allows the detection of missing aggregation operators or variations in query ordering operators.

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
