Scientific Reports (Jul 2025)

Redefining text-to-SQL metrics by incorporating semantic and structural similarity

  • Giovanni Pinna
  • Yuriy Perezhohin
  • Luca Manzoni
  • Mauro Castelli
  • Andrea De Lorenzo

DOI: https://doi.org/10.1038/s41598-025-04890-9
Journal volume & issue: Vol. 15, no. 1, pp. 1–17

Abstract


The rapid advancement of text-to-SQL systems has driven the scientific community to create increasingly complex benchmarks for this task. However, evaluation metrics often rely on simplistic or binary approaches that fail to capture the similarities and differences between SQL queries, including equivalent ones. Current metrics overlook critical aspects such as partial correctness, structural differences, and semantic equivalence. To address these limitations, we propose a novel metric for SQL query comparison, designed to assess the similarity between SQL queries more precisely at two levels: the semantic (string) level and the execution result (resultant table) level. The metric enables a granular evaluation of SQL query similarity, supporting a more accurate assessment and ranking of text-to-SQL tools and models. The proposed approach could benefit text-to-SQL research and development in several ways. It might improve evaluation by distinguishing models that handle only simple queries from those capable of tackling more complex ones, and it could help identify where the differences between two queries lie. It may also support the development of more accurate language models by offering precise training signals that help a model recognize query similarities. The experimental results highlight the metric’s effectiveness compared with existing evaluation methodologies and, through distribution analysis, allow us to identify the current best text-to-SQL models. In some cases, the metric detects missing aggregation operators or variations in query ordering operators.
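To make the contrast between binary and graded evaluation concrete, the following Python sketch scores a predicted query against a gold query in two ways on a toy in-memory SQLite table: a binary execution-match check, and a graded score that combines token-level string similarity with cell-level overlap of the result tables. This is not the metric proposed in the paper. The table, the function names (`build_demo_db`, `graded_score`, and so on), the equal 0.5 weighting, and the use of `difflib` and a multiset Jaccard are all illustrative assumptions.

```python
import re
import sqlite3
from collections import Counter
from difflib import SequenceMatcher


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical toy database used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("north", 20), ("south", 5)],
    )
    return conn


def run(conn: sqlite3.Connection, sql: str) -> list:
    return conn.execute(sql).fetchall()


def execution_match(gold_rows, pred_rows) -> int:
    # Binary check: 1 only if both queries return the same multiset of rows.
    # Row order is ignored, so ORDER BY differences are invisible here.
    return int(Counter(gold_rows) == Counter(pred_rows))


def tokenize(sql: str) -> list:
    # Lower-case and split into words and single punctuation symbols.
    return re.findall(r"\w+|[^\w\s]", sql.lower())


def string_similarity(gold_sql: str, pred_sql: str) -> float:
    # Token-level similarity in [0, 1] based on matching subsequences.
    return SequenceMatcher(None, tokenize(gold_sql), tokenize(pred_sql)).ratio()


def cell_overlap(gold_rows, pred_rows) -> float:
    # Multiset Jaccard over individual cell values, so a partially correct
    # result table earns partial credit instead of 0.
    gold = Counter(v for row in gold_rows for v in row)
    pred = Counter(v for row in pred_rows for v in row)
    union = sum((gold | pred).values())
    return sum((gold & pred).values()) / union if union else 1.0


def graded_score(conn, gold_sql: str, pred_sql: str, w: float = 0.5) -> float:
    # Assumed weighting: w for the text level, (1 - w) for the result level.
    gold_rows, pred_rows = run(conn, gold_sql), run(conn, pred_sql)
    return w * string_similarity(gold_sql, pred_sql) + (1 - w) * cell_overlap(gold_rows, pred_rows)


if __name__ == "__main__":
    conn = build_demo_db()
    gold = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    missing_agg = "SELECT region FROM sales GROUP BY region ORDER BY region"
    reordered = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region DESC"
    for name, pred in [("missing aggregation", missing_agg), ("reversed ordering", reordered)]:
        g, p = run(conn, gold), run(conn, pred)
        print(f"{name}: binary={execution_match(g, p)}, graded={graded_score(conn, gold, pred):.2f}")
```

On these toy queries the missing aggregation gets a binary score of 0 but a graded score of 0.65, while the reversed ordering gets a binary score of 1 (the row multisets match) and a graded score of 0.98. The binary check cannot see the ordering change at all, whereas the graded score registers both differences without collapsing them to all-or-nothing.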

Keywords