IEEE Access (Jan 2023)
Enabling Generative AI to Produce SQL Statements: A Framework for the Auto-Generation of Knowledge Based on EBNF Context-Free Grammars
Abstract
The motivation of this paper is to generate Structured Query Language (SQL) statements that are of high quality in both syntax and semantics and that achieve a concrete, predefined, and well-known aim. An example is generating SQL statements capable of detecting a cyber-attack from a set of metrics available in a database table. Two components are needed to achieve this: a tool that performs syntactically valid generation of SQL statements, and an artificial intelligence (AI) algorithm able to guide the semantics of the generation toward the best statements for the intended purpose. The main contribution of this manuscript is the first of these components. Concretely, this paper proposes a tool that generates syntactically valid language sentences. The tool can handle any language defined as an ANTLR4 EBNF (Extended Backus-Naur Form) grammar. The paper also provides a methodology for obtaining an EBNF grammar that addresses the ambiguity and recursion concerns arising directly from the generation process. The paper further implements a prototype that uses ANTLR4’s recognizer and its Augmented Transition Network to generate language from EBNF grammars. The design and implementation logic are described in depth, highlighting the points of interest for AI integration. The prototype generated syntactically valid SQL statements at various depths, with problems becoming more apparent as recursion grew exponentially. Our mitigation controls for such scenarios proved successful: they completed the recursion while also moving the pushdown automaton forward until query completion. Experimental validation was performed against an SQL EBNF grammar by feeding the generated SQL statements into an SQL parser to validate their syntax.
Keywords