Exploring Large Language Models’ Ability to Describe Entity-Relationship Schema-Based Conceptual Data Models

Andrea Avignone; Alessia Tierno; Alessandro Fiori; Silvia Chiusano

doi:10.3390/info16050368

Information (Apr 2025)

Affiliations

Andrea Avignone: Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
Alessia Tierno: Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
Alessandro Fiori: Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
Silvia Chiusano: Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy

DOI: https://doi.org/10.3390/info16050368
Journal volume & issue: Vol. 16, no. 5, p. 368

Abstract

In the field of databases, Large Language Models (LLMs) have recently been studied for generating SQL queries from textual descriptions, while their use for conceptual or logical data modeling remains less explored. The conceptual design of relational databases commonly relies on the entity-relationship (ER) data model, where translation rules map an ER schema into the corresponding relational tables and their constraints. Our study investigates the capability of LLMs to describe, in natural language, a conceptual data model expressed as an ER schema. Whether for documentation, onboarding, or communication with non-technical stakeholders, LLMs can significantly improve the process of explaining an ER schema by generating accurate descriptions of how its components interact and of the information it represents. To guide the LLM through challenging constructs, specific hints are defined to provide an enriched ER schema. Several LLMs are explored (ChatGPT 3.5 and 4, Llama2, Gemini, Mistral 7B), and several metrics (F1 score, ROUGE, perplexity) are used to assess the quality of the generated descriptions and to compare the models.

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
