Applied System Innovation (Sep 2024)
The E(G)TL Model: A Novel Approach for Efficient Data Handling and Extraction in Multivariate Systems
Abstract
This paper introduces the EGTL (extract, generate, transfer, load) model, a theoretical framework designed to enhance the traditional ETL processes by integrating a novel ‘generate’ step utilizing generative artificial intelligence (GenAI). This enhancement optimizes data extraction and processing, presenting a high-level solution architecture that includes innovative data storage concepts: the Fusion and Alliance stores. The Fusion store acts as a virtual space for immediate data cleaning and profiling post-extraction, facilitated by GenAI, while the Alliance store serves as a collaborative data warehouse for both business users and AI processes. EGTL was developed to facilitate advanced data handling and integration within digital ecosystems. This study defines the EGTL solution design, setting the groundwork for future practical implementations and exploring the integration of best practices from data engineering, including DataOps principles and data mesh architecture. This research underscores how EGTL can improve the data engineering pipeline, illustrating the interactions between its components. The EGTL model was tested in the prototype web-based Hyperloop Decision-Making Ecosystem with tasks ranging from data extraction to code generation. Experiments demonstrated an overall success rate of 93% across five difficulty levels. Additionally, the study highlights key risks associated with EGTL implementation and offers comprehensive mitigation strategies.
Keywords