Jordanian Journal of Computers and Information Technology (Apr 2024)

A Machine Learning Based Decision Support Framework For Big Data Pipeline Modeling And Design

  • Asma Dhaouadi,
  • Khadija Bousselmi,
  • Sebastien Monnet,
  • Mohamed Mohsen Gammoudi,
  • Slimane Hammoudi

DOI
https://doi.org/10.5455/jjcit.71-1711356163
Journal volume & issue
Vol. 10, no. 3
pp. 306 – 318

Abstract

The data warehousing process requires an architectural revolution to meet big data challenges and to accommodate new data sources, such as social networks, recommendation systems, smart cities, and the web, in order to extract value from shared data. In this respect, the pipeline modeling community offers a wide range of technological solutions for the acquisition, storage, and processing of data for analysis purposes, and this variety itself presents significant challenges. More specifically, choosing the most appropriate tool for the user's specific business needs and ensuring interoperability between the different tools have become primary challenges. From this perspective, we propose in this paper a new interactive framework based on machine learning (ML) techniques to assist experts in modeling a customized pipeline for data warehousing. More precisely, we (i) analyze the experts' requirements and the characteristics of the data to be processed, (ii) propose the architecture best suited to those requirements from among a multitude of specific architectures instantiated from a generic one, and (iii) use several ML methods to predict the most suitable tool for each phase and task within the architecture. Additionally, our framework is validated through two real-world use cases and user feedback. [JJCIT 2024; 10(3): 306-318]

Keywords