Journal of Innovation Information Technology and Application (Dec 2023)
Big Data Architectures and Concepts
Abstract
Nowadays, the processing of big data has become a major preoccupation for businesses, not only for storage and processing but also for operational requirements such as speed, maintaining performance with scalability, reliability, availability, security, and cost control; ultimately enabling them to maximize their profits by using the new possibilities offered by Big Data. In this article, we will explore and exploit the concepts and architectures of Big Data, in particular through the Hadoop open-source framework, and see how it meets the needs set out above, in its cluster structure, its components, its Lambda and Kappa architectures, and so on. We are also going to deploy Hadoop in a virtualized Linux environment, with several nodes, under the Oracle Virtual Box virtualization software, and use the experimental method to compare the processing time of the MapReduce algorithm on two DataSets with successively one, two, and three and four Datanodes, and thus observe the gains in processing time with the increase in the number of nodes in the cluster
Keywords