Journal of Big Data (Oct 2018)

Evaluation of high-level query languages based on MapReduce in Big Data

  • Marouane Birjali,
  • Abderrahim Beni-Hssane,
  • Mohammed Erritali

DOI
https://doi.org/10.1186/s40537-018-0146-3
Journal volume & issue
Vol. 5, no. 1
pp. 1 – 21

Abstract

MapReduce (MR) is a standard Big Data processing model for the parallel and distributed processing of large datasets. The model suffers from problems related to the low-level and batch nature of MR, which has given rise to an abstraction layer on top of MR. Several High-Level MapReduce Query Languages built on top of MR therefore provide more abstract query languages and extend the MR programming model. These languages take the burden of MR programming away from developers and allow a smooth migration of existing SQL skills to Big Data. This paper investigates the most commonly used High-Level MapReduce Query Languages built directly on top of MR, which translate queries into executable native MR jobs. It evaluates the performance of four of them, JAQL, Hive, Big SQL and Pig, with regard to their insightful perspectives and ease of programming. The baseline metrics reported are increasing input size, scaling out the number of nodes and controlling the number of reducers. The experimental results examine the technical advantages and limitations of each language. Finally, the paper provides a summary to help developers choose the High-Level MapReduce Query Language that fulfills their needs and interests.

Keywords