MapReduce-Based Parallel Algorithms for Multidimensional Data Analysis

Jie Pan; Frédéric Magoulès; Yann Le Biannic

doi:10.1260/1748-3018.6.2.325

Journal of Algorithms & Computational Technology (Jun 2012)

Affiliations

Jie Pan: Applied Mathematics and Systems Laboratory, Ecole Centrale Paris, Grande Voie des Vignes, 92295 Châtenay-Malabry Cedex, France
Frédéric Magoulès: Applied Mathematics and Systems Laboratory, Ecole Centrale Paris, Grande Voie des Vignes, 92295 Châtenay-Malabry Cedex, France
Yann Le Biannic: SAP BusinessObjects, 157-159 rue Anatole France, 92309 Levallois-Perret Cedex, France

Journal volume & issue: Vol. 6

Abstract

MapReduce offers excellent scalability and a built-in fault-tolerance mechanism, and it runs well on cheap commodity hardware. Using MapReduce to answer analytical queries is therefore an attractive research topic. In this work, we introduce the processing of Multiple Group-by queries. Our processing of this query is based on MapReduce, a new parallel computing model originating from Cloud Computing. A pre-processing phase adapts the data to MapReduce's data access pattern and improves data accessibility. We give different MapReduce job definitions for processing data sets partitioned with different partitioning methods. We evaluate our query processing on a Grid'5000 cluster. We also address performance issues, since performance is critical when the software industry integrates a new technology. We analyze the measured results and identify several factors that affect response time. Finally, we propose a new data structure that allows more flexible job scheduling.
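To make the idea of a Multiple Group-by query more concrete, the following Python sketch simulates a single MapReduce job in memory. The mapper emits one key per requested group-by dimension set, the shuffle groups values by key, and the reducer sums them. This is a minimal illustration under stated assumptions, not the authors' job definitions. The record layout (a flat dict with one numeric measure), the SUM aggregate, the representation of data partitions as lists of records, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `multiple_group_by`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: dimension columns plus one numeric measure column.
Record = Dict[str, object]
# Key = (names of the group-by dimensions, values of those dimensions in this record).
Key = Tuple[Tuple[str, ...], Tuple[object, ...]]


def map_phase(record: Record, group_bys: List[Tuple[str, ...]],
              measure: str) -> Iterator[Tuple[Key, float]]:
    """Emit one (key, value) pair per group-by dimension set for a single record."""
    for dims in group_bys:
        key = (dims, tuple(record[d] for d in dims))
        yield key, float(record[measure])


def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    """Group intermediate values by key, as the MapReduce framework would."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: Key, values: List[float]) -> Tuple[Key, float]:
    """Aggregate one group; SUM is an assumption made for this sketch."""
    return key, sum(values)


def multiple_group_by(splits: List[List[Record]],
                      group_bys: List[Tuple[str, ...]],
                      measure: str) -> Dict[Key, float]:
    """Run map over every split (partition), then shuffle and reduce all keys."""
    intermediate = (
        pair
        for split in splits
        for record in split
        for pair in map_phase(record, group_bys, measure)
    )
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

For example, with two splits holding the records `{"region": "EU", "year": 2011, "amount": 10.0}`, `{"region": "EU", "year": 2012, "amount": 5.0}` and `{"region": "US", "year": 2012, "amount": 7.0}`, and `group_bys = [("region",), ("year",)]`, the key `(("region",), ("EU",))` aggregates to 15.0 and `(("year",), (2012,))` to 12.0. Tagging each key with its dimension set lets one job answer several group-bys in a single pass over the data, at the cost of emitting one intermediate pair per record per group-by.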

Published in Journal of Algorithms & Computational Technology

ISSN: 1748-3018 (Print); 1748-3026 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Applied mathematics. Quantitative methods; Science: Mathematics
Website: https://journals.sagepub.com/home/act
