Performance Optimization System for Hadoop and Spark Frameworks

Astsatryan Hrachya; Kocharyan Aram; Hagimont Daniel; Lalayan Arthur

doi:10.2478/cait-2020-0056

Cybernetics and Information Technologies (Dec 2020)

Affiliations

Astsatryan Hrachya: Institute for Informatics and Automation Problems of the National Academy of Sciences of the Republic of Armenia, Yerevan 0014, Armenia
Kocharyan Aram: Université Fédérale Toulouse Midi-Pyrénées, Toulouse Cedex 7, France
Hagimont Daniel: Université Fédérale Toulouse Midi-Pyrénées, Toulouse Cedex 7, France
Lalayan Arthur: National Polytechnic University of Armenia, Yerevan 0009, Armenia

Journal volume & issue: Vol. 20, no. 6, pp. 5–17

Abstract

Optimizing the processing of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented in Apache Hadoop or Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and the transfer time between disks and memory, but it requires additional processing. Finding an optimal tradeoff is therefore a challenge: a high compression factor may relieve Input/Output but overload the processor. The paper presents a system that selects compression tools and tunes the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
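To make the compression tradeoff concrete, the Python sketch below compares hypothetical codec profiles with a simple additive cost model: the time to move a data set is the CPU time spent compressing and decompressing it plus the time to write and read the compressed bytes once each. The profile names (`none`, `fast`, `strong`), the helper names, and every ratio and throughput figure are illustrative assumptions, not measurements from the paper, and the model ignores any overlap between CPU work and I/O. It is not the authors' selection method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecProfile:
    """Hypothetical codec characteristics (illustrative numbers only)."""
    name: str
    ratio: float            # compressed size / original size, in (0, 1]
    compress_mb_s: float    # compression throughput in MB/s
    decompress_mb_s: float  # decompression throughput in MB/s


# Assumed profiles: no compression, a fast low-ratio codec, a slow high-ratio codec.
PROFILES = [
    CodecProfile("none", 1.0, float("inf"), float("inf")),
    CodecProfile("fast", 0.5, 400.0, 1200.0),
    CodecProfile("strong", 0.3, 100.0, 400.0),
]


def estimated_time_s(size_mb: float, codec: CodecProfile, io_mb_s: float) -> float:
    """Additive cost of writing the data once and reading it back once (no CPU/I/O overlap)."""
    cpu = size_mb / codec.compress_mb_s + size_mb / codec.decompress_mb_s
    io = 2 * size_mb * codec.ratio / io_mb_s  # compressed bytes written, then read
    return cpu + io


def best_codec(size_mb: float, codecs: list[CodecProfile], io_mb_s: float) -> CodecProfile:
    """Pick the profile with the lowest estimated time for the given I/O bandwidth."""
    return min(codecs, key=lambda c: estimated_time_s(size_mb, c, io_mb_s))


if __name__ == "__main__":
    # Assumed bandwidths: fast SSD, ordinary disk, congested network link.
    for io_mb_s in (1000.0, 100.0, 20.0):
        choice = best_codec(1000.0, PROFILES, io_mb_s)
        print(f"I/O {io_mb_s:>6.0f} MB/s -> {choice.name}")
```

With these assumed numbers and a 1000 MB data set, the model favors no compression at 1000 MB/s, the `fast` profile at 100 MB/s, and the `strong` profile at 20 MB/s. This mirrors the tradeoff described in the abstract: the best compression factor depends on whether I/O or the processor is the bottleneck.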

Published in Cybernetics and Information Technologies

ISSN: 1314-4081 (Online)
Publisher: Sciendo
Country of publisher: Poland
LCC subjects: Science: Science (General): Cybernetics
Website: https://sciendo.com/journal/cait
