Time Estimation and Resource Minimization Scheme for Apache Spark and Hadoop Big Data Systems With Failures

Jinbae Lee; Bobae Kim; Jong-Moon Chung

doi:10.1109/ACCESS.2019.2891001

IEEE Access (Jan 2019)

Affiliations

Jinbae Lee: School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Bobae Kim: School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea
Jong-Moon Chung: School of Electrical and Electronic Engineering, Yonsei University, Seoul, South Korea

Journal volume & issue: Vol. 7
pp. 9658 – 9666

Abstract

Apache Spark and Hadoop are open-source frameworks for big data processing that have been adopted by many companies. To implement a reliable big data system that meets target processing completion times, accurate resource provisioning and accurate job execution time estimation are needed. This paper presents time estimation and resource minimization schemes for Spark and Hadoop systems. The proposed models incorporate the probability of failure into their estimations so that they more accurately capture the characteristics of real big data operations. The experimental results show that, by accounting for failure events, the proposed Spark adaptive failure-compensation and Hadoop adaptive failure-compensation schemes improve the accuracy of resource provisioning, which in turn raises the scheduling success rate of big data processing tasks.
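To illustrate why folding failure probability into a time estimate changes how many resources are provisioned, here is a minimal sketch. It is not the paper's Spark or Hadoop adaptive failure-compensation model; it assumes a deliberately simple, hypothetical setting: every task has the same duration `t`, each attempt fails independently with probability `p`, a failed attempt is rerun from scratch and costs the full `t`, and tasks execute in waves across `executors` parallel slots. The function names `expected_task_time`, `estimated_job_time`, and `min_executors` are illustrative only.

```python
import math
from typing import Optional


def expected_task_time(t: float, p: float) -> float:
    """Expected time for one task under the hypothetical retry model.

    Assumes independent attempts that each fail with probability p (0 <= p < 1)
    and that a failed attempt costs the full duration t before being rerun.
    The number of attempts is then geometric with mean 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return t / (1.0 - p)


def estimated_job_time(num_tasks: int, t: float, p: float, executors: int) -> float:
    """Job time when tasks run in waves of `executors` parallel slots."""
    waves = math.ceil(num_tasks / executors)
    return waves * expected_task_time(t, p)


def min_executors(num_tasks: int, t: float, p: float, deadline: float) -> Optional[int]:
    """Smallest executor count whose estimated job time meets the deadline.

    Returns None if even one executor per task cannot meet the deadline.
    """
    for n in range(1, num_tasks + 1):
        if estimated_job_time(num_tasks, t, p, n) <= deadline:
            return n
    return None


if __name__ == "__main__":
    # 100 tasks of 10 s each with a 60 s deadline.
    print(min_executors(100, 10.0, 0.0, 60.0))  # ignoring failures
    print(min_executors(100, 10.0, 0.2, 60.0))  # assuming a 20% failure rate
```

With these numbers, ignoring failures suggests 17 executors, while assuming a 20% per-attempt failure rate raises the requirement to 25, because each task's expected time grows from 10 s to 12.5 s and fewer waves fit within the deadline. A provisioner that ignores failures would under-allocate in this toy case, which mirrors, in simplified form, the motivation stated in the abstract.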

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639