Adaptive memory reservation strategy for heavy workloads in the Spark environment

Bohan Li; Xin He; Junyang Yu; Guanghui Wang; Yixin Song; Shunjie Pan; Hangyu Gu

doi:10.7717/peerj-cs.2460

PeerJ Computer Science (Nov 2024)

Affiliations

Bohan Li: School of Software, Henan University, Kaifeng, Henan Province, China
Xin He: School of Software, Henan University, Kaifeng, Henan Province, China
Junyang Yu: School of Software, Henan University, Kaifeng, Henan Province, China
Guanghui Wang: School of Software, Henan University, Kaifeng, Henan Province, China
Yixin Song: School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu Province, China
Shunjie Pan: School of Software, Henan University, Kaifeng, Henan Province, China
Hangyu Gu: School of Software, Henan University, Kaifeng, Henan Province, China

DOI: https://doi.org/10.7717/peerj-cs.2460
Journal volume & issue: Vol. 10, p. e2460

Abstract

The rise of the Internet of Things (IoT) and Industry 2.0 has spurred a growing need for large-scale data computing, and Spark has emerged as a promising Big Data platform owing to its distributed in-memory computing capabilities. However, practical heavy workloads often cause memory bottlenecks in the Spark platform. These bottlenecks lead to the eviction of resilient distributed datasets (RDDs) and, in extreme cases, severe memory contention, which significantly degrades Spark's computational efficiency. To tackle this issue, we propose an adaptive memory reservation (AMR) strategy designed for heavy workloads in the Spark environment. First, we model optimal task parallelism by minimizing the disparity between the number of tasks completed without blocking and the number completed in regular rounds. We then determine the memory required for this optimal task parallelism, which establishes an efficient execution memory space for parallel computation. Subsequently, by reserving execution memory adaptively and adjusting it dynamically (compressing or expanding it according to task progress), the strategy sustains dynamic task parallelism throughout Spark's parallel computing process. Considering the cost of each RDD cache location and real-time memory usage, we select suitable storage locations for different RDD types to alleviate execution memory pressure. Finally, we conduct extensive laboratory experiments to validate the effectiveness of AMR. The results indicate that, compared with existing memory management solutions, AMR reduces execution time by approximately 46.8%.

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
