Performance Optimization of Machine Learning Algorithms Based on Spark

Luo Weikang; Zhang Shenglin; Xu Yinggen

doi:10.2478/amns-2024-0416

Applied Mathematics and Nonlinear Sciences (Jan 2024)

Affiliations

Luo Weikang: School of Information Management, Jiangxi University of Finance and Economics, Nanchang, Jiangxi, 330032, China.
Zhang Shenglin: College of Software Engineering, Guangxi Normal University, Guilin, Guangxi, 541004, China.
Xu Yinggen: School of Information Management, Jiangxi University of Finance and Economics, Nanchang, Jiangxi, 330032, China.

Journal volume & issue: Vol. 9, no. 1

Abstract

This paper proposes a performance optimization strategy for Spark-based machine learning algorithms that targets the Shuffle and in-memory data management modules. The Shuffle module is optimized by introducing an Observer monitoring module into the Spark cluster, which monitors task status and generates ShuffleWrite tasks dynamically. An adaptive caching mechanism for RDD data addresses the lack of in-memory data caching. In the experiments, the optimized algorithm achieves a clustering accuracy of 89% and a response time 5% faster than the Random Forest algorithm. In road network traffic state discrimination, the F-measure of the optimized algorithm’s classification decisions reaches 99.53%, which is 5.32% higher than before optimization, and when processing about 6,880,000 records the running time is 767 seconds shorter than that of the unoptimized algorithm, a significant improvement in both efficiency and accuracy.
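
The abstract does not spell out how the adaptive RDD caching mechanism decides what to keep in memory. As a rough illustration of the general idea only, the sketch below shows one plausible heuristic: cache the datasets that are read more than once, preferring those that save the most recomputation time per megabyte, until a memory budget is used up. The `RddStats` record, the `choose_rdds_to_cache` function, and all figures in the example are hypothetical and are not taken from the paper or from the Spark API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RddStats:
    """Hypothetical per-RDD statistics; not part of the Spark API."""
    name: str           # illustrative identifier for the RDD
    reuse_count: int    # number of downstream stages that read it
    size_mb: float      # estimated in-memory size
    recompute_s: float  # estimated time to recompute it from lineage


def choose_rdds_to_cache(stats, budget_mb):
    """Greedily pick RDDs to cache within a memory budget.

    Assumption: an RDD read only once gains nothing from caching, and the
    benefit of caching is the recomputation time saved per megabyte used.
    """
    candidates = [s for s in stats if s.reuse_count > 1 and s.size_mb > 0]

    def benefit(s):
        # Each read after the first avoids one recomputation.
        return (s.reuse_count - 1) * s.recompute_s / s.size_mb

    chosen, used_mb = [], 0.0
    for s in sorted(candidates, key=benefit, reverse=True):
        if used_mb + s.size_mb <= budget_mb:
            chosen.append(s.name)
            used_mb += s.size_mb
    return chosen


if __name__ == "__main__":
    example = [
        RddStats("features", 3, 100.0, 10.0),
        RddStats("raw_lines", 1, 10.0, 100.0),   # read once: never cached
        RddStats("centroids", 2, 50.0, 20.0),
        RddStats("joined", 5, 400.0, 5.0),       # too large for the budget
    ]
    print(choose_rdds_to_cache(example, budget_mb=200.0))
```

In a real Spark job the selected datasets would then be persisted through the usual RDD caching calls; where the statistics come from (for example, a monitoring component such as the Observer module mentioned above) is left open here.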

Published in Applied Mathematics and Nonlinear Sciences

ISSN: 2444-8656 (Online)
Publisher: Sciendo
Country of publisher: Poland
LCC subjects: Science: Mathematics
Website: https://sciendo.com/journal/AMNS
