An efficient algorithm for data parallelism based on stochastic optimization

Khalid Abdulaziz Alnowibet; Imran Khan; Karam M. Sallam; Ali Wagdy Mohamed

Alexandria Engineering Journal (Dec 2022)

Affiliations

Khalid Abdulaziz Alnowibet: Statistics and Operations Research Department, College of Science, King Saud University, PO Box 2455, Riyadh 11451, Kingdom of Saudi Arabia
Imran Khan: Department of Electrical Engineering, University of Engineering & Technology, Peshawar 814, Pakistan
Karam M. Sallam: School of IT and Systems, University of Canberra, ACT 2601, Australia
Ali Wagdy Mohamed (corresponding author): Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt; Department of Mathematics and Actuarial Science, The American University in Cairo, New Cairo, Egypt (corresponding address)

Journal volume & issue: Vol. 61, no. 12
pp. 12005 – 12017

Abstract

Deep neural network models can achieve better performance on many machine learning tasks by increasing model depth and the number of training samples. However, both measures proportionally raise the cost of training. To cope with this overhead, accelerating training in a distributed computing environment has become the most common strategy among developers. Stochastic gradient descent (SGD) is currently one of the most widely used training techniques for deep neural network models, but it is prone to gradient staleness (obsolescence) during parallelization, which degrades overall convergence. Most existing solutions target clusters of high-performance nodes whose performance differs only slightly. Few studies have considered high-performance computing (HPC) cluster environments in which the performance of each node varies substantially. To address these difficulties, a dynamic batch size stochastic gradient descent approach based on performance-aware technology (DBS-SGD) is proposed. By assessing the processing capacity of each node, the method dynamically allocates each node's minibatch so that the update time of each iteration is essentially the same across nodes, which lowers the average gradient staleness of the nodes. The proposed approach can effectively address the stale-gradient problem of the asynchronous update strategy. MNIST and CIFAR-10, two widely used image classification benchmarks, are employed as training data sets, and the approach is compared with the asynchronous stochastic gradient descent (ASGD) technique. The experimental findings show that the proposed algorithm performs better than the existing algorithms it is compared with.
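
The allocation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `allocate_batch_sizes` and `step_times`, the use of measured throughput (samples per second) as the capacity estimate, and the proportional-split rule are all assumptions made for illustration. The abstract states only that minibatches are sized from each node's processing capacity so that per-iteration update times are roughly equal.

```python
def allocate_batch_sizes(throughputs, global_batch):
    """Split a global batch across nodes in proportion to throughput.

    throughputs: measured samples/second per node (assumed capacity metric).
    global_batch: total number of samples to process per iteration.
    Returns a list of integer minibatch sizes that sums to global_batch.
    """
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive numbers")
    total = sum(throughputs)
    # Ideal (real-valued) share of the global batch for each node.
    shares = [global_batch * t / total for t in throughputs]
    # Round down first, then distribute the leftover samples to the nodes
    # with the largest fractional remainders so the sizes sum exactly.
    sizes = [int(s) for s in shares]
    leftover = global_batch - sum(sizes)
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - sizes[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


def step_times(sizes, throughputs):
    """Estimated seconds per iteration for each node: batch size / throughput."""
    return [b / t for b, t in zip(sizes, throughputs)]


if __name__ == "__main__":
    # Hypothetical heterogeneous cluster: four nodes with different speeds.
    speeds = [100.0, 200.0, 400.0, 100.0]
    sizes = allocate_batch_sizes(speeds, 256)
    print(sizes)
    print(step_times(sizes, speeds))
```

With a fixed equal split, the slowest node in this hypothetical cluster would take four times as long per iteration as the fastest, so its gradients would arrive against increasingly outdated parameters. Sizing minibatches by capacity brings the estimated step times close together, which is the mechanism the abstract credits for reducing staleness under asynchronous updates.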

Published in Alexandria Engineering Journal

ISSN: 1110-0168 (Print); 2090-2670 (Online)
Publisher: Elsevier
Country of publisher: Egypt
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.journals.elsevier.com/alexandria-engineering-journal/
