Training Back Propagation Neural Networks in MapReduce on High-Dimensional Big Datasets With Global Evolution

Wanghu Chen; Jing Li; Xintian Li; Lizhi Zhang; Jianwu Wang

doi:10.1109/access.2019.2951189

IEEE Access (Jan 2019)

Affiliations

Wanghu Chen: College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
Jing Li: College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
Xintian Li: College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
Lizhi Zhang: College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
Jianwu Wang: Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA

Journal volume & issue: Vol. 7, pp. 159855–159867

Abstract

Owing to its scalability and high fault tolerance, even in a distributed environment built from personal computers, MapReduce has been used to parallelise the training of Back Propagation Neural Networks (BPNNs) on high-dimensional big datasets. This paper proposes a novel approach to distributed data-parallel training of BPNNs in MapReduce, based on the evolution of local BPNNs produced by distributed Map tasks on different data splits. The approach provides a reasonable way to derive globally convergent BPNN candidates from local BPNNs that have converged only on their own data splits. It not only reduces the number of iterations needed to reach a globally convergent BPNN, but also shows clear advantages in keeping training from becoming trapped in a local optimum on high-dimensional big datasets. To improve training efficiency further, local BPNNs from the same computing node are merged by averaging their weight matrices before they serve as individuals in the population for global evolution. The approach also uses Random Project based sampling techniques to evaluate the fitness of each individual, which lowers the computation cost of the evolution stage. Experiments show that the proposed approach substantially improves training efficiency compared with stand-alone or traditional MapReduce BPNN training, and improves model accuracy on larger datasets. A comparison with 23 other popular classification approaches also shows that the proposed approach has a clear advantage in accuracy.
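
The two efficiency measures described in the abstract, merging local BPNNs from the same node by averaging their weight matrices and scoring each individual on a sample rather than the full dataset, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the single weight matrix standing in for a full BPNN, the linear argmax scorer used as a forward pass, and plain uniform random sampling, which is swapped in for the sampling technique named in the abstract. The evolution operators themselves are not described in the abstract, so the sketch stops at ranking the population by fitness.

```python
import random
from collections import defaultdict


def average_weights(matrices):
    """Element-wise mean of equally shaped weight matrices (lists of lists)."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / n for c in range(cols)]
            for r in range(rows)]


def merge_by_node(local_models):
    """Merge local models that came from the same computing node.

    local_models: list of (node_id, weight_matrix) pairs (hypothetical layout).
    Returns {node_id: averaged weight matrix}, one individual per node.
    """
    grouped = defaultdict(list)
    for node_id, weights in local_models:
        grouped[node_id].append(weights)
    return {node: average_weights(ws) for node, ws in grouped.items()}


def predict(weights, x):
    """Stand-in forward pass: one linear layer, class = index of the top score."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))


def sampled_fitness(weights, dataset, sample_size, rng):
    """Accuracy on a uniform random sample (assumed substitute for the
    paper's sampling technique), used as a cheap fitness estimate."""
    sample = rng.sample(dataset, min(sample_size, len(dataset)))
    correct = sum(1 for x, y in sample if predict(weights, x) == y)
    return correct / len(sample)


def rank_population(population, dataset, sample_size, rng):
    """Return (fitness, node_id) pairs, best individual first."""
    scored = [(sampled_fitness(w, dataset, sample_size, rng), node)
              for node, w in population.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy data: two local models on node "n1", one on node "n2".
    local = [
        ("n1", [[1.0, 0.0], [0.0, 1.0]]),
        ("n1", [[3.0, 0.0], [0.0, 3.0]]),
        ("n2", [[0.0, 1.0], [1.0, 0.0]]),
    ]
    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
    population = merge_by_node(local)
    print(rank_population(population, data, 2, random.Random(0)))
```

Averaging per node before evolution shrinks the population to one individual per node, and sampling bounds the cost of each fitness evaluation; both trade some fidelity for speed, which is the trade-off the abstract describes.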

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
