Journal of Big Data (Jul 2017)

Large-scale distributed L-BFGS

  • Maryam M. Najafabadi,
  • Taghi M. Khoshgoftaar,
  • Flavio Villanustre,
  • John Holt

DOI: https://doi.org/10.1186/s40537-017-0084-5
Journal volume & issue: Vol. 4, no. 1, pp. 1–17

Abstract

With the increasing demand for examining and extracting patterns from massive amounts of data, it is critical to be able to train large models that fulfill the needs created by recent advances in machine learning. L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) is a numerical optimization method that has been used effectively for parameter estimation when training various machine learning models. As the number of parameters increases, implementing this algorithm on a single machine can become insufficient due to its limited computational resources. In this paper, we present a parallelized implementation of the L-BFGS algorithm on a distributed system consisting of a cluster of commodity computing machines. We use the open-source HPCC Systems (High-Performance Computing Cluster) platform as the underlying distributed system on which to implement the L-BFGS algorithm. We first provide an overview of the HPCC Systems framework and how it enables the parallel and distributed computations important for Big Data analytics; we then explain our implementation of the L-BFGS algorithm on this platform. Our experimental results show that our large-scale implementation of the L-BFGS algorithm easily scales from training models with millions of parameters to models with billions of parameters, simply by increasing the number of commodity computational nodes.
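
The abstract does not reproduce the algorithm itself; for reference, below is a minimal single-machine Python sketch of the standard L-BFGS two-loop recursion (as in Nocedal and Wright), which is the update that the paper's distributed implementation parallelizes. The function name and memory handling here are illustrative assumptions, not the authors' code.

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list):
        # Illustrative sketch, not the paper's HPCC Systems implementation.
        # grad: current gradient g_k; s_list/y_list: the m most recent
        # curvature pairs s_i = x_{i+1} - x_i, y_i = g_{i+1} - g_i,
        # stored oldest first. Returns the search direction -H_k * g_k.
        q = grad.copy()
        rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
        alphas = []
        # First loop: walk the history from the newest pair to the oldest.
        for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
            alpha = rho * s.dot(q)
            alphas.append(alpha)
            q -= alpha * y
        # Scale by the common initial Hessian approximation
        # gamma_k = s^T y / y^T y, built from the newest pair.
        if s_list:
            s, y = s_list[-1], y_list[-1]
            q *= s.dot(y) / y.dot(y)
        # Second loop: walk the history from the oldest pair to the newest.
        for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos),
                                      reversed(alphas)):
            beta = rho * y.dot(q)
            q += (alpha - beta) * s
        return -q

In the distributed setting the paper describes, each node would hold a shard of the parameter and gradient vectors, so the dot products above become distributed reductions; that communication pattern, rather than the recursion itself, is what an implementation on a cluster of commodity machines must address.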
