Distributed Nonparametric and Semiparametric Regression on SPARK for Big Data Forecasting

Jelena Fiosina; Maksims Fiosins

doi:10.1155/2017/5134962

Applied Computational Intelligence and Soft Computing (Jan 2017)

Affiliations

Jelena Fiosina: Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Maksims Fiosins: Clausthal University of Technology, Clausthal-Zellerfeld, Germany

DOI: https://doi.org/10.1155/2017/5134962
Journal volume & issue: Vol. 2017

Abstract

Forecasting in big datasets is a common but complicated task that cannot be handled by the well-known parametric linear regression. Nonparametric and semiparametric methods enable forecasting by building nonlinear data models, but they are computationally intensive and do not scale well enough to produce useful results on big datasets in a reasonable time. We present distributed parallel versions of some nonparametric and semiparametric regression models. We use the MapReduce paradigm and describe the algorithms in terms of SPARK data structures to parallelize the calculations. The forecasting accuracy of the proposed algorithms is compared with that of the linear regression model, which is currently the only forecasting model with a parallel distributed implementation within the SPARK framework for big data problems. We also discuss the advantages of parallelizing the algorithms. We validate our models through numerical experiments that evaluate the goodness of fit, analyze how increasing dataset size influences time consumption, and analyze how time consumption varies with the degree of parallelism (number of workers) in the distributed implementation.
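
The abstract does not give the algorithms themselves, so the following is only a minimal sketch of how a nonparametric estimator can fit the MapReduce pattern, not the authors' implementation. It assumes a Nadaraya-Watson kernel regression with a Gaussian kernel, which is one common nonparametric model and is chosen here purely for illustration. Plain Python lists stand in for SPARK partitions, and all names (`map_partition`, `combine`, `nw_predict`, `split`) and the toy data are hypothetical.

```python
import math
from functools import reduce


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice for this sketch)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def map_partition(partition, x0, h):
    """Map step: partial weighted sums over one partition of (x, y) pairs."""
    num = 0.0
    den = 0.0
    for x, y in partition:
        w = gaussian_kernel((x - x0) / h)
        num += w * y
        den += w
    return (num, den)


def combine(a, b):
    """Reduce step: add two (numerator, denominator) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def nw_predict(partitions, x0, h):
    """Kernel-weighted average of y at x0, assembled from per-partition sums."""
    partials = map(lambda p: map_partition(p, x0, h), partitions)
    num, den = reduce(combine, partials, (0.0, 0.0))
    return num / den if den > 0.0 else float("nan")


def split(data, n_parts):
    """Hypothetical helper: deal the data round-robin into n_parts lists."""
    return [data[i::n_parts] for i in range(n_parts)]


# Toy data on the line y = 2x + 1 (illustrative only, not from the paper).
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(101)]

single = nw_predict(split(data, 1), 5.0, 0.5)
distributed = nw_predict(split(data, 4), 5.0, 0.5)
print(single, distributed)
```

Because the estimate depends on the data only through two sums, each worker can compute its partial sums independently and the results can be merged in any order, which is the property that makes this kind of estimator a natural candidate for the map and reduce steps described above.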

Published in Applied Computational Intelligence and Soft Computing

ISSN: 1687-9724 (Print); 1687-9732 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/4795