Challenges of the Knowledge Society (Jul 2015)

LINEAR REGRESSION WITH R AND HADOOP

  • Bogdan OANCEA

Journal volume & issue
Vol. 5, no. 1
pp. 1007 – 1012

Abstract

In this paper we present a way to solve the linear regression model with R and Hadoop using the RHadoop library. We show how the linear regression model can be solved even for very large models that require special technologies. We use Hadoop to store the data and R to perform the computation. The interface between R and Hadoop is the open-source library RHadoop. We present the main features of the Hadoop and R software systems and the way they are interconnected. We then show how the least squares solution of the linear regression problem can be expressed in terms of the map-reduce programming paradigm and how it can be implemented using the RHadoop library.
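
To illustrate how a least squares solution can be cast as map-reduce, the sketch below uses the normal equations, (X'X) b = X'y: each data chunk contributes partial sums of X'X and X'y (map), the partial sums are added together (reduce), and the small resulting system is solved once. This is only an assumed formulation, not the paper's RHadoop code. It is written in plain Python and runs in a single process so that it stays self-contained, and the names map_chunk, reduce_partials, solve and fit are hypothetical.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: partial X'X and X'y for one chunk of (x, y) rows."""
    p = len(rows[0][0])
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for x, y in rows:
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def reduce_partials(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit(chunks):
    """Combine the per-chunk partial sums, then solve the p-by-p system."""
    xtx, xty = reduce(reduce_partials, map(map_chunk, chunks))
    return solve(xtx, xty)


# Toy data following y = 1 + 2*x, split into two chunks.
# The first column of each x is the intercept term (assumed layout).
chunks = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0)],
    [([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)],
]
beta = fit(chunks)
print(beta)
```

Only the p-by-p matrix X'X and the p-vector X'y travel between the map and reduce steps, so the amount of data exchanged depends on the number of regressors rather than on the number of observations. For ill-conditioned problems a QR-based formulation is generally preferable to the normal equations.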

Keywords