IEEE Access (Jan 2020)
Data Processing Model to Perform Big Data Analytics in Hybrid Infrastructures
Abstract
Big Data applications are present in many areas, such as financial markets, search engines, streaming services, health care, and social networks. Data analysis adds value to information for organizations. Classical Cloud Computing provides a robust architecture for performing complex, large-scale computing in these areas. The main challenges are the users' lack of knowledge about Cloud infrastructure, the requirements for improving performance, and the resource management needed to maintain stable processing. Given these difficulties, an inadequate solution can lead users to overestimate or underestimate the computational resources they need, which increases the budget. One way to work around this problem is to use Volunteer Computing, since it provides distributed computational resources at no monetary cost. However, volatile machine behavior is a problem that must be addressed in Big Data data distribution. Thus, this work proposes a data distribution model that combines Cloud Computing and Volunteer Computing environments in a hybrid fashion for Big Data analytics. The contributions of this work are: i) the evaluation required to enable efficient deployment of Big Data in hybrid infrastructures; ii) the development of the HR_Alloc algorithm for establishing data placement for Big Data applications; iii) a model for resource allocation in hybrid infrastructures. The results indicate the feasibility of using a hybrid infrastructure with up to 35% of unstable machines in the worst-case scenario, without performance loss and with a monetary cost lower than 20% in comparison to Classical Cloud Computing. In addition, communication costs decrease by up to 57.14% in the best-case scenario due to load balancing.
Keywords