MATEC Web of Conferences (Jan 2016)

Performance Evaluation of Hadoop-based Large-scale Network Traffic Analysis Cluster

  • Tao Ran,
  • Qiao Yuanyuan,
  • Zhou Wenli

DOI
https://doi.org/10.1051/matecconf/20165605015
Journal volume & issue
Vol. 56
p. 05015

Abstract

In the big data era, Hadoop has gained popularity and is now widely used in many fields. Our self-designed and self-developed large-scale network traffic analysis cluster is built on Hadoop and runs offline applications that analyze massive network traffic data. To evaluate the performance of the analysis cluster in a scientific and reasonable way, we propose a performance evaluation system. First, we take the execution times of three benchmark applications as the performance benchmark and select 40 metrics from customized resource statistics. Next, we identify the relationship between the resource data and the execution times with a statistical modeling approach that combines principal component analysis and multiple linear regression. After training the models on historical data, we can predict execution times from current resource data. Finally, we evaluate the performance of the analysis cluster using the validated execution-time predictions. Experimental results show that the execution times predicted by the trained models fall within an acceptable error range, and that the performance evaluation results are accurate and reliable.
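
The modeling step described in the abstract (principal component analysis followed by multiple linear regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the data below are synthetic, and the function names, the 0.95 explained-variance threshold, the sample sizes, and the mean-relative-error measure are all assumptions chosen for illustration. Only the count of 40 resource metrics comes from the abstract.

```python
import numpy as np


def fit_pca_regression(X, y, var_threshold=0.95):
    """Fit PCA on standardized metrics, then regress y on the component scores.

    var_threshold is an assumed cutoff on cumulative explained variance.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant metrics
    Z = (X - mean) / std

    # PCA via SVD of the standardized data; rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold) + 1)
    k = min(k, len(s))
    components = Vt[:k].T  # shape: (n_metrics, k)

    # Multiple linear regression (with intercept) on the k component scores.
    scores = Z @ components
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "std": std, "components": components, "coef": coef}


def predict(model, X):
    """Predict execution times for new resource-metric rows."""
    Z = (X - model["mean"]) / model["std"]
    scores = Z @ model["components"]
    return model["coef"][0] + scores @ model["coef"][1:]


# Synthetic stand-in for historical data: 200 runs, 40 correlated metrics
# driven by 3 hidden factors (all values here are made up for the sketch).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 40))
X = latent @ loadings + 0.05 * rng.normal(size=(200, 40))
y = 120.0 + latent @ np.array([15.0, -8.0, 5.0]) + rng.normal(size=200)

# Train on "historical" runs, then predict execution times for "current" runs.
model = fit_pca_regression(X[:150], y[:150])
pred = predict(model, X[150:])
mre = float(np.mean(np.abs(pred - y[150:]) / y[150:]))
print("components kept:", model["components"].shape[1])
print("mean relative error:", mre)
```

Reducing the correlated metrics to a few components before regression keeps the linear model well conditioned, which is the usual motivation for pairing the two techniques. The error measure printed here is one plausible reading of an "acceptable error range"; the abstract does not specify which measure was used.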