IEEE Access (Jan 2020)
Ordinal Optimization-Based Performance Model Estimation Method for HDFS
Abstract
Modeling and analyzing the performance of distributed file systems (DFSs) benefits the reliability and quality of data processing in data-intensive applications. The Hadoop Distributed File System (HDFS) is a typical representative of DFSs. Its internal heterogeneity and complexity, together with external disturbances, give HDFS built-in nonlinearity and randomness at the system level, which pose a great challenge for modeling. In particular, the randomness introduces uncertainty into the HDFS performance model. Because analytical models have complex mathematical structures and parameters that are hard to estimate, building an explicit and precise analytical model of the randomness is highly complicated and computationally intractable. A measurement-based methodology is a promising way to model HDFS performance under randomness, since it requires no knowledge of the system's internal behavior. In this paper, estimating HDFS performance models in the presence of randomness is transformed into an optimization problem: finding the truly best design of the performance model structure within a large design space. Core ideas of ordinal optimization (OO) are introduced to solve this problem with a limited computing budget. A piecewise-linear (PL) model is applied to approximate the nonlinear characteristics and randomness of HDFS performance. Experimental results show that the proposed method is effective and practical for estimating the optimal design of the PL-based performance model structure for HDFS. It not only provides a globally consistent evaluation of the design space but also guarantees the goodness of the solution with high probability. Moreover, it improves the accuracy of system-model-based HDFS performance models.
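The core OO idea the abstract refers to, selecting a small set of promising designs by ranking all candidates with a cheap noisy evaluation and spending the expensive budget only on that selected set, can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the design space, the `true_cost` and `crude_cost` functions, and all parameter values are hypothetical stand-ins for fitting a PL performance model against HDFS measurements.

```python
import random

random.seed(0)

def true_cost(design):
    """Expensive exact evaluation (hypothetical; stands in for fitting
    a PL model structure and measuring its error on HDFS traces)."""
    return (design - 0.3) ** 2

def crude_cost(design, noise=0.05):
    """Cheap noisy estimate of true_cost. OO only needs the *order*
    it induces to be roughly right, not the values themselves."""
    return true_cost(design) + random.gauss(0, noise)

# 1. Sample N candidate designs from the (large) design space.
N = 1000
designs = [random.random() for _ in range(N)]

# 2. Rank all candidates with the crude model (limited budget).
ranked = sorted(designs, key=crude_cost)

# 3. Keep the selected set: the top-s designs by crude rank.
#    OO theory gives high-probability alignment between this set
#    and the truly good designs.
s = 20
selected = ranked[:s]

# 4. Spend the expensive evaluation budget only on the selected set.
best = min(selected, key=true_cost)
print(f"best design ~ {best:.3f}, true cost {true_cost(best):.4f}")
```

The design choice illustrated here is OO's "goal softening": instead of paying the full cost to find the single exact optimum, a rough ordering is trusted to place at least one good-enough design in the small selected set with high probability.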
Keywords