Incremental Prediction Model of Disk Failures Based on the Density Metric of Edge Samples

Xin Gao; Sen Zha; Xinpeng Li; Bo Yan; Xiao Jing; Junliang Li; Jianhang Xu

doi:10.1109/ACCESS.2019.2935628

IEEE Access (Jan 2019)

Affiliations

Xin Gao: School of Automation, Beijing University of Posts and Telecommunications, Beijing, China
Sen Zha: School of Automation, Beijing University of Posts and Telecommunications, Beijing, China
Xinpeng Li: School of Automation, Beijing University of Posts and Telecommunications, Beijing, China
Bo Yan: State Grid Jibei Electric Power Company Limited, Beijing, China
Xiao Jing: School of Automation, Beijing University of Posts and Telecommunications, Beijing, China
Junliang Li: Nari Group Corporation (State Grid Electric Power Research Institute), Beijing, China
Jianhang Xu: Nari Group Corporation (State Grid Electric Power Research Institute), Beijing, China

DOI: https://doi.org/10.1109/ACCESS.2019.2935628
Journal volume & issue: Vol. 7
pp. 114285 – 114296

Abstract


Disks are the main data storage equipment in data centers, so predicting disk failures is of great significance for data reliability and security. Because disk datasets contain few abnormal samples, they rarely meet the requirements of supervised and semi-supervised algorithms for the number of abnormal samples, while unsupervised algorithms achieve poor recall on local anomalies and wrapped anomalies. This paper presents an incremental learning disk failure prediction model that uses the density metric of edge samples. An isolation region is built by searching for the nearest neighbor of each sample. For a test point that is not a global anomaly, we use Euclidean distance to find its nearest training point and, in turn, the nearest training point of that training point. The global metric of the abnormal degree of the test sample is the ratio of the radii of the regions in which these two nearest training points are located. The local metric of the abnormal degree is the ratio between the shortest distance from the test point to the edge of the training point's region and the radius of that region. The anomaly score of a test point is obtained by combining the two metrics. We identify the SMART attributes that are significantly related to disk failures and increase their weights the next time the attributes are input. Experiments are carried out on synthetic and public datasets that contain local anomalies and wrapped anomalies. The proposed method outperforms typical unsupervised algorithms such as iNNE, iForest and LOF, improving recall by up to 7%. Furthermore, comparison tests on public disk datasets also verify that the proposed method achieves better recall.
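
The scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the function names (`fit_regions`, `anomaly_score`), the choice of the nearest covering training point, the conversion of each ratio into a value in [0, 1], and the equal-weight average of the two metrics are all assumptions made for illustration. The SMART attribute reweighting and the incremental update are omitted, and the sketch assumes the training set contains no duplicate points (a duplicate would give a region of radius zero).

```python
import math


def fit_regions(train):
    """For each training point, return the index of its nearest neighbor
    and the radius of its isolation region (distance to that neighbor)."""
    nn, radius = [], []
    for i, p in enumerate(train):
        j = min((k for k in range(len(train)) if k != i),
                key=lambda k: math.dist(p, train[k]))
        nn.append(j)
        radius.append(math.dist(p, train[j]))
    return nn, radius


def anomaly_score(x, train, nn, radius):
    """Hypothetical combined score in [0, 1]; higher means more anomalous."""
    dist = [math.dist(x, p) for p in train]
    # Regions (hyperspheres) that contain the test point.
    covering = [i for i in range(len(train)) if dist[i] <= radius[i]]
    if not covering:
        return 1.0  # outside every region: treated as a global anomaly
    # Assumption: use the nearest training point whose region covers x.
    c = min(covering, key=lambda i: dist[i])
    # Global metric: ratio of the radius of the neighbor's region to the
    # radius of c's region. This ratio never exceeds 1, because the
    # neighbor's own nearest-neighbor distance is at most its distance to c.
    global_part = 1.0 - radius[nn[c]] / radius[c]
    # Local metric: distance from x to the edge of c's region, relative to
    # the radius. A point deep inside the region has a ratio near 1.
    edge_ratio = (radius[c] - dist[c]) / radius[c]
    local_part = 1.0 - edge_ratio
    # Assumption: combine the two metrics with an equal-weight average.
    return 0.5 * (global_part + local_part)


if __name__ == "__main__":
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    nn, radius = fit_regions(train)
    for point in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]:
        print(point, anomaly_score(point, train, nn, radius))
```

In this sketch a test point that coincides with a training sample in a uniformly dense area scores lowest, a point near the edge of a region scores higher, and a point outside all regions receives the maximum score.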

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
