IEEE Access (Jan 2024)

HSM: A Hybrid Storage Method Based on the Heat of Data and Global Disk Space Utilization

  • Ying Song,
  • Wenxuan Zhao,
  • Yingai Tian,
  • Bo Wang

DOI
https://doi.org/10.1109/ACCESS.2024.3382987
Journal volume & issue
Vol. 12
pp. 48630 – 48639

Abstract

In distributed systems, the choice of data storage method is crucial. Previous work stores data using either replication or erasure coding. Relying on a single method leads either to excessive storage overhead for cold data, which is accessed infrequently, or to poor read performance for hot data, which is accessed frequently. Hybrid storage has therefore become an active research topic. Existing hybrid storage schemes weigh read performance against storage overhead by storing hot data with replication and cold data with erasure coding. However, because this assignment is fixed, it yields relatively low read performance when disk space is sufficient and leaves the system with too little free disk space when disk space is low. In this paper, we propose HSM, a hybrid storage method based on the heat of data and global disk space utilization. HSM considers the system’s requirements for read performance and storage overhead under different levels of global disk space utilization, and adaptively selects a suitable storage method for data of different heat through data deletion, data reconstruction, and data archiving. The experimental results show that, when system disk space is sufficient, HSM reduces data reading time by up to 18%; when system disk space is low, HSM increases storage overhead by up to 7% but reduces cross-rack data transfer traffic by up to 20% and cross-rack data transfer time by up to 15% compared with ERP in the process of changing storage methods.
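
As an illustration of the trade-off described above, the following minimal sketch selects a storage method from a block's heat and the global disk space utilization. It is not the HSM algorithm itself: the thresholds, the function names, and the 3-replica and (6, 3) erasure coding parameters are assumptions chosen for illustration, since the abstract does not state them.

```python
# Hypothetical parameters; the abstract does not give HSM's actual values.
HIGH_UTILIZATION = 0.8   # global utilization at or above which disk space counts as low
RELAXED_HOT_HEAT = 20    # heat threshold used when disk space is sufficient
STRICT_HOT_HEAT = 100    # heat threshold used when disk space is low


def storage_overhead(method: str, replicas: int = 3, k: int = 6, m: int = 3) -> float:
    """Return raw bytes stored per byte of user data for a storage method."""
    if method == "replication":
        return float(replicas)
    if method == "erasure_coding":
        # k data fragments plus m parity fragments for every k data fragments
        return (k + m) / k
    raise ValueError(f"unknown storage method: {method}")


def choose_method(heat: int, utilization: float) -> str:
    """Pick a storage method from data heat and global disk utilization.

    Assumed policy: with sufficient space, more data qualifies for
    replication (favoring read performance); with low space, only the
    hottest data keeps replicas (favoring storage overhead).
    """
    threshold = RELAXED_HOT_HEAT if utilization < HIGH_UTILIZATION else STRICT_HOT_HEAT
    return "replication" if heat >= threshold else "erasure_coding"


if __name__ == "__main__":
    for heat, util in [(50, 0.3), (50, 0.9), (500, 0.9), (5, 0.3)]:
        method = choose_method(heat, util)
        print(heat, util, method, storage_overhead(method))
```

Under these assumed parameters, a moderately accessed block (heat 50) is replicated at 3x overhead while utilization is low, and moves to erasure coding at 1.5x overhead once utilization crosses the threshold.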

Keywords