IEEE Access (Jan 2021)

DLFT: Data and Layout Aware Fault Tolerance Framework for Big Data Transfer Systems

  • Preethika Kasu,
  • Prince Hamandawana,
  • Tae-Sun Chung

DOI
https://doi.org/10.1109/ACCESS.2021.3055731
Journal volume & issue
Vol. 9
pp. 22939–22954

Abstract

Various scientific research organizations generate several petabytes of data per year through computational science simulations. These data are often shared among geographically distributed data centers for analysis. One of the major challenges in distributed environments is failure: hardware, network, and software components might fail at any instant. Thus, high-speed, fault-tolerant data transfer frameworks are vital for efficiently transferring such large volumes of data between data centers. In this study, we propose a Bloom filter-based data aware probabilistic fault tolerance (DAFT) mechanism that can handle such failures. We also propose a data and layout aware fault tolerance mechanism (DLFT) to effectively handle the false positive matches of DAFT. We evaluated the data transfer and recovery time overheads that the proposed fault tolerance mechanisms impose on overall data transfer performance. The experimental results demonstrated that the DAFT and DLFT mechanisms exhibit a maximum recovery time overhead of 10% and a minimum of 2%, at the 80% and 20% fault points, respectively. However, we observed minimal to negligible overhead with respect to the overall data transfer rate.
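The abstract does not describe DAFT's internals, so the following is only a minimal sketch of the general idea, under the assumption that the sender records the identifier of each fully transferred object in a Bloom filter and, after a failure, skips any object the filter reports as already sent. The names `BloomFilter`, `transfer`, and `resume`, the `fileN:objM` identifier format, and the filter sizes are hypothetical illustrations, not the authors' implementation. A Bloom filter never yields false negatives, so no completed object is resent, but it can yield false positives, meaning an unsent object may be wrongly skipped. That is the failure case DLFT is described as handling.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key over an m-bit array."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k indices from two halves of a SHA-256 digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # True for every added key; occasionally True for a key never added.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def transfer(objects, bf, fail_after=None):
    """Hypothetical sender loop (assumption): log each object in bf once sent.

    Raises RuntimeError to simulate a fault before object index `fail_after`.
    """
    for i, obj in enumerate(objects):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated fault")
        # ... send obj to the sink here ...
        bf.add(obj)


def resume(objects, bf):
    """On restart, keep only objects the filter does not report as sent.

    A false positive here would silently skip an object that was never sent.
    """
    return [o for o in objects if not bf.might_contain(o)]


# Simulate a fault at the 80% point of a 100-object transfer.
objects = [f"file{f}:obj{o}" for f in range(4) for o in range(25)]
bf = BloomFilter(m_bits=4096, k_hashes=4)
try:
    transfer(objects, bf, fail_after=80)
except RuntimeError:
    pass
remaining = resume(objects, bf)
print(f"{len(remaining)} objects left to resend")
```

For a filter of m bits with k hash functions holding n keys, the standard approximate false positive probability is (1 - e^(-kn/m))^k, so the sizes above trade memory for a lower chance of wrongly skipping an object. The sketch keeps a single global filter for simplicity; how DAFT actually sizes and scopes its filters is not stated in the abstract.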

Keywords