Blockchain: Research and Applications (Dec 2021)

BlockHDFS: Blockchain-integrated Hadoop distributed file system for secure provenance traceability

  • Viraaji Mothukuri,
  • Sai S. Cheerla,
  • Reza M. Parizi,
  • Qi Zhang,
  • Kim-Kwang Raymond Choo

Journal volume & issue
Vol. 2, no. 4
p. 100032

Abstract


Hadoop Distributed File System (HDFS) is one of the most widely used distributed file systems in big data analysis for frameworks such as Hadoop. HDFS allows one to manage large volumes of data using low-cost commodity hardware. However, vulnerabilities in HDFS can be exploited for nefarious activities. This reinforces the importance of ensuring robust security to facilitate file sharing in Hadoop, as well as of having a trusted mechanism to check the authenticity of shared files. This is the focus of this paper, in which we aim to improve the security of HDFS using a blockchain-enabled approach (hereafter referred to as BlockHDFS). Specifically, the proposed BlockHDFS uses the enterprise-level Hyperledger Fabric platform to capitalize on files' metadata for building trusted data security and traceability in HDFS.
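To illustrate the general idea of metadata-based traceability, the following is a minimal sketch. It is not BlockHDFS itself and not Hyperledger Fabric chaincode. An in-memory, hash-chained log stands in for the ledger and records a SHA-256 digest plus basic metadata for each file event. The class `ProvenanceLedger`, its methods, and the record fields are hypothetical choices made for illustration. In a real Fabric deployment, such records would instead be submitted through chaincode and ordered by the network rather than appended locally.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _hash_entry(entry: dict) -> str:
    """Hash a record's fields in a canonical (key-sorted) JSON form."""
    canonical = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass
class ProvenanceLedger:
    """Hypothetical append-only, hash-chained log of file-metadata records."""
    records: list = field(default_factory=list)

    def record(self, path: str, data: bytes, owner: str, action: str) -> dict:
        """Append a metadata record for one file event (e.g. create, update)."""
        prev_hash = self.records[-1]["record_hash"] if self.records else GENESIS_HASH
        entry = {
            "path": path,
            "content_sha256": hashlib.sha256(data).hexdigest(),
            "size": len(data),
            "owner": owner,
            "action": action,
            "timestamp": time.time(),
            "prev_hash": prev_hash,  # links this record to the one before it
        }
        entry["record_hash"] = _hash_entry(entry)  # hash of all fields above
        self.records.append(entry)
        return entry

    def history(self, path: str) -> list:
        """Return every recorded event for a file, oldest first."""
        return [r for r in self.records if r["path"] == path]

    def is_authentic(self, path: str, data: bytes) -> bool:
        """Check that shared bytes match the latest recorded digest for the file."""
        events = self.history(path)
        if not events:
            return False
        return events[-1]["content_sha256"] == hashlib.sha256(data).hexdigest()

    def chain_is_intact(self) -> bool:
        """Recompute every record hash and link; any edited record breaks the chain."""
        prev = GENESIS_HASH
        for r in self.records:
            body = {k: v for k, v in r.items() if k != "record_hash"}
            if r["prev_hash"] != prev or _hash_entry(body) != r["record_hash"]:
                return False
            prev = r["record_hash"]
        return True
```

A possible usage: after `ledger.record("/data/report.csv", b"a,b\n1,2\n", owner="alice", action="create")`, a recipient can call `ledger.is_authentic("/data/report.csv", received_bytes)` to check the shared file. Editing any stored record after the fact causes `chain_is_intact()` to fail. The hash chain only approximates the tamper evidence that a distributed ledger provides. It does not model consensus, endorsement, or access control.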

Keywords