IEEE Access (Jan 2024)

Scalable Data Partitioning Techniques for Distributed Data Processing in Cloud Environments: A Review

  • Sivakumar Ponnusamy,
  • Pankaj Gupta

DOI
https://doi.org/10.1109/ACCESS.2024.3365810
Journal volume & issue
Vol. 12
pp. 26735 – 26746

Abstract

Cloud storage lets individuals store and access data from remote locations and provides on-demand access to high-quality cloud applications, eliminating the need to manage local hardware and software. Storing data on cloud servers allows users to work with it without running into local resource constraints such as memory or storage limits. Cloud computing shows great promise because it can provide seemingly unlimited resources for computing and data storage, services that are crucial for managing data according to specific requirements. In current systems, data is saved in the cloud using dynamic data operations and computations. This study explores the underlying principles of scalable data-partitioning techniques for distributed data processing in cloud environments. Its significance lies in the increasing dependence of enterprises on cloud platforms for data-intensive tasks such as machine learning, data analytics, and real-time data processing. The study examines several data-partitioning strategies and methodologies developed to address the unique issues posed by cloud systems, and evaluates their influence on the scalability, load distribution, and overall efficiency of the system. Its main aim is to advance cloud-based data-processing techniques, thereby enabling enterprises to leverage the full potential of the cloud for data-centric projects.

Keywords