Big data processing using hybrid Gaussian mixture model with salp swarm algorithm

R. Saravanakumar; T. TamilSelvi; Digvijay Pandey; Binay Kumar Pandey; Darshan A. Mahajan; Mesfin Esayas Lelisho

doi:10.1186/s40537-024-01015-3

Journal of Big Data (Nov 2024)

Affiliations

R. Saravanakumar: Department of CSE, Dayananda Sagar Academy of Technology & Management
T. TamilSelvi: Department of CSE, Panimalar Institute of Technology
Digvijay Pandey: Department of Technical Education Uttar Pradesh
Binay Kumar Pandey: Department of Information Technology, College of Technology, Govind Ballabh Pant University of Agriculture and Technology Pantnagar
Darshan A. Mahajan: NICMAR University Pune
Mesfin Esayas Lelisho: Department of Statistics, Mizan-Tepi University

DOI: https://doi.org/10.1186/s40537-024-01015-3
Journal volume & issue: Vol. 11, no. 1
pp. 1 – 29

Abstract

Traditional big data methods, such as cluster creation and query-based data extraction, fail to yield accurate results on massive networks. To address these issues, the proposed approach uses the Hadoop Distributed File System (HDFS) and the MapReduce programming paradigm for data processing, together with query optimization techniques, to extract accurate results quickly and effectively from a variety of options with high processing capacity. The methodology uses a Gaussian Mixture Model (GMM) for data clustering and the Salp Swarm Algorithm (SSA) for optimization. SHA algorithms secure the preprocessed data stored on interconnected networked clusters. Finally, the experimental performance of the model is evaluated on the key parameters. The input file sizes considered in this work range from 60 to 100 MB. Processing 100 MB of input files yielded an accuracy of 96%, with specificity and sensitivity of 90% and 93%, respectively. The outcomes were compared with well-known methods such as fuzzy C-means and K-means, and the results show that the proposed method distributes accurate data processing to cluster nodes with low latency. Moreover, it uses minimal memory resources when running on functional CPUs. As a result, the proposed approach outperforms the existing techniques it was compared against.
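The abstract names the Salp Swarm Algorithm as the optimizer but does not say how it is coupled to the GMM. As a rough illustration only, the following is a minimal sketch of a basic SSA in its commonly published form, with a single leader and a follower chain, applied to a toy objective. The function names, the `sphere` objective, the bounds, and the population and iteration settings are all assumptions made for this sketch and are not taken from the paper.

```python
import math
import random


def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def salp_swarm(objective, dim, lb, ub, n_salps=30, n_iter=200, seed=0):
    """Minimise `objective` over the box [lb, ub]^dim with a basic SSA."""
    rng = random.Random(seed)
    salps = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_salps)]
    fitness = [objective(s) for s in salps]
    best = min(range(n_salps), key=fitness.__getitem__)
    food, food_fit = salps[best][:], fitness[best]

    for it in range(1, n_iter + 1):
        # c1 decays over time, shifting the leader from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader: sample around the food source (best solution so far).
                for j in range(dim):
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    if rng.random() >= 0.5:
                        salps[i][j] = food[j] + step
                    else:
                        salps[i][j] = food[j] - step
            else:
                # Follower: move halfway towards the salp ahead of it in the chain.
                salps[i] = [(a + b) / 2.0 for a, b in zip(salps[i], salps[i - 1])]
            # Keep every salp inside the search box.
            salps[i] = [min(max(v, lb), ub) for v in salps[i]]
            fit = objective(salps[i])
            if fit < food_fit:
                food, food_fit = salps[i][:], fit
    return food, food_fit


best_pos, best_val = salp_swarm(sphere, dim=2, lb=-10.0, ub=10.0)
print(best_pos, best_val)
```

In a clustering setting, the toy objective would presumably be replaced by a quantity derived from the mixture model, such as a negative log-likelihood over candidate parameters, but that coupling is a guess and is not described in the abstract.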

Published in Journal of Big Data

ISSN: 2196-1115 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://journalofbigdata.springeropen.com
