Journal of Applied Computer Science & Mathematics (Nov 2021)
An Evaluation of Big Data Reduction Approaches
Abstract
When data is massive, data reduction is an essential step that helps mitigate the computational intractability of learning techniques. This is especially true for the massive datasets that have become common in recent years. The key issue facing both data preprocessors and learning techniques is that data is growing in both dimensionality and the number of data instances. Big data analytics research is entering a new stage known as fast data, in which several gigabytes of data arrive in big data systems every second. Because of the volume, velocity, value, variety, variability, and veracity of the acquired data, modern big data systems capture inherently complex data sources, giving rise to the 6Vs of Big Data. The collection of reduced and accurate data streams is more valuable than the aggregation of raw, noisy, unreliable, and redundant data. Another viewpoint on big data reduction is that large datasets with millions of variables suffer from the curse of dimensionality, which demands enormous computing resources to discover practical information trends. This review provides an overview of strategies for reducing large amounts of data, along with a taxonomic analysis of big data reduction, big data complexity, and big data collection.
Keywords