Applied Mathematics and Nonlinear Sciences (Jan 2024)
Research on machine learning based processing strategies for large-scale datasets
Abstract
In this paper, we first mine the interconnections between data in large-scale datasets using association rule models from machine learning, and then perform K-Means clustering T times on the mined datasets to achieve large-scale data integration. On this basis, we propose a classification prediction model based on an enhanced ChebNet model, which combines the efficient feature extraction capability of graph convolutional neural networks with the accurate prediction advantage of big data analysis to process large-scale datasets effectively. Taking tobacco production monitoring data as an example, the model performs well in predicting the correlation of cigarette sensory indexes; its performance is optimal when the sliding window size is 30 and the prediction jump step is 1. These results provide strong support for quality control in cigarette production and indicate that the model is capable of processing large-scale tobacco production datasets.
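To make the sliding window setting concrete, the following minimal sketch shows one way a monitoring series could be cut into input windows of size 30 with a prediction jump step of 1. It assumes that the jump step means how many positions ahead of the window the predicted value lies; the function name make_windows, its parameters, and the synthetic readings are hypothetical and are not taken from the paper.

```python
def make_windows(series, window=30, step=1):
    """Split a sequence into (input window, target) pairs.

    window: number of consecutive observations fed to the model.
    step:   how many positions after the window's last element the
            target lies (assumed meaning of "prediction jump step").
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    pairs = []
    # The last valid start index leaves room for the window plus the look-ahead.
    for start in range(len(series) - window - step + 1):
        inputs = series[start:start + window]
        target = series[start + window + step - 1]
        pairs.append((inputs, target))
    return pairs


# Hypothetical monitoring signal: 100 evenly spaced readings.
readings = [0.1 * i for i in range(100)]
pairs = make_windows(readings, window=30, step=1)
```

With 100 readings, a window of 30, and a step of 1, this yields 70 pairs; each pair holds 30 consecutive readings and the single reading that immediately follows them.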
Keywords