Shipin Kexue (Jun 2024)

Expansion and Prediction of Small Sample Data of Contaminants in Grain Processing Using Combination of Generative Adversarial Networks and Deep Forest

  • GUO Xianglan, WANG Li, JIN Xuebo, YU Jiabin, BAI Yuting, LI Hanyu, WEI Li’ang, MA Qian, WEN Haoran

DOI
https://doi.org/10.7506/spkx1002-6630-20240129-264
Journal volume & issue
Vol. 45, no. 12
pp. 22 – 30

Abstract

Accurate prediction of pollutants in grain processing is of great significance for ensuring food safety. However, because grain processing is complex and pollutants are difficult to detect, the available data are too few to meet the needs of modeling and forecasting, so a method for expanding small-sample pollutant data is needed. At the same time, small-sample pollutant data in grain processing often lack sufficient prior knowledge. Traditional supervised learning methods have low prediction accuracy, and existing deep learning models designed for continuous processes are not suitable for grain processing, which is intermittent. Hence, a prediction method based on unsupervised learning and deep learning is needed for pollutants in grain processing. This study proposed a prediction method for pollutants in grain processing based on data expansion with time-series generative adversarial networks (TimeGAN) and on generative adversarial networks (GAN) combined with deep forest (DF). First, a TimeGAN model was constructed to learn from small-sample data and generate multiple sets of sample data, achieving data augmentation. Then, the GAN model, which uses unsupervised learning, was combined with the DF model, which is suitable for discrete processes, to construct a GAN-DF model for pollutant prediction. Next, the DF and long short-term memory (LSTM)-DF models were each embedded into the GAN as the generator, and the resulting DFGAN and LSTM-DFGAN models had improved accuracy in pollutant prediction. Simulation and verification using data on the heavy metal pollutant lead (Pb) in rice processing showed that the TimeGAN method was feasible for data expansion and that the LSTM-DFGAN model had the best overall prediction performance. After data expansion, the mean absolute error and root mean square error were as low as 7.50 × 10⁻⁵ and 1.60 × 10⁻⁸ mg/kg, respectively.
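
The two error metrics reported in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of mean absolute error and root mean square error; the function names and the sample concentrations are hypothetical and are not taken from the study, whose evaluation code is not described here.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical Pb concentrations in mg/kg, chosen only to keep the arithmetic simple.
measured = [0.125, 0.25, 0.375, 0.5]
predicted = [0.25, 0.25, 0.25, 0.5]

mae = mean_absolute_error(measured, predicted)
rmse = root_mean_square_error(measured, predicted)
print(f"MAE  = {mae:.6f} mg/kg")
print(f"RMSE = {rmse:.6f} mg/kg")
```

Both metrics carry the unit of the predicted quantity, here mg/kg, and lower values indicate closer agreement between predicted and measured concentrations.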

Keywords