Smart Agriculture (智慧农业), January 2024
Using a Portable Visible-near Infrared Spectrometer and Machine Learning to Distinguish and Quantify Mold Contamination in Wheat
Abstract
Objective
Traditional methods for detecting mold are time-consuming, labor-intensive, and susceptible to environmental influences, so a rapid, accurate, and reliable detection approach is needed. Researchers have used visible-near infrared (Visible-NIR) spectroscopy for the non-destructive, rapid assessment of wheat moisture content, crude protein content, hidden pests, starch content, dry matter, weight, hardness, origin, and other attributes. However, most of these studies rely on research-grade Visible-NIR spectrometers typically found in laboratories. Although these instruments offer high detection accuracy and stability, their bulk, lack of portability, and high cost hinder their adoption across agricultural product distribution channels.
Methods
A low-resolution Visible-NIR spectrometer (VNIAPD, resolution 1.6 nm) was used to collect wheat spectra, with the aim of improving the accuracy of moldy-wheat detection by identifying the spectral preprocessing method best suited to each algorithm. A high-resolution Visible-NIR spectrometer (SINO2040, resolution 0.19 nm) served as a control to validate the effectiveness of the instrument and method. A total of 100 samples of the wheat variety Zhoumai 22 were prepared. The fresh wheat was scanned and then placed in a constant-temperature chamber at 35 °C, which provides suitable conditions for mold growth and thereby accelerates the proliferation of the mold naturally present on the wheat. The degree of mold was graded by cultivation time in the chamber: wheat cultivated for 3, 6, and 9 days was classified as mildly, moderately, and severely moldy, respectively. In total, 400 spectra were collected: 100 each from fresh wheat and from wheat cultured for 3, 6, and 9 days.
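The sampling design above yields two labeling schemes. The sketch below is a hypothetical encoding of it, assuming that the binary task uses all 400 spectra (fresh vs. moldy) and that the ternary task uses only the 300 moldy spectra (mild, moderate, severe); the names `build_records`, `binary_labels`, and `ternary_subset` are illustrative and not taken from the study.

```python
# Hypothetical encoding of the sampling design: 100 spectra per time point
# (day 0 = fresh, then 3, 6 and 9 days at 35 °C).
CULTURE_DAYS = (0, 3, 6, 9)
MOLD_LEVEL = {0: "fresh", 3: "mild", 6: "moderate", 9: "severe"}

def build_records(n_samples=100):
    """Return (sample_id, days, level) tuples, one spectrum per sample per time point."""
    return [(sid, d, MOLD_LEVEL[d]) for d in CULTURE_DAYS for sid in range(n_samples)]

def binary_labels(records):
    """Fresh (0) vs. moldy (1), using all four groups."""
    return [0 if level == "fresh" else 1 for _, _, level in records]

def ternary_subset(records):
    """Keep only moldy spectra, labelled 0 = mild, 1 = moderate, 2 = severe (assumed scheme)."""
    order = {"mild": 0, "moderate": 1, "severe": 2}
    return [(r, order[r[2]]) for r in records if r[2] in order]

records = build_records()
print(len(records))                   # 400
print(sum(binary_labels(records)))    # 300
print(len(ternary_subset(records)))   # 300
```

Note that the binary task is imbalanced (300 moldy vs. 100 fresh) under this reading, which is worth keeping in mind when comparing accuracies.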
Preprocessing methods, including standard deviation normalization (SDN), standard normal variate (SNV), mean centering (MC), first-order derivative (1ST), Savitzky-Golay smoothing (SG), and multiplicative scatter correction (MSC), were applied to the spectral data, and outliers were identified and removed with the local outlier factor (LOF) method. The successive projections algorithm (SPA) and the least absolute shrinkage and selection operator (LASSO) were then used to extract characteristic wavelengths from the preprocessed spectra. Six algorithms, namely k-nearest neighbors (KNN), support vector machines (SVM), random forests (RF), Naïve Bayes, back propagation neural networks (BPNN), and deep neural networks (DNN), were used to model the characteristic-wavelength spectra, both to distinguish moldy from fresh wheat and to classify the degree of mold. The models were evaluated on accuracy, modeling time, and model size to help select the most suitable model for a given application scenario.
Results and discussions
In terms of accuracy, with the neural network models BPNN and DNN, which are slower to train and require more memory, both the VNIAPD and the SINO2040 achieved 100% accuracy in the binary task of distinguishing fresh from moldy wheat, and also 100% accuracy in the ternary task of distinguishing three levels of mold growth. With the faster, more memory-efficient shallow models (KNN, SVM, RF, and Naïve Bayes), the VNIAPD reached its highest binary test-set accuracy, 97.72%, with RF, whereas the SINO2040 achieved 100% with Naïve Bayes. In the ternary task, the VNIAPD reached 100% accuracy with both KNN and RF, while the SINO2040 reached 97.72% with KNN and SVM.
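Several of the preprocessing methods named in the Methods have simple textbook forms. The following pure-Python sketch shows SNV, MSC, a 5-point quadratic Savitzky-Golay smoother, and a first-order difference. The window length, the use of the mean spectrum as the MSC reference, and the function names are assumptions, since the abstract does not give the study's settings.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre a spectrum on its own mean, scale by its own std."""
    mu, sd = mean(spectrum), stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]

def msc(spectra, reference=None):
    """Multiplicative scatter correction; the reference defaults to the mean spectrum (assumed)."""
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]
    ref_mu = mean(reference)
    ref_ss = sum((r - ref_mu) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mu = mean(s)
        # Least-squares fit s ≈ a + b * reference, then remove the offset a and the scale b.
        b = sum((r - ref_mu) * (x - s_mu) for r, x in zip(reference, s)) / ref_ss
        a = s_mu - b * ref_mu
        corrected.append([(x - a) / b for x in s])
    return corrected

def savgol5(spectrum):
    """5-point quadratic Savitzky-Golay smoothing; returns only points with a full window."""
    coeffs = (-3, 12, 17, 12, -3)
    return [sum(c * x for c, x in zip(coeffs, spectrum[i:i + 5])) / 35
            for i in range(len(spectrum) - 4)]

def first_derivative(spectrum):
    """First-order difference between adjacent wavelengths (unit wavelength step assumed)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

print(snv([1.0, 2.0, 3.0]))            # [-1.0, 0.0, 1.0]
print(savgol5([0, 1, 4, 9, 16, 25]))   # [4.0, 9.0]: a quadratic passes through unchanged
```

SNV works on each spectrum independently, while MSC needs the whole set (or a fixed reference), which matters when applying a trained model to new samples.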
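The LOF step in the Methods flags a spectrum as an outlier when its local density is much lower than that of its nearest neighbors. A minimal pure-Python sketch, treating each spectrum as a point in Euclidean space; `k` and the cut-off `threshold=1.5` are illustrative values rather than the study's, and ties at the k-th neighbor are broken by index instead of being included as in the original LOF definition.

```python
import math

def lof_scores(points, k=3):
    """Local outlier factor per point: about 1 for inliers, clearly above 1 for outliers."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding itself (ties broken by index).
    neighbours = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
                  for i in range(n)]
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance to the neighbours.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(len(reach) / sum(reach))  # assumes no more than k duplicate points
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

def remove_outliers(points, k=3, threshold=1.5):
    """Keep points whose LOF score does not exceed the (hypothetical) threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]

cluster = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(remove_outliers(cluster + [(10, 10)], k=2))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

This computes all pairwise distances, which is fine for a few hundred spectra but scales quadratically.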
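SPA, used in the Methods to extract characteristic wavelengths, selects wavelengths with minimal collinearity: starting from one wavelength, it repeatedly projects the remaining wavelength columns onto the subspace orthogonal to the last selected column and picks the column with the largest remaining norm. The sketch below covers only this projection loop; the start column `k0` and the count `n_select` are plain inputs here, whereas full SPA implementations usually choose them by validation error, which the abstract does not detail.

```python
def spa(X, k0, n_select):
    """Successive projections: X is a list of spectra (rows = samples, columns = wavelengths)."""
    cols = [list(c) for c in zip(*X)]
    selected = [k0]
    for _ in range(n_select - 1):
        xk = cols[selected[-1]]
        xk_sq = sum(v * v for v in xk)  # assumed non-zero (no all-zero column selected)
        # Project every unselected column onto the orthogonal complement of xk.
        for j in range(len(cols)):
            if j in selected:
                continue
            coef = sum(a * b for a, b in zip(xk, cols[j])) / xk_sq
            cols[j] = [b - coef * a for a, b in zip(xk, cols[j])]
        # Choose the column with the largest projected norm (least collinear with the set).
        remaining = [j for j in range(len(cols)) if j not in selected]
        selected.append(max(remaining, key=lambda j: sum(v * v for v in cols[j])))
    return selected

X = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 2]]
print(spa(X, k0=0, n_select=2))   # [0, 2]
```

Column 1 shares most of its direction with column 0, so after projection column 2 has the larger residual norm and is picked first.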
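LASSO, the second wavelength-selection method in the Methods, adds an L1 penalty that drives the coefficients of uninformative wavelengths exactly to zero, so wavelengths with non-zero coefficients can be kept as features. A minimal coordinate-descent sketch, assuming mean-centered spectra `X`, a numerically encoded target `y` (how the study encoded the target for LASSO is not stated), and an illustrative penalty `alpha`; it minimizes (1/2n)·||y - Xw||² + alpha·||w||₁ without an intercept.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (proximal operator of the L1 norm)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0:
                continue  # an all-zero column can never enter the model
            # Correlation of wavelength j with the residual that excludes j's own contribution.
            rho = sum(row[j] * (yi - sum(row[k] * w[k] for k in range(m) if k != j))
                      for row, yi in zip(X, y)) / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

def selected_wavelengths(X, y, alpha):
    """Indices of wavelengths whose LASSO coefficient is non-zero."""
    return [j for j, wj in enumerate(lasso_cd(X, y, alpha)) if wj != 0.0]

X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]                        # depends only on wavelength 0
print(selected_wavelengths(X, y, 0.1))    # [0]
```

Larger `alpha` values keep fewer wavelengths; in practice the penalty would be tuned by cross-validation.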
In terms of modeling speed, the shallow machine learning algorithms (KNN, SVM, RF, and Naïve Bayes) trained faster, with Naïve Bayes the fastest at 3 ms, whereas the neural network algorithms BPNN and DNN took 3 293 ms and 18 614 ms, respectively. In terms of memory footprint, BPNN produced the largest model (4 028 KB) and SVM the smallest (4 KB). Overall, the VNIAPD matched the SINO2040 in detection accuracy despite its considerably coarser optical resolution (1.6 nm versus 0.19 nm) and at a lower cost, demonstrating its efficiency and cost-effectiveness for this task.
Conclusions
By comparing different spectral preprocessing methods, this study identified the best preprocessing choice for each algorithm. As a result, the low-resolution VNIAPD spectrometer achieved performance on par with the high-resolution SINO2040 in detecting moldy wheat, offering a new option for low-cost, non-destructive detection of wheat mold and its degree based on Visible-NIR spectroscopy.
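The modeling-time and model-size comparison above can be reproduced for any model by timing the fit and measuring the serialized model. The sketch below does this for a minimal pure-Python Gaussian Naive Bayes. Timing with `time.perf_counter` and measuring size with `pickle` are assumptions about methodology (the abstract does not say how either was measured), the fixed `var_smoothing` constant is a simplification, and the toy data are illustrative, so the numbers will not match those reported.

```python
import math
import pickle
import time
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Per-class prior, feature means and (smoothed) feature variances."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(X),
            "mean": [mean(col) for col in cols],
            "var": [pvariance(col) + var_smoothing for col in cols],
        }
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest Gaussian log-posterior for spectrum x."""
    def log_posterior(params):
        lp = math.log(params["prior"])
        for v, mu, var in zip(x, params["mean"], params["var"]):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return lp
    return max(model, key=lambda c: log_posterior(model[c]))

def train_and_profile(X, y):
    """Fit the model and report training time (ms) and pickled size (KB)."""
    start = time.perf_counter()
    model = fit_gaussian_nb(X, y)
    elapsed_ms = (time.perf_counter() - start) * 1000
    size_kb = len(pickle.dumps(model)) / 1024
    return model, elapsed_ms, size_kb

X = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]]
y = [0, 0, 1, 1]
model, ms, kb = train_and_profile(X, y)
print(predict_gaussian_nb(model, [0.12, 0.18]))   # 0
```

Naive Bayes stores only a prior plus one mean and variance per class and wavelength, which is why its fit is fast and its model small compared with a neural network's weight matrices.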
Keywords