Journal of Spectroscopy (Jan 2022)

A Variable Selection Method Based on Fast Nondominated Sorting Genetic Algorithm for Qualitative Discrimination of Near Infrared Spectroscopy

  • Hubin Liu,
  • Na Liu,
  • Yuhui Yuan,
  • Cihai Zhang,
  • Longlian Zhao,
  • Junhui Li

DOI
https://doi.org/10.1155/2022/2141872
Journal volume & issue
Vol. 2022

Abstract

A reliable and effective qualitative discrimination method for near-infrared (NIR) spectroscopy is critical for building good models, yet the performance of such models depends heavily on valid feature extraction. The goal of feature selection is to associate the selected variables with the property of interest, and many methods do this successfully. However, many selection methods focus only on strong association with the analytes or properties of interest and neglect correlations between variables. This paper proposes a variable selection method based on the fast nondominated sorting genetic algorithm (NSGA-II) for qualitative discrimination of NIR spectra. The method has two objective functions: (1) maximizing the sum of the ratios of interclass variance to intraclass variance and (2) minimizing the sum of correlation coefficients between the selected variables. FT-NIR spectra of 124 tobacco samples from different origins and parts in Guizhou Province, China, were used as the experimental data. Part-grade discrimination models of tobacco leaves were established by combining this method with partial least squares discriminant analysis (PLS-DA) and were compared with a PLS-DA model based on the full spectrum. The results showed that the PLS-DA model with NSGA-II performed better, with a comparable or better correct discrimination rate and reasonable discrimination rate, and could discriminate different parts of the tobacco leaves well. This indicates that NSGA-II can select a small number of effective feature variables to build a high-performance qualitative discrimination model and is a promising algorithm. In addition, the method is not designed exclusively for spectral data; it is a general strategy that could be used for variable selection on other types of data.
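
The two objective functions described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`class_separability`, `redundancy`, `dominates`), the count-weighted form of the variance ratio, and the use of absolute correlation values are assumptions made for illustration, since the abstract does not give the exact formulas. A candidate variable subset is assumed to be encoded as a boolean mask over the spectral variables.

```python
import numpy as np

def class_separability(X, y, mask):
    """Objective 1 (to maximize): sum over the selected variables of
    interclass variance / intraclass variance (assumed Fisher-style ratio)."""
    Xs = X[:, mask]
    overall_mean = Xs.mean(axis=0)
    between = np.zeros(Xs.shape[1])
    within = np.zeros(Xs.shape[1])
    for label in np.unique(y):
        Xc = Xs[y == label]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Small constant guards against division by zero (an assumption).
    return float(np.sum(between / (within + 1e-12)))

def redundancy(X, mask):
    """Objective 2 (to minimize): sum of pairwise correlation coefficients
    between the selected variables (absolute values are an assumption)."""
    Xs = X[:, mask]
    if Xs.shape[1] < 2:
        return 0.0
    corr = np.corrcoef(Xs, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # each pair counted once
    return float(np.abs(corr[upper]).sum())

def dominates(a, b):
    """Pareto dominance on (separability, redundancy) pairs: a dominates b
    if it is no worse on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better

# Toy data: column 1 duplicates column 0 (scaled); column 2 carries no class signal.
X = np.array([[0.0, 0.0, 1.0],
              [0.1, 0.2, -1.0],
              [1.0, 2.0, -1.0],
              [1.1, 2.2, 1.0]])
y = np.array([0, 0, 1, 1])
mask_redundant = np.array([True, True, False])
mask_diverse = np.array([True, False, True])
```

On the toy data, `mask_redundant` scores higher on separability but also higher on redundancy than `mask_diverse`, so neither subset dominates the other. Trade-offs of this kind are what nondominated sorting is designed to rank; a full NSGA-II run would additionally need selection, crossover, mutation, and crowding-distance steps, which are omitted here.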