IEEE Access (Jan 2021)
On-Site Identification of Counterfeit Drugs Based on Near-Infrared Spectroscopy Siamese-Network Modeling
Abstract
Near-infrared (NIR) spectroscopy has become one of the most important methods for identifying counterfeit drugs because it is low-cost, non-destructive, and suitable for on-site detection. However, it often fails for unknown samples that fall outside the scope of the modeling samples, and it is inefficient when samples are insufficient (too few samples within a class), when samples are unbalanced (large differences in the number of samples between classes), and when identification results are error-sensitive (application scenarios tolerate different kinds of errors differently). To solve these problems, this paper proposes a general method for on-site identification of counterfeit drugs based on Siamese-network modeling with near-infrared spectroscopy. The method specially constructs the training and test sets, learns general knowledge of the spectral differences between drugs with a customized one-dimensional convolutional neural network (1D-CNN), and finally answers the question of whether two spectra acquired on site belong to the same drug. The model was constructed and fully trained on experimental modeling samples of 1314 spectra covering 9 drugs produced by 25 manufacturers. Then 4 drugs produced by 9 manufacturers, unknown at the time of modeling, were used for testing in the on-site application, and the accuracy reached 97.3%. To assess generalization, the 32015 spectra of 135 drugs produced by 391 manufacturers in the spectral library were randomly divided into training and testing categories and processed in the same way. The generalized model is equally applicable, with an accuracy above 97%. Compared with traditional binary classification identification methods such as SVM, PLS-DA, and auto-encoding, and with one-class (OC) threshold identification algorithms such as SVM-OC, SIMCA, and the conformity test (CT), the proposed method has the best identification ability for samples that were unknown at modeling time.
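The pairwise formulation described in the abstract, where the model is asked whether two spectra belong to the same drug, can be illustrated with a minimal sketch of how labeled spectra might be turned into training pairs. This is an illustration under stated assumptions and not the paper's actual procedure: the function name `make_pairs`, the alternating same/different sampling, the uniform choice of drug classes, and the synthetic data are all hypothetical.

```python
import numpy as np

def make_pairs(spectra, labels, n_pairs, seed=0):
    """Sample spectrum pairs; target is 1 for the same drug, 0 otherwise.

    Hypothetical helper: the pairing strategy here is an assumption.
    Requires at least two distinct labels.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Row indices of the spectra belonging to each drug class.
    index_by_class = {c: np.flatnonzero(labels == c) for c in classes}

    left, right, target = [], [], []
    for k in range(n_pairs):
        same = (k % 2 == 0)  # alternate so the two pair types are balanced
        # Classes are drawn uniformly, regardless of how many spectra each has.
        c1 = rng.choice(classes)
        c2 = c1 if same else rng.choice(classes[classes != c1])
        i = rng.choice(index_by_class[c1])
        j = rng.choice(index_by_class[c2])  # may equal i for a "same" pair
        left.append(spectra[i])
        right.append(spectra[j])
        target.append(int(same))
    return np.stack(left), np.stack(right), np.array(target)

# Synthetic stand-in data: 12 "spectra" of 50 points each, 3 drug classes.
spectra = np.random.default_rng(1).normal(size=(12, 50))
labels = np.repeat([0, 1, 2], 4)
a, b, y = make_pairs(spectra, labels, n_pairs=10)
```

Because the pair label depends only on whether the two drugs match, a model trained on such pairs can in principle be queried with spectra of drugs that never appeared in training, which is the setting the abstract evaluates.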
Keywords