Jisuanji kexue yu tansuo (Feb 2020)
Application of Multi-Layered Gradient Boosting Decision Trees in Pharmaceutical Classification
Abstract
Near-infrared (NIR) spectroscopy is a highly effective technique for pharmaceutical analysis. NIR drug data are typically small-scale, high-dimensional and non-linear. On such data, traditional drug identification algorithms lack sufficient feature-learning ability, neural-network-based methods suffer from local optima and over-fitting, and existing methods tend to ignore sample imbalance. To address these shortcomings, this paper proposes a pharmaceutical classification approach based on multi-layered gradient Boosting decision trees with feature selection and cost-sensitive learning (CS_FGBDT). First, the raw spectra are preprocessed with Savitzky-Golay smoothing and the first derivative. Second, a random forest adaptively selects features from the preprocessed spectra, and multi-layered gradient Boosting trees construct the feature mapping. Finally, cost-sensitive learning is incorporated to minimize the negative effect of sample imbalance. Comparative experiments on two imbalanced datasets, one of capsules and one of tablets, show that the model achieves higher prediction accuracy and stability than the compared methods, indicating that it is an effective method for drug identification.
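The preprocessing step (Savitzky-Golay smoothing followed by the first derivative) can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the abstract does not give the window length or polynomial order, so `half_window` and `polyorder` are hypothetical parameters, and the helper names `savgol_coeffs` and `savgol_filter_valid` are illustrative. The filter fits a least-squares polynomial inside a sliding window and evaluates either the polynomial itself (`deriv=0`, smoothing) or its derivative (`deriv=1`) at the window centre. For simplicity, edge points are dropped ("valid" mode) rather than extrapolated.

```python
from math import factorial
from typing import List, Sequence


def _solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def savgol_coeffs(half_window: int, polyorder: int, deriv: int = 0,
                  delta: float = 1.0) -> List[float]:
    """Convolution weights that evaluate the deriv-th derivative, at the window centre,
    of the least-squares polynomial of degree `polyorder` fitted to 2*half_window+1 points."""
    if not 0 <= deriv <= polyorder < 2 * half_window + 1:
        raise ValueError("need 0 <= deriv <= polyorder < window length")
    offsets = range(-half_window, half_window + 1)
    n = polyorder + 1
    # Normal-equation matrix (A^T A)[i][j] = sum_x x^(i+j), where A[x][j] = x^j.
    ata = [[float(sum(x ** (i + j) for x in offsets)) for j in range(n)] for i in range(n)]
    # b = (A^T A)^{-1} e_deriv, so the fitted coefficient a_deriv = sum_x (sum_j b_j x^j) * y_x.
    unit = [1.0 if j == deriv else 0.0 for j in range(n)]
    b = _solve(ata, unit)
    # d^k/dx^k of a_k x^k at x = 0 is k! * a_k; divide by delta^k for the sample spacing.
    scale = factorial(deriv) / delta ** deriv
    return [scale * sum(b[j] * x ** j for j in range(n)) for x in offsets]


def savgol_filter_valid(spectrum: Sequence[float], half_window: int, polyorder: int,
                        deriv: int = 0, delta: float = 1.0) -> List[float]:
    """Apply the filter at every point whose full window fits inside the spectrum
    ('valid' mode: the output is 2*half_window points shorter than the input)."""
    w = savgol_coeffs(half_window, polyorder, deriv, delta)
    width = len(w)
    return [sum(c * y for c, y in zip(w, spectrum[i:i + width]))
            for i in range(len(spectrum) - width + 1)]
```

For example, `savgol_filter_valid(savgol_filter_valid(raw, 5, 2), 5, 2, deriv=1)` would smooth a hypothetical spectrum `raw` and then take its first derivative. In practice, `scipy.signal.savgol_filter`, which supports `deriv`, `delta` and several edge-handling modes, is the usual choice.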
Keywords