PLoS ONE (Jan 2024)
A hybrid feature selection algorithm combining information gain and grouping particle swarm optimization for cancer diagnosis.
Abstract
Background: Cancer diagnosis based on machine learning has become a popular application direction. The support vector machine (SVM), a classical machine learning algorithm, has been widely used in cancer diagnosis because of its advantages on high-dimensional, small-sample data. However, because gene expression data have a high-dimensional feature space and high feature redundancy, SVM classifies such data poorly.

Methods: To address this problem, this paper proposes a hybrid feature selection algorithm combining information gain and grouping particle swarm optimization (IG-GPSO). The algorithm first calculates the information gain of each feature and ranks the features in descending order of that value. The ranked features are then grouped according to the information index, so that features within a group are close and features outside the group are sparse. Finally, the grouped features are searched using grouping PSO and evaluated according to in-group and out-group criteria.

Results: Experimental results show that the average accuracy (ACC) of the SVM on the feature subset selected by IG-GPSO is 98.50%, which is significantly better than that of traditional feature selection algorithms. Compared with KNN, the classification performance on the feature subset selected by IG-GPSO remains the best. In addition, the results of multiple comparison tests show that the feature selection performance of IG-GPSO is significantly better than that of traditional feature selection algorithms.

Conclusion: The feature subset selected by IG-GPSO not only gives the best classification performance but also has the smallest feature scale (FS). More importantly, IG-GPSO significantly improves the ACC of SVM in cancer diagnosis.
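
The first step described in the Methods, computing the information gain of each feature and ranking the features in descending order, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each feature has already been discretized (gene expression values are continuous, and the abstract does not say how they are binned), and the function names `entropy`, `information_gain`, and `rank_features`, as well as the toy data, are hypothetical. The grouping and grouping PSO steps are not shown because the abstract does not specify them in enough detail.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    by_value = defaultdict(list)
    for x, y in zip(feature_values, labels):
        by_value[x].append(y)
    total = len(labels)
    # H(Y | X): entropy of the labels within each feature value, weighted by frequency.
    conditional = sum(len(ys) / total * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature_index, gain) pairs sorted by gain, highest first."""
    gains = [(j, information_gain(col, labels)) for j, col in enumerate(columns)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): three discretized features over eight samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class labels
    [0, 0, 0, 0, 1, 1, 1, 1],  # separates the two classes perfectly
    [0, 0, 0, 1, 1, 1, 1, 1],  # partially informative
]

ranking = rank_features(columns, labels)
print([j for j, _ in ranking])  # [1, 2, 0]
```

In this toy example the perfectly separating feature is ranked first and the label-independent feature last. A ranked list like this is what the later grouping step would take as input.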