Jisuanji kexue (Apr 2022)

Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief

  • SUN Lin, HUANG Miao-miao, XU Jiu-cheng

DOI
https://doi.org/10.11896/jsjkx.210300094
Journal volume & issue
Vol. 49, no. 4
pp. 152 – 160

Abstract


In multi-label learning and classification, existing feature selection algorithms based on neighborhood rough sets use the classification margin of samples as the neighborhood radius. However, when the margin is too large, the classification may become meaningless, and when the distances between samples are too large, abnormal heterogeneous or similar samples easily arise. Moreover, these existing algorithms cannot handle weak label data. To address these issues, a weak label feature selection method based on multi-label neighborhood rough sets and multi-label Relief is proposed. First, the numbers of heterogeneous and similar samples are introduced to improve the classification margin, from which the neighborhood radius is defined. A new formula for neighborhood approximation accuracy is then presented, and a multi-label neighborhood rough sets model is constructed that can effectively measure the uncertainty of sets in the boundary region. Second, an iteratively updated weight formula is employed to fill in most of the missing labels; then, by combining the neighborhood approximation accuracy with mutual information, a new measure of correlation between labels is developed to fill in the remaining missing labels. Third, the numbers of heterogeneous and similar samples are again used to improve the label weighting and feature weighting formulas, and a multi-label Relief model is proposed for multi-label feature selection. Finally, based on the multi-label neighborhood rough sets model and the multi-label Relief algorithm, a weak label feature selection algorithm is designed to process high-dimensional data sets with missing labels and to improve the performance of multi-label classification. Simulation tests are carried out on eleven public multi-label data sets, and the experimental results verify the effectiveness of the proposed weak label feature selection algorithm.
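
To make the notion of neighborhood approximation accuracy concrete, the following minimal Python sketch computes the classical ratio of lower to upper approximation sizes for a single label. It is an illustration only and not the formula proposed in the paper: the radius is a fixed constant rather than the improved margin-based radius, Euclidean distance is assumed, and the function names, the toy samples, and the label set are all hypothetical.

```python
from math import dist

def neighborhood(samples, i, radius):
    """Indices of all samples within `radius` of sample i (Euclidean distance)."""
    return {j for j, x in enumerate(samples) if dist(samples[i], x) <= radius}

def approximation_accuracy(samples, target, radius):
    """Classical accuracy: |lower approximation| / |upper approximation|.

    `target` is the set of sample indices that carry one label.
    NOTE: this is the textbook definition with a fixed radius (an assumption),
    not the new accuracy formula introduced in the paper.
    """
    lower, upper = set(), set()
    for i in range(len(samples)):
        nbrs = neighborhood(samples, i, radius)
        if nbrs <= target:   # neighborhood lies entirely inside the label set
            lower.add(i)
        if nbrs & target:    # neighborhood overlaps the label set
            upper.add(i)
    return len(lower) / len(upper) if upper else 1.0

# Hypothetical toy data: two tight clusters in a 2-D feature space.
samples = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9)]
target = {0, 1, 2, 3}  # samples assumed to carry the label
accuracy = approximation_accuracy(samples, target, radius=0.25)
print(accuracy)
```

Samples whose neighborhoods overlap the label set without being contained in it form the boundary region; the smaller the accuracy, the larger that region and the more uncertain the label is under the chosen features and radius.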

Keywords