Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief

SUN Lin, HUANG Miao-miao, XU Jiu-cheng

doi:10.11896/jsjkx.210300094

Jisuanji kexue (Apr 2022)

Affiliations

SUN Lin, HUANG Miao-miao, XU Jiu-cheng:
1 College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China
2 Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, Henan 453007, China
3 School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

DOI: https://doi.org/10.11896/jsjkx.210300094
Journal volume & issue: Vol. 49, no. 4
pp. 152 – 160

Abstract

In multi-label learning and classification, existing feature selection algorithms based on neighborhood rough sets use the classification margin of samples as the neighborhood radius. However, when the margin is too large, the classification may become meaningless, and when the distances between samples are too large, abnormal heterogeneous or similar samples easily arise. Moreover, these algorithms cannot handle weak label data. To address these issues, a weak label feature selection method based on multi-label neighborhood rough sets and multi-label Relief is proposed. First, the numbers of heterogeneous and similar samples are introduced to improve the classification margin. On this basis, the neighborhood radius is defined, a new formula for neighborhood approximation accuracy is presented, and a multi-label neighborhood rough sets model is constructed that can effectively measure the uncertainty of sets in the boundary region. Second, an iteratively updated weight formula is employed to fill in most of the missing labels; then, by combining the neighborhood approximation accuracy with mutual information, a new correlation between labels is developed to fill in the remaining missing labels. Third, the numbers of heterogeneous and similar samples are again used to improve the label weighting and feature weighting formulas, and a multi-label Relief model is proposed for multi-label feature selection. Finally, based on the multi-label neighborhood rough sets model and the multi-label Relief algorithm, a weak label feature selection algorithm is designed to process high-dimensional data sets with missing labels and effectively improve multi-label classification performance. Simulation experiments on eleven public multi-label data sets verify the effectiveness of the proposed weak label feature selection algorithm.
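
For readers unfamiliar with Relief, the sketch below shows the classic single-label Relief weighting that the multi-label variant in the abstract builds on. It is not the authors' formula: the abstract does not give the improved label and feature weighting, so this is only the textbook baseline, in which a feature gains weight when it differs on the nearest heterogeneous sample (nearest miss) and loses weight when it differs on the nearest similar sample (nearest hit). The function name `relief_weights`, the min-max scaling, the Manhattan distance, and the use of every sample instead of a random subset are all assumptions made for illustration.

```python
import numpy as np

def relief_weights(X, y):
    """Textbook single-label Relief, averaged over all samples.

    Assumes at least two classes and at least two samples per class.
    This is an illustrative baseline, not the paper's multi-label Relief.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Z = (X - X.min(axis=0)) / span

    # Pairwise Manhattan distances; a sample is never its own neighbor.
    D = np.abs(Z[:, None, :] - Z[None, :, :]).sum(axis=2)
    np.fill_diagonal(D, np.inf)

    same = y[:, None] == y[None, :]
    hit = np.where(same, D, np.inf).argmin(axis=1)    # nearest similar sample
    miss = np.where(~same, D, np.inf).argmin(axis=1)  # nearest heterogeneous sample

    # Reward separation from the miss, penalize separation from the hit.
    return (np.abs(Z - Z[miss]) - np.abs(Z - Z[hit])).mean(axis=0)

# Feature 0 separates the two classes; feature 1 is unrelated to the class.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
```

On this toy data the discriminative feature receives a positive weight and the unrelated one a negative weight, so ranking features by `w` selects feature 0 first.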

Keywords: multi-label learning, feature selection, neighborhood rough sets, Relief, missing labels

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
