IEEE Access (Jan 2024)
Self-Training Algorithm With Block Similar Neighbor Editing
Abstract
In the real world, only a small fraction of data comes with labels. To make full use of the structural information latent in unlabeled data and train a better classifier, researchers have proposed many semi-supervised learning algorithms. Among these, self-training is one of the most widely used semi-supervised learning frameworks due to its simplicity. Selecting high-confidence samples is a crucial step in self-training: if misclassified samples are selected as high-confidence samples, the error is amplified over the iterations and degrades the performance of the final classifier. To alleviate this problem, this paper proposes a self-training algorithm with block-similar neighbor editing (STBSNE). STBSNE computes the distance between samples with a block-based dissimilarity measure, which improves classification performance on high-dimensional data sets. STBSNE defines the block-estimated neighbor relationship, builds the block-estimated neighbor relationship graph, and proposes a block-estimated neighbor editing method that identifies outliers and noise points and edits them, thereby improving the quality of the selected high-confidence samples. Experimental results on 16 benchmark data sets verify the superior performance of the proposed STBSNE compared with seven state-of-the-art algorithms.
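To make the self-training framework described above concrete, the following is a minimal sketch of the generic loop: pseudo-label the unlabeled pool, keep only "high-confidence" samples, and grow the labeled set. The confidence rule used here (unanimous agreement among the k nearest labeled neighbors) is a deliberately simple stand-in for illustration; it is not the paper's block-based dissimilarity measure or block-estimated neighbor editing, and all function names are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X, k=2):
    """Predict by majority vote among the k nearest labeled samples (Euclidean)."""
    preds = []
    for x in X:
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]
        preds.append(np.bincount(y_train[idx]).argmax())
    return np.array(preds)

def self_train(X_lab, y_lab, X_unlab, rounds=5, k=2):
    """Generic self-training loop (illustrative, not the paper's STBSNE).

    Each round: pseudo-label the unlabeled pool, treat a sample as
    high-confidence only when all k nearest labeled neighbors agree
    with its pseudo-label (a crude confidence/editing criterion), and
    move those samples into the labeled set.
    """
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        preds = knn_predict(X_lab, y_lab, pool, k)
        keep = []
        for i, x in enumerate(pool):
            d = np.linalg.norm(X_lab - x, axis=1)
            idx = np.argsort(d)[:k]
            # High confidence: the k nearest labeled neighbors are unanimous.
            if np.all(y_lab[idx] == preds[i]):
                keep.append(i)
        if not keep:  # no confident samples left; stop early
            break
        X_lab = np.vstack([X_lab, pool[keep]])
        y_lab = np.concatenate([y_lab, preds[keep]])
        pool = np.delete(pool, keep, axis=0)
    return X_lab, y_lab
```

In this sketch, the neighbor-agreement check plays the role that neighbor editing plays in STBSNE: pseudo-labeled points whose local neighborhood disagrees with them are withheld from the labeled set, so a single early mislabeling is less likely to snowball across iterations.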
Keywords