IEEE Access (Jan 2024)
Self-Training Algorithm With Block Similar Neighbor Editing
Abstract
In the real world, only a small fraction of data comes with labels. To make full use of the structural information latent in unlabeled data and train a better classifier, researchers have proposed many semi-supervised learning algorithms. Among these, self-training is one of the most widely used semi-supervised learning frameworks due to its simplicity. Selecting high-confidence samples is a crucial step in self-training: if misclassified samples are selected as high-confidence samples, the error is amplified over the iterations and degrades the performance of the final classifier. To alleviate this problem, this paper proposes a self-training algorithm with block-similar neighbor editing (STBSNE). STBSNE computes the distance between samples with a block-based dissimilarity measure, which improves classification performance on high-dimensional data sets. STBSNE defines the block-estimated neighbor relationship, builds the block-estimated neighbor relationship graph, and proposes a block-estimated neighbor editing method that identifies outliers and noise points and edits them, thereby improving the quality of the selected high-confidence samples. Experimental results on 16 benchmark data sets verify the superior performance of the proposed STBSNE compared with seven state-of-the-art algorithms.
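To make the self-training framework described above concrete, the following is a minimal sketch of the generic loop: pseudo-label the unlabeled pool, keep only "high-confidence" samples, and grow the labeled set. The confidence rule used here (unanimous agreement among the k nearest labeled neighbors) is a deliberately simple stand-in for illustration; it is not the paper's block-based dissimilarity measure or block-estimated neighbor editing, and all function names are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X, k=2):
    """Predict by majority vote among the k nearest labeled samples (Euclidean)."""
    preds = []
    for x in X:
        d = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(d)[:k]
        preds.append(np.bincount(y_train[idx]).argmax())
    return np.array(preds)

def self_train(X_lab, y_lab, X_unlab, rounds=5, k=2):
    """Generic self-training loop (illustrative, not the paper's STBSNE).

    Each round: pseudo-label the unlabeled pool, treat a sample as
    high-confidence only when all k nearest labeled neighbors agree
    with its pseudo-label (a crude confidence/editing criterion), and
    move those samples into the labeled set.
    """
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        preds = knn_predict(X_lab, y_lab, pool, k)
        keep = []
        for i, x in enumerate(pool):
            d = np.linalg.norm(X_lab - x, axis=1)
            idx = np.argsort(d)[:k]
            # High confidence: the k nearest labeled neighbors are unanimous.
            if np.all(y_lab[idx] == preds[i]):
                keep.append(i)
        if not keep:  # no confident samples left; stop early
            break
        X_lab = np.vstack([X_lab, pool[keep]])
        y_lab = np.concatenate([y_lab, preds[keep]])
        pool = np.delete(pool, keep, axis=0)
    return X_lab, y_lab
```

In this sketch, the neighbor-agreement check plays the role that neighbor editing plays in STBSNE: pseudo-labeled points whose local neighborhood disagrees with them are withheld from the labeled set, so a single early mislabeling is less likely to snowball across iterations.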
Keywords