IEEE Access (Jan 2022)

Applying Efficient Selection Techniques of Unlabeled Instances for Wrapper-Based Semi-Supervised Methods

  • Cephas A. S. Barreto,
  • Arthur Costa Gorgonio,
  • Joao C. Xavier-Junior,
  • Anne Magaly De Paula Canuto

DOI
https://doi.org/10.1109/ACCESS.2022.3169498
Journal volume & issue
Vol. 10
pp. 43535–43551

Abstract

Semi-supervised learning (SSL) is a machine learning approach that integrates supervised and unsupervised learning mechanisms. This integration can be done in different ways, and one possibility is to use a wrapper-based strategy. The main aim of a wrapper-based strategy is to use a small number of labelled instances to create a learning model. This model is then used in a labelling process, in which some unlabelled instances are labelled and incorporated into the labelled set. One important aspect of a wrapper-based SSL method is the selection of the unlabelled instances to be labelled in this process: an efficient selection process plays an important role in the design of a wrapper-based SSL method, since it can lead to an efficient labelling process and, in turn, to the creation of efficient learning models. In this paper, we propose three selection methods that can be applied to wrapper-based SSL methods. The main idea is to combine one of two selection criteria, prediction confidence or classification agreement, with a distance metric to perform an efficient selection of the unlabelled instances. In order to assess the feasibility of the proposed approach, the selection methods are applied to two well-known wrapper-based SSL methods, Self-training and Co-training. Additionally, an empirical analysis is conducted in which we compare the standard Self-training and Co-training methods against the proposed versions of these two SSL methods over 35 classification datasets.
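To illustrate the wrapper-based loop described above, the sketch below shows a basic self-training procedure with confidence-based selection of unlabelled instances. It is only an illustrative approximation, not the authors' proposed method: the base classifier (GaussianNB from scikit-learn), the 0.9 confidence threshold, and the stopping rules are assumptions chosen for the example, and the distance-metric component of the proposed selection methods is omitted.

# Minimal sketch of self-training with confidence-based selection (assumed setup).
import numpy as np
from sklearn.naive_bayes import GaussianNB

def self_training(X_labeled, y_labeled, X_unlabeled, threshold=0.9, max_iter=20):
    X_l, y_l = np.array(X_labeled), np.array(y_labeled)
    X_u = np.array(X_unlabeled)
    model = GaussianNB()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        model.fit(X_l, y_l)
        proba = model.predict_proba(X_u)        # class probabilities for the unlabelled pool
        confidence = proba.max(axis=1)          # prediction confidence per instance
        selected = confidence >= threshold      # keep only confidently predicted instances
        if not selected.any():
            break                               # no instance passes the selection criterion
        pseudo_labels = model.classes_[proba[selected].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[selected]])   # move selected instances to the labelled set
        y_l = np.concatenate([y_l, pseudo_labels])
        X_u = X_u[~selected]                    # shrink the unlabelled pool
    return model.fit(X_l, y_l)

In this sketch the selection criterion is simply the maximum class probability; the paper's point is that replacing or augmenting such a criterion (e.g. with classification agreement and a distance metric) changes which unlabelled instances enter the labelled set and thus the quality of the resulting model.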

Keywords