IEEE Access (Jan 2022)

Predicting Classification Accuracy of Unlabeled Datasets Using Multiple Deep Neural Networks

  • Shingchern D. You,
  • Hsiao-Chung Liu,
  • Chien-Hung Liu

DOI
https://doi.org/10.1109/ACCESS.2022.3169279
Journal volume & issue
Vol. 10
pp. 44627 – 44637

Abstract

In machine learning, validation accuracy is usually assumed to be a good estimate of a model's prediction accuracy on datasets without ground truth. In reality, this assumption may not hold. We therefore propose an approach to estimate the prediction accuracy of a target model on unlabeled datasets. The proposed approach uses multiple homogeneous target models to assign each unlabeled sample a confidence value, based on the number of models agreeing on the predicted label. From these confidence values, the prediction accuracy of the target model on the dataset can be estimated. In our experiments, the target model is a convolutional neural network (CNN), and the homogeneous models differ only in their initial weights. The experiments are conducted on datasets spanning a wide variety of music genres. The estimation performance of the proposed approach is compared with the reversed testing qualities (RTQ) and ensemble average qualities (EAQ) approaches. The RTQ approach was proposed to estimate the prediction accuracy of trained models, while the EAQ approach was originally designed to estimate the predictive uncertainty of individual samples. We apply all three approaches to estimate dataset-level prediction accuracy via a linear model whose parameters are computed from either multiple labeled datasets or a single labeled dataset. The experimental results show that, compared with the RTQ approach, the proposed approach yields much lower estimation errors on some datasets; compared with the EAQ approach, it is more robust on datasets with large distribution shifts. Finally, we show an additional benefit of the proposed approach: if the estimated accuracy is unsatisfactory, the target model can be re-trained on a new training set containing the original training samples plus manually labeled samples drawn from the unlabeled dataset. The experimental results confirm that selecting (and labeling) new samples with low confidence values is more effective than selecting them at random. Overall, the proposed approach is promising for estimating prediction accuracy on unlabeled datasets.
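The core idea in the abstract, assigning each unlabeled sample a confidence value equal to the level of agreement among homogeneous models and then mapping the aggregate confidence to an accuracy estimate with a linear model, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the model count, sample predictions, and the linear-model coefficients `a` and `b` (which the paper fits on labeled datasets) are all hypothetical.

```python
import numpy as np

def confidence_values(predictions):
    """predictions: (n_models, n_samples) array of predicted class labels,
    one row per homogeneous model. A sample's confidence is the fraction
    of models agreeing on its most frequent (majority) label."""
    predictions = np.asarray(predictions)
    n_models, n_samples = predictions.shape
    conf = np.empty(n_samples)
    for j in range(n_samples):
        _, counts = np.unique(predictions[:, j], return_counts=True)
        conf[j] = counts.max() / n_models
    return conf

def estimate_accuracy(mean_conf, a, b):
    """Linear model mapping mean confidence to an accuracy estimate.
    In the paper, the parameters are computed from one or more labeled
    datasets; here a and b are placeholders."""
    return a * mean_conf + b

# Hypothetical predictions from 5 homogeneous models on 4 samples.
preds = [[0, 1, 2, 1],
         [0, 1, 2, 0],
         [0, 1, 1, 2],
         [0, 1, 2, 1],
         [0, 2, 2, 1]]
conf = confidence_values(preds)   # per-sample agreement fractions
est = estimate_accuracy(conf.mean(), a=1.0, b=0.0)
```

Samples on which the models disagree receive low confidence; these are also the natural candidates for the manual-labeling step described at the end of the abstract.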

Keywords