Introducing Urdu Digits Dataset with Demonstration of an Efficient and Robust Noisy Decoder-Based Pseudo Example Generator

Wisal Khan; Kislay Raj; Teerath Kumar; Arunabha M. Roy; Bin Luo

doi:10.3390/sym14101976

Symmetry (Sep 2022)

Affiliations

Wisal Khan: School of Computer and Technology, Anhui University, Hefei 230039, China
Kislay Raj: School of Computing, Dublin City University, SFI for Research Training in Artificial Intelligence, Dublin 9, Ireland
Teerath Kumar: Department of Software Engineering, School of Computing, National University of Computer and Emerging Sciences, Islamabad 44000, Pakistan
Arunabha M. Roy: Aerospace Engineering Department, University of Michigan, Ann Arbor, MI 48109, USA
Bin Luo: School of Computer and Technology, Anhui University, Hefei 230039, China

DOI: https://doi.org/10.3390/sym14101976
Journal volume & issue: Vol. 14, no. 10, p. 1976

Abstract

In the present work, we propose a novel method that uses only a decoder to generate pseudo-examples and that has shown great success in image classification tasks. The proposed method is particularly useful when only limited data are available, as in semi-supervised learning (SSL) or few-shot learning (FSL). While most previous works have used an autoencoder to improve classification performance for SSL, a single autoencoder may generate confusing pseudo-examples that could degrade the classifier’s performance. On the other hand, various models that use an encoder–decoder architecture for sample generation can significantly increase computational overhead. To address these issues, we propose an efficient means of generating pseudo-examples by using only the generator (decoder) network, trained separately for each class, which has been shown to be effective for both SSL and FSL. In our approach, a decoder is trained on the samples of each class using random noise, and multiple samples are then generated with the trained decoder. Our generator-based approach outperforms previous state-of-the-art SSL and FSL approaches. In addition, we released the Urdu digits dataset, consisting of 10,000 images (8000 training and 2000 test images) collected through three different methods to ensure diversity. Furthermore, we evaluated the proposed method on the Urdu digits dataset under both SSL and FSL, obtaining improvements in average accuracy of 3.04% and 1.50%, respectively, over the current state-of-the-art models.
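The idea of a per-class, decoder-only generator can be illustrated with a minimal NumPy sketch. The abstract does not specify the decoder architecture, the loss, or how noise is sampled, so everything below is an assumption made for illustration: a one-hidden-layer decoder, a squared-error loss, one fixed noise vector per training image, and pseudo-examples obtained by perturbing those noise vectors. The function names (`train_class_decoder`, `generate_pseudo_examples`, `augment_dataset`) and all hyperparameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def train_class_decoder(images, noise_dim=8, hidden=32, steps=1000, lr=0.01, seed=0):
    """Fit a small decoder that maps fixed noise vectors to one class's images.

    images: array of shape (n, d) with flattened images of a single class.
    Returns the decoder parameters, the noise vectors, and the loss history.
    Assumption: pairing each image with one fixed noise vector is a
    simplification chosen for this sketch, not the paper's stated procedure.
    """
    rng = np.random.default_rng(seed)
    n, d = images.shape
    z = rng.normal(size=(n, noise_dim))  # one fixed noise vector per image
    params = {
        "W1": rng.normal(scale=0.1, size=(noise_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, d)),
        "b2": np.zeros(d),
    }
    losses = []
    for _ in range(steps):
        # Forward pass: noise -> hidden layer (tanh) -> reconstructed image.
        h = np.tanh(z @ params["W1"] + params["b1"])
        out = h @ params["W2"] + params["b2"]
        diff = out - images
        losses.append(0.5 * np.sum(diff ** 2) / n)
        # Backward pass for the squared-error loss above.
        g_out = diff / n
        g_h = (g_out @ params["W2"].T) * (1.0 - h ** 2)
        grads = {
            "W1": z.T @ g_h,
            "b1": g_h.sum(axis=0),
            "W2": h.T @ g_out,
            "b2": g_out.sum(axis=0),
        }
        # Plain gradient-descent update.
        for name in params:
            params[name] -= lr * grads[name]
    return params, z, losses


def generate_pseudo_examples(params, z, count, noise_scale=0.1, seed=1):
    """Decode perturbed copies of the training noise into pseudo-examples."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(z), size=count)
    z_new = z[idx] + noise_scale * rng.normal(size=(count, z.shape[1]))
    h = np.tanh(z_new @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


def augment_dataset(images_by_class, per_class):
    """Train one decoder per class and return pseudo-examples keyed by label."""
    pseudo = {}
    for label, images in images_by_class.items():
        params, z, _ = train_class_decoder(images)
        pseudo[label] = generate_pseudo_examples(params, z, per_class)
    return pseudo
```

In a limited-data setting, the arrays returned by `augment_dataset` would be appended to the few labelled examples of each class before training the classifier. Keeping one decoder per class means a generated sample inherits its label from the decoder that produced it, which is one plausible reading of why the approach avoids the confusing pseudo-examples attributed to a single shared autoencoder.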

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
