IEEE Access (Jan 2023)

Deep Learning-Based Gender Classification by Training With Fake Data

  • Mohamed Oulad-Kaddour,
  • Hamid Haddadou,
  • Cristina Conde Vilda,
  • Daniel Palacios-Alonso,
  • Karima Benatchba,
  • Enrique Cabello

DOI
https://doi.org/10.1109/ACCESS.2023.3328210
Journal volume & issue
Vol. 11
pp. 120766 – 120779

Abstract

Gender classification of human faces is an active research topic and an important biometric task, with applications in fields such as automated border control (ABC) and forensic work. Many approaches to gender classification exist in the literature, and the classical ones usually train on real faces. Although these achieve good performance, data collection remains a problem, and many existing works must also account for the privacy of the individuals whose faces they use. Both drawbacks can be overcome by using fake faces, since machine learning has recently made it possible to create a robust corpus of fake faces. Our main contribution in the present paper is to experimentally investigate whether an artificial deepfake corpus can substitute for real corpora in facial gender classification tasks. We propose a deep learning-based approach using convolutional neural networks trained with fake faces and tested on real faces. Exploiting artificial faces removes data collection obstacles for the training step and largely preserves privacy. Four classifiers based on popular convolutional neural network architectures were implemented. In the test phase, we used faces of real identities extracted from well-known experimental databases such as Face Recognition Technology (FERET), Faculdade de Engenharia Industrial (FEI) faces, Face Recognition and Artificial Vision (FRAV) and Labeled Faces in the Wild (LFW). The results are very promising: we obtained high accuracy rates and low equal error rate (EER) scores, similar to those of research works using real faces. As a result of this work, we propose a gender-labeled deepfake facial dataset containing more than 200k deepfake faces, which we will make available upon request for research purposes.
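
The abstract reports results as accuracy and EER. As an illustration only, the following minimal Python sketch shows one common way to compute both metrics from classifier scores on a test set. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy scores are assumptions made for this example, and the EER is approximated by sweeping a threshold over the observed scores.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    scores: classifier outputs, higher means more likely the positive class.
    labels: 1 for the positive class, 0 for the negative class.
    (Which gender is treated as "positive" is an arbitrary choice here.)
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    best_gap, eer = None, None
    # Candidate thresholds: every observed score, plus one above the maximum.
    for t in sorted(set(scores)) + [max(scores) + 1.0]:
        fnr = sum(s < t for s in positives) / len(positives)   # positives rejected
        fpr = sum(s >= t for s in negatives) / len(negatives)  # negatives accepted
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold (assumed 0.5)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy scores for six hypothetical test faces; not data from the paper.
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
toy_acc = accuracy(toy_scores, toy_labels)
```

In the setting described above, the scores would come from a network trained only on fake faces and evaluated on real faces; a lower EER and a higher accuracy both indicate that the fake-trained classifier transfers well.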

Keywords