A dataset of histograms of original and fake voice recordings (H-Voice)

Dora M. Ballesteros; Yohanna Rodriguez; Diego Renza

Data in Brief (Apr 2020)

Affiliations

Dora M. Ballesteros (corresponding author): Universidad Militar Nueva Granada, Colombia
Yohanna Rodriguez: Universidad Militar Nueva Granada, Colombia
Diego Renza: Universidad Militar Nueva Granada, Colombia

Journal volume & issue: Vol. 29

Abstract

This paper presents H-Voice, a dataset of 6672 histograms of original and fake voice recordings, where the fake recordings were obtained by the Imitation [1,2] and Deep Voice [3] methods. The dataset is organized into six directories: Training_fake, Training_original, Validation_fake, Validation_original, External_test1, and External_test2. The training directories contain 2088 histograms of fake voice recordings and 2020 histograms of original voice recordings, respectively. Each of the two validation directories contains 864 histograms, from fake and from original voice recordings respectively. Finally, External_test1 has 760 histograms (380 from fake voice recordings obtained by the Imitation method and 380 from original voice recordings), and External_test2 has 76 histograms (72 from fake voice recordings obtained by the Deep Voice method and 4 from original voice recordings). With this dataset, researchers can train, cross-validate, and test machine learning classification models that identify fake voice recordings.

Keywords: Fake voice, Machine learning, Convolutional neural networks, Binary classification, Imitation, Deep voice, H-Voice
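The directory sizes stated in the abstract can be tallied and checked against a local copy of the dataset. The following is a minimal sketch, not part of the published dataset tooling: the directory names and counts come from the abstract, while the helper names (`count_files`, `mismatches`), the local root path, and the assumption that each histogram is stored as one file directly inside its directory are hypothetical.

```python
from pathlib import Path

# Histogram counts per directory, as stated in the abstract.
EXPECTED_COUNTS = {
    "Training_fake": 2088,
    "Training_original": 2020,
    "Validation_fake": 864,
    "Validation_original": 864,
    "External_test1": 760,
    "External_test2": 76,
}

# Total implied by the per-directory counts.
TOTAL = sum(EXPECTED_COUNTS.values())


def count_files(root):
    """Count regular files in each expected directory under `root`.

    Assumption: one histogram per file, stored directly in the directory
    (no subfolders). A missing directory is reported as 0 files.
    """
    root = Path(root)
    counts = {}
    for name in EXPECTED_COUNTS:
        directory = root / name
        if directory.is_dir():
            counts[name] = sum(1 for p in directory.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts


def mismatches(counts):
    """Return {directory: (found, expected)} for every count that differs."""
    return {
        name: (counts.get(name, 0), expected)
        for name, expected in EXPECTED_COUNTS.items()
        if counts.get(name, 0) != expected
    }


if __name__ == "__main__":
    # "H-Voice" is a hypothetical local path to the unpacked dataset.
    found = count_files("H-Voice")
    print("Expected total:", TOTAL)
    print("Mismatches:", mismatches(found))
```

The per-directory figures add up to the 6672 histograms reported for the whole dataset, so an empty mismatch report suggests that a download is complete under the layout assumed here.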

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/