Comparison and benchmark of name-to-gender inference services

Lucía Santamaría; Helena Mihaljević

doi:10.7717/peerj-cs.156

PeerJ Computer Science (Jul 2018)

Affiliations

Lucía Santamaría: Amazon Development Center, Berlin, Germany
Helena Mihaljević: University of Applied Sciences, Berlin, Germany

DOI: https://doi.org/10.7717/peerj-cs.156
Journal volume & issue: Vol. 4, p. e156

Abstract

The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we benchmark all services under study in several scenarios where a particular metric is to be optimized.

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
