Труды Института системного программирования РАН (Proceedings of the Institute for System Programming of the RAS) (Oct 2018)

Detection of demographic attributes of microblog users

  • Anton Korshunov,
  • Ivan Beloborodov,
  • Andrey Gomzin,
  • Christina Chuprina,
  • Nikita Astrakhantsev,
  • Yaroslav Nedumov,
  • Denis Turdakov

Journal volume & issue
Vol. 25, no. 0
pp. 179 – 194

Abstract

Users of internet services often make errors or intentionally provide misleading information about their demographic attributes, including gender, age, marital status, education, and religious and political views. At the same time, knowing the values of these attributes makes it possible to improve the performance of recommender systems, internet marketing solutions, and other applications that rely on personalized results. This paper proposes a method for automatically detecting demographic attributes of Twitter users by analyzing their text messages and other data from their profiles. The method is based on a machine learning algorithm. Its distinctive features are the fully automatic compilation of training and testing data sets and support for a broad, extensible range of languages and demographic attributes. An experimental study showed high accuracy in detecting gender, age, and marital status for the most popular languages: English, Russian, German, French, Italian, and Spanish. Detection of education and of religious and political views is additionally supported for English.

Keywords