Applied Network Science (Nov 2018)

The construction of Chinese microblog gender-specific thesauruses and user gender classification

  • Zhiliang Zhu,
  • Zejun Ke,
  • Jiayin Cui,
  • Hai Yu,
  • Guoqi Liu

DOI
https://doi.org/10.1007/s41109-018-0104-1
Journal volume & issue
Vol. 3, no. 1
pp. 1 – 17

Abstract

Statistical analysis shows that short text messages posted by users of different genders differ in the words and semantics they use. In this paper, gender-specific thesauruses are constructed first, and two new features are then built on them. A new classification model is formed by combining traditional statistical features with an improved text implicitness feature. Experiments on a Sina Weibo dataset demonstrate the effectiveness of the thesaurus-based features, and the improved text implicitness feature raises gender classification accuracy to 84.7%.

Keywords