IEEE Access (Jan 2024)

Semantic Clustering and Transfer Learning in Social Media Texts Authorship Attribution

  • Anastasia Fedotova,
  • Anna Kurtukova,
  • Aleksandr Romanov,
  • Alexander Shelupanov

DOI
https://doi.org/10.1109/ACCESS.2024.3377231
Journal volume & issue
Vol. 12
pp. 39783 – 39803

Abstract

This paper is the fourth part of a research series that focuses on determining the authorship of Russian-language texts by analyzing short social media comments, including those from mass media and communities associated with destructive content. Semantic text clustering was used to analyze content, and a transfer learning technique based on a pre-trained model was employed to identify sensitive topics. Authorship attribution is implemented both as a classical classification task with a closed set of authors and as a more challenging open-set task. In the latter case, multiple experiments were conducted, incorporating the identification of destructive content with known authors and artificially generated texts. For open attribution, a method combining One-Class SVM and fastText was proposed. Results demonstrate high accuracy (92% or higher) for cases with 2 and 5 authors, regardless of comment length and the additional task of identifying authors of destructive text. Mixed-data experiments involving 10 or more authors yielded accuracy (84% or higher) comparable to or better than that of previous studies.
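
The difference between closed-set and open-set attribution can be illustrated with a minimal sketch. It does not reproduce the One-Class SVM and fastText method named in the abstract. As a deliberately simplified stand-in, it builds character trigram profiles per author and rejects a comment as "unknown" when its best cosine similarity falls below a threshold. The function names, the threshold value, and the sample comments are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profiles(samples):
    """Map each known author to the summed n-gram counts of their comments."""
    profiles = {}
    for author, texts in samples.items():
        profile = Counter()
        for text in texts:
            profile.update(char_ngrams(text))
        profiles[author] = profile
    return profiles


def attribute(text, profiles, threshold=0.3):
    """Open-set decision: return the closest known author, or "unknown" when
    even the best match is below the (hypothetical) similarity threshold."""
    grams = char_ngrams(text)
    best_author, best_score = max(
        ((author, cosine(grams, profile)) for author, profile in profiles.items()),
        key=lambda pair: pair[1],
    )
    return best_author if best_score >= threshold else "unknown"


# Hypothetical toy data: one short comment per known author.
samples = {
    "author_a": ["the weather today is lovely and bright"],
    "author_b": ["football match results were shocking"],
}
profiles = build_profiles(samples)

print(attribute("the weather today is lovely and bright", profiles))
print(attribute("zzzzqqqq", profiles))
```

A closed-set classifier would always return one of the known authors. The rejection threshold is what makes the decision open-set, and in practice it would have to be tuned on held-out data.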

Keywords