IEEE Access (Jan 2024)
Semantic Clustering and Transfer Learning in Social Media Texts Authorship Attribution
Abstract
This paper is the fourth part of a research series that focuses on determining the authorship of Russian-language texts by analyzing short social media comments, including those from mass media and communities associated with destructive content. Semantic text clustering was used to analyze the content, and a transfer learning technique based on a pre-trained model was employed to identify sensitive topics. Authorship attribution is implemented both as a classical classification task with a closed set of authors and as a more challenging open-set task. In the latter case, multiple experiments were conducted, incorporating the identification of destructive content with known authors and artificially generated texts. For open-set attribution, a method combining One-Class SVM and fastText is proposed. Results demonstrate high accuracy (92% or higher) for cases with 2 and 5 authors, regardless of comment length and of the additional task of identifying authors of destructive text. Mixed-data experiments involving 10 or more authors yielded an accuracy of 84% or higher, which is comparable to or better than the results of previous studies.
Keywords