International Journal of Technology (Apr 2022)
Authorship Obfuscation System Development based on Long Short-term Memory Algorithm
Abstract
Stylometry is a statistical technique for analyzing authorship. With stylometry, the author of a document can be identified with high accuracy, which poses a threat to the author's privacy. A counterpart technique, authorship obfuscation (the elimination of authorship identity), can protect the privacy of writers. This study applies an authorship obfuscation method to the Federalist Papers corpus. The Federalist Papers are a well-known corpus that has been studied extensively, especially for authorship identification, because it contains 12 disputed texts. One identification method uses the support vector machine (SVM) algorithm, with which the author of a disputed text can be identified with 86% accuracy. Authorship obfuscation changes the writing style of a text while maintaining its meaning. Long short-term memory (LSTM) is a deep learning algorithm that performs well at word prediction. Using a model built with the LSTM algorithm, the writing style of the disputed documents in the Federalist Papers can be changed. As a result, 4 of the 12 disputed documents were changed from one attributed author identity to another. The similarity between the changed documents and their originals ranges from 40% to 57%, which indicates that the meaning of the original documents is preserved. Our experimental results indicate that the proposed method can eliminate authorship identity well.
Keywords