Proceedings of the XXth Conference of Open Innovations Association FRUCT (Apr 2020)

A Preliminary Performance Comparison of Machine Learning Algorithms for Web Author Identification of Vietnamese Online Messages

  • Alisa Vorobeva,
  • Bui N. Khanh

DOI
https://doi.org/10.23919/FRUCT48808.2020.9087531
Journal volume & issue
Vol. 26, no. 1
pp. 166 – 173

Abstract


With the rapid development of the Internet and related technologies, communication between people has become easier than ever. Email, news sites, and social networking applications have become indispensable communication tools. However, the Internet is also a favorable environment for cybercriminals engaged in malicious activities. It is therefore necessary to develop methods for determining which user authored a given online message. There has been extensive research on this problem across different corpora and languages. In this article, we propose an approach to identifying the authors of Vietnamese online messages based on machine learning algorithms: Naive Bayes, SVM, Random Forest, and Logistic Regression. Of these, Random Forest yielded the best results.
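The abstract names the classifiers but not the features or implementation. As a minimal sketch of how one of them, multinomial Naive Bayes, can attribute a message to an author, the block below assumes overlapping character trigrams as features and Laplace smoothing. Both choices, the class name `NaiveBayesAuthorId`, and the toy users and messages are hypothetical illustrations, not the authors' setup or data. Standard-library Python only.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    # Assumed feature set (not from the paper): overlapping character
    # n-grams of the lowercased message.
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]


class NaiveBayesAuthorId:
    """Hypothetical multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n          # n-gram length
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, authors):
        self.doc_counts = Counter(authors)        # messages per author (prior)
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per author
        for text, author in zip(texts, authors):
            self.ngram_counts[author].update(char_ngrams(text, self.n))
        self.vocab = set()
        for counts in self.ngram_counts.values():
            self.vocab.update(counts)
        self.totals = {a: sum(c.values()) for a, c in self.ngram_counts.items()}
        return self

    def log_scores(self, text):
        # log P(author) + sum over n-grams of log P(ngram | author), smoothed.
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for author, counts in self.ngram_counts.items():
            score = math.log(self.doc_counts[author] / total_docs)
            denom = self.totals[author] + self.alpha * v
            for g in char_ngrams(text, self.n):
                score += math.log((counts[g] + self.alpha) / denom)
            scores[author] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy messages from two hypothetical users.
train_texts = [
    "Chào bạn, hôm nay trời đẹp quá nhỉ!",
    "Chào bạn, mai mình đi cà phê nhé!",
    "Báo cáo tài chính quý này đã được gửi.",
    "Vui lòng kiểm tra báo cáo tài chính đính kèm.",
]
train_authors = ["user_a", "user_a", "user_b", "user_b"]

model = NaiveBayesAuthorId(n=3).fit(train_texts, train_authors)
print(model.predict("Chào bạn, cuối tuần đi chơi nhé!"))  # user_a
```

Working in log space avoids underflow when many small n-gram probabilities are multiplied. The other three algorithms in the abstract (SVM, Random Forest, Logistic Regression) would consume the same kind of n-gram count vectors but learn a different decision function.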

Keywords