Leveraging Machine Learning and Semi-Structured Information to Identify Political Views from Social Media Posts

Adriana Olteanu; Alexandra Cernian; Sebastian-Augustin Gâgă

doi:10.3390/app122412962

Applied Sciences (Dec 2022)

Affiliations

Adriana Olteanu: Faculty of Automatic Control and Computers, Politehnica University of Bucharest, 060042 Bucharest, Romania
Alexandra Cernian: Faculty of Automatic Control and Computers, Politehnica University of Bucharest, 060042 Bucharest, Romania
Sebastian-Augustin Gâgă: Faculty of Automatic Control and Computers, Politehnica University of Bucharest, 060042 Bucharest, Romania

DOI: https://doi.org/10.3390/app122412962
Journal volume & issue: Vol. 12, no. 24, p. 12962

Abstract

Social media platforms contribute significantly to shaping and influencing people’s opinions and decisions, including their political views and orientation. Analyzing social media content can reveal trends and key triggers that influence society. This paper presents an exhaustive analysis of the performance of several implementations of the Naïve Bayes classifier, combined with a semi-structured information approach, in identifying the political orientation of Twitter users from their posts. As our research methodology, we aggregate a database of over 86,000 political posts from the Democrat (left) and Republican (right) ideologies in a semi-structured format. This approach allows us to associate a Democrat or Republican label with each tweet in order to create and train the model. The semi-structured input data are processed using several NLP techniques, and the model is then trained to classify political orientation based on semantic criteria and semi-structured information. The paper examines several variants of the Naïve Bayes classifier family (Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Calibrated Naïve Bayes) and tracks a range of performance measures and their graphical representations, including prediction accuracy, precision, recall, confusion matrix, and Brier score loss. We obtained an accuracy of around 80–85% in identifying users’ political orientation. We conclude that this type of application could be integrated into a more complex system and could help in determining political trends or election results.
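To make the classification step in the abstract concrete, here is a minimal sketch of a Multinomial Naïve Bayes text classifier with Laplace smoothing, together with accuracy and Brier score calculations. It is not the authors' implementation. The toy tweets, the whitespace tokenizer, and all function names (`train_multinomial_nb`, `predict_proba`, `brier_score`, and so on) are assumptions made for illustration, and the paper's NLP preprocessing and calibration steps are not reproduced.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; the paper uses over 86,000 labeled political posts.
TRAIN = [
    ("expand healthcare coverage for working families", "Democrat"),
    ("protect voting rights and climate action now", "Democrat"),
    ("cut taxes and secure the border", "Republican"),
    ("defend the second amendment and cut regulations", "Republican"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real NLP cleaning)."""
    return text.lower().split()


def train_multinomial_nb(samples, alpha=1.0):
    """Collect class priors and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    n_samples = sum(class_counts.values())
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
        "log_prior": {c: math.log(n / n_samples) for c, n in class_counts.items()},
    }


def predict_proba(model, text):
    """Return the posterior probability of each class for one text."""
    alpha, vocab_size = model["alpha"], len(model["vocab"])
    log_scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        denom = model["totals"][label] + alpha * vocab_size
        for tok in tokenize(text):
            if tok in model["vocab"]:  # ignore words never seen in training
                score += math.log((model["word_counts"][label][tok] + alpha) / denom)
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long texts.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def predict(model, text):
    """Return the most probable class label."""
    probs = predict_proba(model, text)
    return max(probs, key=probs.get)


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, t) == y for t, y in samples) / len(samples)


def brier_score(model, samples, positive):
    """Mean squared error between P(positive) and the 0/1 true outcome."""
    errors = [
        (predict_proba(model, t)[positive] - (1.0 if y == positive else 0.0)) ** 2
        for t, y in samples
    ]
    return sum(errors) / len(errors)


model = train_multinomial_nb(TRAIN)
print(predict(model, "climate action and healthcare"))
print(accuracy(model, TRAIN), brier_score(model, TRAIN, "Republican"))
```

Scores measured on the training tweets, as in the last line, are only a smoke test. A meaningful evaluation would use held-out data, and a lower Brier score indicates better-calibrated probabilities.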

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
