International Journal of Data and Network Science (Jan 2024)
Data tweet clustering using bidirectional gated recurrent unit and k-prototype for the Indonesian political year
Abstract
Social media, once used mainly for communication between users, is increasingly a channel for broadcasting information, conducting business, advertising, and even political campaigning. In elections, it is also used to discredit political opponents and reduce the electability of opposing candidates. Spreading hate speech and fake news for this purpose is a common violation of the law committed by supporters of one candidate against another. Because the number of social media users grows rapidly every year, the risk of social media abuse is likely to grow as well. In January 2022, Indonesia had 191 million social media users. A user base of this size can make the election period more tumultuous and has the potential to cause societal divisions. The government therefore needs a control system to screen social media content that may be illegal. In this study, fake news and hate speech are classified using the Bidirectional Gated Recurrent Unit (BiGRU). K-Prototype is then used to cluster the tweets based on the classification dimensions and their likely spread, in order to identify which clusters carry the greatest risk of breaking the law, creating confusion, and spreading widely through society. The resulting clusters are intended to represent priority levels for tweets that require prompt attention from the government to prevent them from spreading and inciting social unrest. In the analysis, the BiGRU fake news model yields an F1-score of 95%, while the BiGRU hate speech model yields an F1-score of 90%. Clustering with K-Prototype reduces the number of tweets from 13,183 to 1,791. These remaining tweets are treated as the priority set to be addressed in preventing social media disputes.
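To illustrate why K-Prototype suits data that mixes classifier labels with numeric measures, the following is a minimal sketch of the standard mixed dissimilarity it relies on: squared Euclidean distance on numeric attributes plus a weight gamma times the number of categorical mismatches. It covers only the assignment step, not the prototype update. The feature layout (one normalized spread score and two label fields), the label values, the prototypes, and the value of gamma are all hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple

# A point is (numeric features, categorical features).
Point = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Squared Euclidean distance on the numeric parts plus
    gamma times the number of mismatching categorical attributes."""
    num_a, cat_a = a
    num_b, cat_b = b
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


def assign(points: List[Point], prototypes: List[Point], gamma: float = 1.0) -> List[int]:
    """Assign each point to the index of its nearest prototype."""
    return [
        min(range(len(prototypes)),
            key=lambda k: mixed_dissimilarity(p, prototypes[k], gamma))
        for p in points
    ]


# Hypothetical tweets: ([normalized spread score], [fake-news label, hate-speech label]).
tweets: List[Point] = [
    ([0.9], ["fake", "hate"]),
    ([0.8], ["fake", "not_hate"]),
    ([0.1], ["real", "not_hate"]),
    ([0.2], ["real", "not_hate"]),
]

# Hypothetical prototypes for a high-priority and a low-priority cluster.
prototypes: List[Point] = [
    ([0.85], ["fake", "hate"]),
    ([0.15], ["real", "not_hate"]),
]

labels = assign(tweets, prototypes, gamma=0.5)
print(labels)
```

In a full K-Prototype run, each prototype would then be recomputed from its members (mean for numeric attributes, mode for categorical ones) and the assignment repeated until the labels stop changing. The choice of gamma controls how much the categorical labels weigh against the numeric attributes.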