Applied Sciences (Nov 2022)
An Anomaly Detection Framework for Twitter Data
Abstract
An anomaly is something unusual, such as a sudden change in behavior, and detecting anomalies helps reveal irregular and malicious behavior. Anomaly detection identifies unusual events, suspicious objects, or observations that differ significantly from normal behavior or patterns. Discrepancies in data can appear in different forms, such as outliers, deviations from expected values, and noise. Applied to health-related tweets, anomaly detection can help reveal the emergence of specific diseases. This paper analyzes tweets to detect the unusual emergence of healthcare-related tweets, particularly before and during COVID-19. After pre-processing, the collected dataset contained more than 44,000 tweets, on which topic modeling was performed using non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA). A query set was designed from the resulting topics and used for anomaly detection with a sentence transformer. K-means was then employed to cluster, by similarity, the outlier tweets separated from the cleaned tweets. Finally, an unusual cluster was selected to identify pandemic-like healthcare emergencies. Experimental results show that the proposed framework can detect a sudden rise in unusual tweets unrelated to regular tweets. The framework was applied in two case studies of anomaly detection, achieving 78.57% and 70.19% accuracy, respectively.
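The abstract names the pipeline stages but not their exact mechanics. The following is a minimal sketch under stated assumptions, not the authors' implementation. Toy 3-D vectors stand in for sentence-transformer embeddings of tweets and of the topic-derived query set. A tweet is treated as an outlier when its best cosine similarity to any query falls below a hypothetical threshold of 0.5. The outliers are then grouped with a plain Lloyd's k-means. The functions `flag_outliers` and `kmeans`, the threshold, and all vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def flag_outliers(tweet_vecs, query_vecs, threshold):
    """Hypothetical rule: a tweet is an outlier if no query is similar enough to it."""
    return [i for i, t in enumerate(tweet_vecs)
            if max(cosine(t, q) for q in query_vecs) < threshold]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids are initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: mean of assigned points; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical query embeddings standing in for the topic-derived query set.
queries = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
# Hypothetical tweet embeddings: two "regular" tweets, then two unusual groups interleaved.
tweets = [
    [1.0, 0.05, 0.0], [0.95, 0.1, 0.0],   # close to the queries
    [0.0, 1.0, 0.1], [0.0, 0.1, 1.0],     # unusual group A, unusual group B
    [0.1, 0.9, 0.0], [0.05, 0.0, 0.9],    # unusual group A, unusual group B
]

outliers = flag_outliers(tweets, queries, threshold=0.5)
labels, centroids = kmeans([tweets[i] for i in outliers], k=2)
print(outliers, labels)  # prints [2, 3, 4, 5] [0, 1, 0, 1]
```

In a real pipeline the embeddings would come from a sentence-transformer model, and a library k-means (for example scikit-learn's `KMeans`) with a more robust initialisation would replace the first-k seeding used here. The abstract does not state how the final "unusual" cluster is chosen, so the sketch stops at producing the clusters.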
Keywords