Applied Sciences (Nov 2022)

An Anomaly Detection Framework for Twitter Data

  • Sandeep Kumar,
  • Muhammad Badruddin Khan,
  • Mozaherul Hoque Abul Hasanat,
  • Abdul Khader Jilani Saudagar,
  • Abdullah AlTameem,
  • Mohammed AlKhathami

DOI
https://doi.org/10.3390/app122111059
Journal volume & issue
Vol. 12, no. 21
p. 11059

Abstract

An anomaly is something unusual, often a sudden change in behavior, and detecting anomalies helps reveal irregular and malicious behavior. Anomaly detection identifies unusual events, suspicious objects, or observations that differ significantly from normal behavior or patterns. Discrepancies in data can be observed in different ways, such as outliers, standard deviation, and noise. Applied to health-related tweets, anomaly detection can help us understand the emergence of specific diseases. This paper analyzes tweets to detect the unusual emergence of healthcare-related tweets, particularly before and during COVID-19. This work collected and pre-processed more than 44 thousand tweets and performed topic modeling on them. Non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA) were used for topic modeling, and a query set was designed from the resulting topics. This query set was used for anomaly detection with a sentence transformer. K-means was then employed to cluster the outlier tweets, separated from the cleaned tweets, based on similarity. Finally, an unusual cluster was selected to identify pandemic-like healthcare emergencies. Experimental results show that the proposed framework can detect a sudden rise of unusual tweets unrelated to regular tweets. The framework was applied in two case studies for anomaly detection and achieved accuracies of 78.57% and 70.19%.
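
The query-set step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a bag-of-words cosine similarity as a crude stand-in for sentence-transformer embeddings, and the function names (`bow`, `cosine`, `flag_outliers`), the threshold of 0.2, and the sample queries and tweets are all hypothetical. Tweets whose best similarity to any query falls below the threshold are treated as outliers, which a clustering step such as K-means could then group.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (assumed stand-in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def flag_outliers(tweets, queries, threshold=0.2):
    """Return tweets whose best similarity to any query is below the threshold.

    The threshold is a hypothetical value chosen for this toy example.
    """
    query_vecs = [bow(q) for q in queries]
    outliers = []
    for tweet in tweets:
        vec = bow(tweet)
        best = max((cosine(vec, q) for q in query_vecs), default=0.0)
        if best < threshold:
            outliers.append(tweet)
    return outliers


# Hypothetical query set, as if derived from topic-model output.
queries = ["flu season vaccine", "diabetes diet exercise"]

# Hypothetical tweets: two match regular topics, one does not.
tweets = [
    "get your flu vaccine this season",
    "healthy diet and exercise help manage diabetes",
    "new virus outbreak reported lockdown announced",
]

outliers = flag_outliers(tweets, queries)
print(outliers)
```

In this toy data only the third tweet shares no vocabulary with the query set, so it is the only one flagged. A real pipeline would replace `bow` with dense sentence embeddings so that paraphrases with no shared words still score as similar.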

Keywords