Complexity (Jan 2024)

An NLP-Based Framework to Spot Extremist Networks in Social Media

  • Andrés Zapata Rozo,
  • Daniel Díaz-López,
  • Javier Pastor-Galindo,
  • Félix Gómez Mármol,
  • Umit Karabiyik

DOI
https://doi.org/10.1155/2024/3380488
Journal volume & issue
Vol. 2024

Abstract

Governments and law enforcement agencies (LEAs) are increasingly concerned about growing illicit activities in cyberspace, such as cybercrimes, cyberespionage, cyberterrorism, and cyberwarfare. In the particular context of cyberterrorism, hostile social manipulation (HSM) is a strategy that employs different manipulation methods, mostly through social media, to promote extremism in social groups and encourage hostile behavior against a target. This paper proposes a framework based on natural language processing (NLP) that detects and inspects suspected HSM actions to support LEAs in the prevention of cyberterrorism. The proposal integrates different NLP techniques through three models: (i) a similarity model that relates content with similar semantic meaning, (ii) a polarity analysis model that estimates polarity, and (iii) a named-entity recognition (NER) model that recognizes relevant entities. Each component of the framework is evaluated through exhaustive experiments, and the framework as a whole is tested on a use case related to violent protests in Ecuador in October 2021. The use case results show that 3 clusters are obtained with Spanish tweets and 4 with English-translated tweets. A polarity analysis of the English-translated tweets identifies, through two different methods, the most negative cluster (#1). The mention extraction results show that the framework identifies person-type entities that may be at risk with a precision of 89.91%. The knowledge graphs obtained in the use case reveal how nodes that promote HSM are interconnected and work collaboratively. Finally, the computational costs of the proposal are favorable: memory consumption of the similarity and polarity models is proportional to the number of processed tweets, which supports the feasibility of the solution in a real context.
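
To make the three-stage pipeline concrete, the following is a minimal sketch and not the authors' implementation. It assumes deliberately simple stand-ins for each model: bag-of-words cosine similarity with greedy clustering in place of the semantic similarity model, a tiny hypothetical word lexicon in place of the polarity model, and a regular expression over `@` handles in place of the NER model. All function names, the lexicon, the threshold, and the sample tweets are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon (assumption); the abstract does not specify the polarity model.
NEGATIVE = {"attack", "burn", "destroy", "hate"}
POSITIVE = {"peace", "support", "calm", "help"}


def tokens(text):
    """Lowercase word tokens; a stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, threshold=0.5):
    """Greedy clustering: a text joins the first cluster whose seed text
    is at least `threshold` similar, otherwise it starts a new cluster."""
    vectors = [Counter(tokens(t)) for t in texts]
    clusters = []  # each cluster is a list of indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def polarity(text):
    """Score in [-1, 1]: (positive - negative) / number of lexicon hits."""
    words = tokens(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def most_negative_cluster(texts, clusters):
    """Index of the cluster with the lowest mean polarity."""
    means = [sum(polarity(texts[i]) for i in c) / len(c) for c in clusters]
    return means.index(min(means))


def mention_graph(tweets):
    """Map each author to a Counter of the @handles they mention.
    The regex is a crude substitute for an NER model."""
    graph = defaultdict(Counter)
    for author, text in tweets:
        for handle in re.findall(r"@(\w+)", text):
            graph[author][handle] += 1
    return graph


# Invented sample data for illustration only.
tweets = [
    ("u1", "burn the city hall, attack @mayor now"),
    ("u2", "attack the city hall and burn it @mayor"),
    ("u3", "we support peace and calm in the streets"),
]
texts = [text for _, text in tweets]
clusters = cluster(texts)
worst = most_negative_cluster(texts, clusters)
graph = mention_graph(tweets)
```

In this toy run the first two tweets share most of their vocabulary and fall into one cluster, which is also the most negative one, and both of their authors link to the same mentioned account in the graph. A realistic system would replace each stand-in with a trained model, for example sentence embeddings for similarity and a statistical NER tagger for entities.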