An NLP-Based Framework to Spot Extremist Networks in Social Media

Andrés Zapata Rozo; Daniel Díaz-López; Javier Pastor-Galindo; Félix Gómez Mármol; Umit Karabiyik

doi:10.1155/2024/3380488

Complexity (Jan 2024)

Affiliations

Andrés Zapata Rozo: School of Engineering
Daniel Díaz-López: School of Engineering
Javier Pastor-Galindo: Faculty of Computer Science
Félix Gómez Mármol: Faculty of Computer Science
Umit Karabiyik: Department of Computer and Information Technology

Journal volume & issue: Vol. 2024

Abstract

Governments and law enforcement agencies (LEAs) are increasingly concerned about the growth of illicit activities in cyberspace, such as cybercrime, cyberespionage, cyberterrorism, and cyberwarfare. In the particular context of cyberterrorism, hostile social manipulation (HSM) is a strategy that employs different manipulation methods, mostly through social media, to promote extremism in social groups and encourage hostile behavior against a target. This paper therefore proposes a framework based on natural language processing (NLP) that detects and inspects suspected HSM actions to support LEAs in the prevention of cyberterrorism. The proposal integrates different NLP techniques through three models: (i) a similarity model that relates content with similar semantic meaning, (ii) a polarity analysis model that estimates polarity, and (iii) a named-entity recognition (NER) model that recognizes relevant entities. Each component of the framework is evaluated through exhaustive experiments, and the framework as a whole is tested on a use case related to violent protests in Ecuador in October 2021. The use case results indicate that 3 clusters are obtained with Spanish tweets and 4 with English-translated tweets. A polarity analysis of the English-translated tweets identifies, through two different methods, the most negative cluster (#1). The results of mention extraction show that the framework identifies person-type entities that may be at risk with a precision of 89.91%. The knowledge graphs obtained in the use case reveal how nodes that promote HSM are interconnected and work collaboratively. Finally, the computational costs of the proposal are favorable: memory consumption of the similarity and polarity models is proportional to the number of processed tweets, which supports the feasibility of the solution in a real context.
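
To make the three-model pipeline described in the abstract more concrete, the following is a minimal sketch in Python using only the standard library. It is not the authors' implementation. As labeled assumptions, word-count cosine similarity stands in for the semantic similarity model, a tiny hand-made lexicon (`NEGATIVE`, `POSITIVE`) stands in for the polarity model, and a regular expression over `@` mentions stands in for the NER model. The function names, the 0.5 threshold, and the sample tweets are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon; a real system would use a trained polarity model.
NEGATIVE = {"attack", "burn", "hate", "destroy"}
POSITIVE = {"peace", "calm", "support", "dialogue"}


def tokens(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z#@']+", text.lower())


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.5):
    """Greedy clustering: a text joins the first cluster whose seed
    (first member) is at least `threshold` similar, else starts a new one."""
    clusters = []
    for text in texts:
        for members in clusters:
            if cosine(text, members[0]) >= threshold:
                members.append(text)
                break
        else:
            clusters.append([text])
    return clusters


def polarity(text):
    """Lexicon score in [-1, 1]: (positive hits - negative hits) / tokens."""
    toks = tokens(text)
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in toks)
    return score / len(toks) if toks else 0.0


def most_negative_cluster(clusters):
    """Index of the cluster with the lowest mean polarity."""
    means = [sum(polarity(t) for t in c) / len(c) for c in clusters]
    return min(range(len(clusters)), key=means.__getitem__)


def mentions(text):
    """Stand-in for NER: extract @-mentioned account names."""
    return re.findall(r"@(\w+)", text)


# Invented sample tweets, for illustration only.
sample = [
    "burn the city attack @mayor",
    "attack and burn the city now",
    "we support peace and dialogue",
    "support dialogue and peace today @mayor",
]
groups = cluster(sample)
worst = most_negative_cluster(groups)
at_risk = sorted({m for t in groups[worst] for m in mentions(t)})
```

In this sketch, `groups` holds the clusters of similar tweets, `worst` is the index of the most negative cluster, and `at_risk` lists the accounts mentioned inside it, which loosely mirrors the cluster, polarity, and entity steps of the use case. A knowledge graph, as mentioned in the abstract, could then be built by linking authors to the mentions they produce.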

Published in Complexity

ISSN: 1076-2787 (Print); 1099-0526 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/8503
