IEEE Access (Jan 2021)

An Unsupervised Model for Identifying and Characterizing Dark Web Forums

  • Saiba Nazah,
  • Shamsul Huda,
  • Jemal H. Abawajy,
  • Mohammad Mehedi Hassan

DOI
https://doi.org/10.1109/ACCESS.2021.3103319
Journal volume & issue
Vol. 9
pp. 112871 – 112892

Abstract


Criminals heavily exploit Dark Web forums to trade confidential information and illicit products. This paper addresses the problem of identifying clusters of discussion forums on the Dark Web and characterizing them. Existing methods depend mostly on continuously labeled content, which is expensive to obtain and often infeasible given the nature of Dark Web data. An approach that does not require a continuous supply of labeled forums and related knowledge is therefore needed. To this end, we propose an unsupervised model that identifies and characterizes Dark Web forums by combining a clustering algorithm with a decision tree algorithm. The proposed method presents the characteristics in an explainable form that cyber threat intelligence systems and law enforcement can use as scientific evidence when analyzing data breaches or illicit activities in Dark Web forums. To evaluate the model, we conducted comprehensive experiments on real Dark Web forum data. The proposed approach achieves 98% accuracy and an F1 score of 98%, validating its ability to characterize Dark Web forums. The experimental results suggest that the proposed model could help the cyber threat intelligence and law enforcement communities build an intelligent knowledge source for detecting data breaches and illicit activities in Dark Web forums.
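The abstract describes a cluster-then-explain pipeline: an unsupervised clustering step groups forums without labels, and a decision tree trained on the resulting cluster labels yields human-readable rules that characterize each group. The abstract does not name the specific algorithms or features, so the sketch below is only an illustration under stated assumptions: it uses plain k-means and a one-level decision tree (a stump chosen by Gini impurity), and the per-forum features (`card_terms`, `drug_terms`) and forum data are hypothetical.

```python
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())


def best_stump(points, labels, feature_names):
    """Fit a depth-1 decision tree that explains the cluster labels.

    Returns (weighted_gini, feature, threshold, left_cluster, right_cluster),
    read as: IF feature <= threshold THEN left_cluster ELSE right_cluster.
    """
    best = None
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[f] <= t]
            right = [lab for p, lab in zip(points, labels) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature_names[f], t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best


if __name__ == "__main__":
    # Hypothetical per-forum features: share of carding vs. drug vocabulary.
    names = ["card_terms", "drug_terms"]
    forums = {
        "forum_a": (0.9, 0.1), "forum_b": (0.8, 0.2), "forum_c": (0.85, 0.05),
        "forum_d": (0.1, 0.9), "forum_e": (0.2, 0.8), "forum_f": (0.05, 0.95),
    }
    pts = list(forums.values())
    labs = kmeans(pts, k=2)
    score, feat, thr, left_c, right_c = best_stump(pts, labs, names)
    print(f"IF {feat} <= {thr:.2f} THEN cluster {left_c} ELSE cluster {right_c}")
```

On this toy data the clustering separates the two vocabulary profiles and the stump recovers a single readable rule on `card_terms`. A real system would use richer text features and a deeper tree, but the division of labor is the same: clustering removes the need for labels, and the tree turns the clusters into explainable characteristics.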

Keywords