Alexandria Engineering Journal (Nov 2024)
A data-driven multi-perspective approach to cybersecurity knowledge discovery through topic modelling
Abstract
Cybersecurity is crucial for protecting the privacy of digital systems, maintaining economic stability, and ensuring national security. This study presents a comprehensive approach to cybersecurity knowledge discovery through topic modelling, using a multi-perspective analysis of academic and industry sources. The datasets comprise 15,751 articles from the Web of Science (WoS) database and 5,831 articles from Security Magazine, spanning 2011 to 2023. We employed BERTopic for topic modelling, UMAP for dimensionality reduction, and the HDBSCAN clustering algorithm to group and analyse distinct article clusters, uncovering latent topics and enhancing the understanding of the evolving cybersecurity landscape. The analysis identified 12 micro-clusters and three macro-clusters (technology, smart city, and education) in the WoS data, and a further 12 micro-clusters and four macro-clusters (organization, public security, governance, and education) in the Security Magazine data. The results reveal key trends in cybersecurity research and practice, such as an increasing focus on malware, ransomware, and cyber-attack mitigation. Temporal analysis further indicates a significant rise in cybersecurity interest around 2020, followed by a diversification of topics. These findings highlight the importance of integrating diverse data sources to capture a holistic view of cybersecurity developments. Future work will aim to refine the clustering algorithms to further improve topic extraction and analysis, and to expand the datasets to include more diverse sources and perspectives. This approach helps identify current cybersecurity trends and provides a foundation for more targeted and effective cybersecurity strategies.
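Once HDBSCAN has grouped the UMAP-reduced document embeddings into clusters, BERTopic describes each cluster by its highest-scoring words under a class-based TF-IDF (c-TF-IDF). In c-TF-IDF, all documents in a cluster are treated as one class document, and a term's weight is W(t,c) = tf(t,c) · log(1 + A / f(t)), where tf(t,c) is the term's relative frequency in cluster c, f(t) its total frequency across all clusters, and A the average number of words per cluster. The following is a minimal, dependency-free sketch under stated assumptions, not the authors' code or the library's implementation: the function name `c_tf_idf`, the whitespace tokenization, and the toy clusters are all hypothetical, and cluster labels are assumed to be already assigned.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_cluster: dict[int, list[str]], top_n: int = 3) -> dict[int, list[str]]:
    """Hypothetical helper: return the top_n c-TF-IDF words for each cluster.

    Assumes cluster labels were already assigned (e.g. by HDBSCAN) and uses
    naive lowercase whitespace tokenization for illustration only.
    """
    # Merge every document in a cluster into one "class document" and count its words.
    class_counts = {
        c: Counter(word for doc in docs for word in doc.lower().split())
        for c, docs in docs_by_cluster.items()
    }

    # f(t): total frequency of each term across all clusters.
    total_counts: Counter = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(sum(counts.values()) for counts in class_counts.values()) / len(class_counts)

    topics: dict[int, list[str]] = {}
    for c, counts in class_counts.items():
        n_words = sum(counts.values())
        # W(t, c) = relative term frequency in c * log(1 + A / f(t)).
        weights = {
            term: (tf / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        # Rank by descending weight; break ties alphabetically for deterministic output.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        topics[c] = ranked[:top_n]
    return topics


# Toy, hypothetical clusters (not from the study's datasets).
toy_clusters = {
    0: ["ransomware attack hits security", "ransomware attack demands payment"],
    1: ["smart city security sensors", "smart city grid security"],
}

print(c_tf_idf(toy_clusters, top_n=3))
```

In this toy run, "security" appears as often in cluster 1 as "smart" and "city", yet it ranks below them because it also occurs in cluster 0; this down-weighting of terms shared across clusters is what lets c-TF-IDF surface cluster-specific vocabulary such as "ransomware".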