IET Networks (Nov 2021)
Enhancing topic clustering for Arabic security news based on k‐means and topic modelling
Abstract
The internet has become one of the main channels for spreading news and has opened up the space for information dissemination: news websites express opinions on entities while also reporting on recent or unusual security risks. Recently, many research studies have focused on analysing the sentiment in people's views and impressions using natural language processing and analytical linguistics. We therefore collected a corpus from popular Arabic websites that publish articles related to recent security issues, and we provide lightweight preprocessing techniques in which the data is transformed into a term matrix. We also present an intensive lexical‐driven data analysis with visualised data views, as our topic modelling technique can effectively extract significant topics from all the text collected from the different websites. Our experiments validate the k‐means clustering algorithm with and without the latent Dirichlet allocation topic modelling method, and we adopted various validation techniques to measure the topic clustering internally and externally. As the experimental results show, our proposed combined method achieves a high Rand index of 87.2% with a large number of topics and clusters.
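The external validation figure quoted in the abstract is read here as a Rand index, a standard measure of agreement between a clustering and a set of reference labels. The following is a minimal sketch of that measure only, not the authors' evaluation code; the function name `rand_index` and the toy label lists are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Return the fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items
    in the same group, or both put them in different groups.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy data: six documents, reference topics vs. cluster ids.
reference = [0, 0, 1, 1, 2, 2]
clusters = [1, 1, 0, 0, 2, 0]
score = rand_index(reference, clusters)
print(f"Rand index: {score:.3f}")
```

Cluster ids need not match the reference label values, because the measure only compares pairwise co-membership. This pairwise loop is quadratic in the number of documents, so a contingency-table implementation would be preferable for a full corpus.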
Keywords