IEEE Access (Jan 2022)
Adaptive Binary Bat and Markov Clustering Algorithms for Optimal Text Feature Selection in News Events Detection Model
Abstract
Wrapper Feature Selection (FS) methods based on the Binary Bat Algorithm (BBA) have recently been employed in a variety of detection applications to determine the most relevant feature subset. Despite its strong results in these domains, BBA had never been applied to Event Detection (ED). In our recent work, a novel wrapper FS approach based on BBA and the Markov Clustering (MCL) method was developed to bridge this gap and combat the curse of dimensionality in the feature space of heterogeneous news text documents. However, ED from a massive number of heterogeneous news text documents with varying text lengths remains a challenging task. The exploration performance of BBA declines as the feature space grows, because its fast convergence rate causes BBA to become trapped in local optima. BBA’s loudness ($A$) and emission rate ($r$) parameters largely control this convergence behavior. This study therefore proposes two adaptive techniques for the $A$ and $r$ parameters that adjust BBA’s convergence behavior as the dataset size changes. A new variant, Adaptive BBA (ABBA) with MCL (ABBAMCL), is proposed to improve the performance of the ED model. ABBAMCL was tested on 10 benchmark datasets and two primary Facebook news datasets using several evaluation measures. The empirical results demonstrate the ability of ABBAMCL to identify a small number of informative features for detecting real-world events from heterogeneous news text documents. Furthermore, with a $p$-value of 0.00, the statistical results show that the ABBAMCL FS method based on the proposed control techniques outperforms most other FS methods in producing high-quality event clusters.
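For context, the role of $A$ and $r$ can be illustrated with the update schedules of the standard (non-adaptive) bat algorithm, where loudness decays geometrically and the emission rate rises toward its initial value $r^0$. The sketch below is not the adaptive technique proposed in this study. The function name, the parameter names `a0`, `r0`, `alpha`, `gamma`, and their default values are illustrative assumptions, and for simplicity the updates are applied at every iteration rather than only when a new solution is accepted.

```python
import math


def standard_ba_schedule(a0=1.0, r0=0.5, alpha=0.9, gamma=0.9, steps=5):
    """Return a list of (loudness, emission_rate) pairs, one per iteration.

    Uses the standard bat algorithm updates (assumed defaults, not values
    from the paper):
        A^{t+1} = alpha * A^t
        r^{t+1} = r0 * (1 - exp(-gamma * t))
    """
    schedule = []
    loudness = a0
    for t in range(1, steps + 1):
        # Loudness shrinks geometrically, so new solutions are accepted less often.
        loudness = alpha * loudness
        # Emission rate grows toward r0, so local search is triggered less often.
        rate = r0 * (1.0 - math.exp(-gamma * t))
        schedule.append((loudness, rate))
    return schedule


if __name__ == "__main__":
    for t, (a, r) in enumerate(standard_ba_schedule(), start=1):
        print(f"t={t}  A={a:.4f}  r={r:.4f}")
```

Because both schedules depend only on the iteration count, they behave the same way regardless of the size of the feature space, which is the limitation that adaptive control of $A$ and $r$ is meant to address.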
Keywords