IEEE Access (Jan 2019)
Hybrid Water Cycle Optimization Algorithm With Simulated Annealing for Spam E-mail Detection
Abstract
Spam is junk, unwanted e-mail. A reliable spam filter is increasingly important for e-mail users, who face a growing volume of unsolicited messages. Existing spam classifiers are increasingly unable to handle large volumes of e-mail and to detect new spam with high performance. A central problem in spam classification is the very large number of features. Feature selection is among the most popular and effective methods for feature reduction in keyword-based content classification: it eliminates irrelevant and redundant features that can impede performance. Meta-heuristic optimization chooses the best solution among many candidate solutions, which serves the aim of this research, namely performance. A second problem is that the effect of optimization-based feature selection on the classifiers commonly used in previous work, namely K-Nearest Neighbor, Naïve Bayes, and Support Vector Machine, remains unclear. Therefore, the aim of this research is to improve the accuracy of feature selection by applying a hybrid of the Water Cycle algorithm and Simulated Annealing, and to evaluate the proposed spam detection approach. The methodology consists of groundwork, induction, improvement, evaluation, and quality comparison. Cross-validation was used to split the data for training and validation, and seven datasets were employed to test the proposed spam classification. The meta-heuristic water cycle feature selection (WCFS) was employed, together with three ways of hybridizing it with Simulated Annealing for feature selection.
Compared with other feature selection algorithms such as Harmony Search, Genetic Algorithm, and Particle Swarm Optimization, the interleaved hybridization performed best, with an accuracy of 96.3%. Among the three classifiers, SVM performed best, with an F-measure of 96.3%. The interleaved hybrid of Water Cycle and Simulated Annealing also reduced the number of features by more than 50%.
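To make the idea of meta-heuristic feature selection concrete, the following is a minimal sketch of the Simulated Annealing component only, searching over binary feature masks. It is not the hybrid method of the paper: the Water Cycle step and the interleaving scheme are omitted, and the function names, the cooling schedule, and the toy fitness function are all assumptions introduced for illustration. In a real wrapper setup the fitness would presumably be a cross-validated classifier score rather than the stand-in used here.

```python
import math
import random

def simulated_annealing_select(n_features, fitness, iters=2000,
                               t0=1.0, cooling=0.995, seed=0):
    """Search over binary feature masks, maximizing fitness(mask).

    Illustrative only: parameters and schedule are assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    t = t0
    for _ in range(iters):
        # Neighbor: flip one randomly chosen feature in or out of the subset.
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        cand_fit = fitness(candidate)
        delta = cand_fit - current_fit
        # Always accept improvements; accept a worse move with
        # probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = current[:], current_fit
        t *= cooling  # geometric cooling (an assumed schedule)
    return best, best_fit

# Hypothetical stand-in for classifier accuracy: reward the features in
# RELEVANT and penalize subset size, so smaller useful subsets score higher.
RELEVANT = {0, 3, 7}

def toy_fitness(mask):
    hits = sum(1 for i, keep in enumerate(mask) if keep and i in RELEVANT)
    return hits - 0.1 * sum(mask)

best_mask, best_score = simulated_annealing_select(10, toy_fitness)
selected = [i for i, keep in enumerate(best_mask) if keep]
print(selected, round(best_score, 2))
```

The size penalty in the toy fitness mirrors the two goals named in the abstract, high accuracy and fewer features; how the paper actually weights them is not stated here.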
Keywords