A Spam Filtering Method Based on Multi-Modal Fusion

Hong Yang; Qihe Liu; Shijie Zhou; Yang Luo

doi:10.3390/app9061152

Applied Sciences (Mar 2019)

Affiliations

Hong Yang: The School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
Qihe Liu: The School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
Shijie Zhou: The School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
Yang Luo: The School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China

DOI: https://doi.org/10.3390/app9061152
Journal volume & issue: Vol. 9, no. 6, p. 1152

Abstract

In recent years, single-modal spam filtering systems have achieved high detection rates for image spam or text spam. To evade these systems, spammers inject junk information into multiple modalities of an email and combine them, which lowers the recognition rate of single-modal filters. To address this, a new model called multi-modal architecture based on model fusion (MMA-MF) is proposed, which uses a multi-modal fusion method so that it can effectively filter spam whether it is hidden in the text or in the image. The model fuses a Convolutional Neural Network (CNN) model and a Long Short-Term Memory (LSTM) model to filter spam. The LSTM model and the CNN model process the text and image parts of an email separately to obtain two classification probability values, which are then fed into a fusion model to identify whether the email is spam. For the hyperparameters of the MMA-MF model, we use a grid search optimization method to find the most suitable values, and employ k-fold cross-validation to evaluate the model's performance. Our experimental results show that this model is superior to traditional spam filtering systems and can achieve accuracies in the range of 92.64–98.48%.
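
The late-fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the fusion model, so a single logistic unit over the two probabilities is assumed here. The per-modality scores come from a hypothetical synthetic generator (`make_toy_data`) rather than a real LSTM or CNN, and every function name and the learning-rate grid are illustrative choices, not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fuse(p_text, p_image, weights):
    """Late fusion: map two per-modality spam probabilities to one probability.
    Assumption: a logistic unit stands in for the paper's fusion model."""
    w_text, w_image, bias = weights
    return sigmoid(w_text * p_text + w_image * p_image + bias)

def train_fusion(samples, lr, epochs=200):
    """Fit the fusion weights with stochastic gradient descent on log loss."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for p_text, p_image, label in samples:
            err = fuse(p_text, p_image, weights) - label
            weights[0] -= lr * err * p_text
            weights[1] -= lr * err * p_image
            weights[2] -= lr * err
    return weights

def accuracy(samples, weights):
    """Fraction of samples whose fused probability falls on the correct side of 0.5."""
    hits = sum((fuse(pt, pi, weights) >= 0.5) == bool(y) for pt, pi, y in samples)
    return hits / len(samples)

def k_fold_accuracy(samples, lr, k=5):
    """Mean held-out accuracy over k folds (strided split)."""
    folds = [samples[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        scores.append(accuracy(folds[i], train_fusion(train, lr)))
    return sum(scores) / k

def grid_search(samples, learning_rates, k=5):
    """Return the (learning rate, mean accuracy) pair with the best k-fold accuracy."""
    results = [(lr, k_fold_accuracy(samples, lr, k)) for lr in learning_rates]
    return max(results, key=lambda r: r[1])

def make_toy_data(n=200, seed=0):
    """Hypothetical stand-in for the text (LSTM) and image (CNN) classifier outputs.
    Spam is hidden in exactly one modality; ham scores low in both."""
    rng = random.Random(seed)

    def score(mean):
        return min(1.0, max(0.0, rng.gauss(mean, 0.1)))

    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        spam_in_text = rng.random() < 0.5
        p_text = score(0.85 if label and spam_in_text else 0.15)
        p_image = score(0.85 if label and not spam_in_text else 0.15)
        data.append((p_text, p_image, label))
    return data

data = make_toy_data()
best_lr, best_acc = grid_search(data, [0.01, 0.1, 0.5])
print(f"best learning rate: {best_lr}, mean 5-fold accuracy: {best_acc:.3f}")
```

In this toy setting, neither modality's score alone can flag every spam email, because the junk content sits in only one of the two parts. The fused score combines both, which is the motivation the abstract gives for multi-modal fusion.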

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
