IET Networks (Nov 2020)

Effective spam filter based on a hybrid method of header checking and content parsing

  • Ko‐Tsung Chu,
  • Hua‐Ting Hsu,
  • Jyh‐Jian Sheu,
  • Wei‐Pang Yang,
  • Cheng‐Chi Lee

DOI
https://doi.org/10.1049/iet-net.2019.0191
Journal volume & issue
Vol. 9, no. 6
pp. 338 – 347

Abstract

In recent years, hazardous e‐mails have proliferated, such as messages infected with ‘viruses’ or ‘worms’ that spread destructive programs and ‘phishing mails’ that defraud users of their e‐mail accounts. The volume of spam continues to grow. As spam‐related problems become more severe, spam has become a significant topic in various research domains. Common filtering methods include black/white lists, rule learning, and methods based on text classification, such as Naïve Bayes, support vector machines, boosting trees, multi‐agent systems, and genetic algorithms. Among these, the methods based on text classification are the most widely applied. Moreover, some efficient methods have been proposed that consider only the e‐mail's header section, which can improve both operating efficiency and classification efficiency. By applying machine learning techniques and the decision tree data mining algorithm C4.5, this study proposes an efficient spam filtering method with the following features: (i) a two‐phase filtering mechanism that scans mainly the e‐mail's header and, as an auxiliary, its content; (ii) a reduction of the false positive problem. The experimental results show that the authors’ method achieves a high accuracy rate of 98.76%. Compared with some other methods that use the same spam data sets or are based on deep learning, their method shows excellent performance.
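To make the two‐phase idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a header check runs first and the content is parsed only when the header result is inconclusive. The hand‐written rules, the keyword list, the thresholds, and the names `header_phase`, `content_phase`, and `classify` are all hypothetical stand‐ins for the decision tree that C4.5 would learn from training data.

```python
from email import message_from_string
from email.message import Message

SPAM, HAM, UNSURE = "spam", "ham", "unsure"

# Hypothetical keyword list, for illustration only.
SUSPICIOUS_PHRASES = ("click here", "free money", "winner")


def header_phase(msg: Message) -> str:
    """Phase 1: cheap checks that read header fields only."""
    score = 0
    if not msg.get("Message-ID"):
        score += 1  # missing Message-ID
    sender = msg.get("From", "")
    reply_to = msg.get("Reply-To", "")
    if reply_to and reply_to != sender:
        score += 1  # Reply-To differs from From
    if msg.get("Subject", "").isupper():
        score += 1  # subject written entirely in capitals
    if score >= 2:
        return SPAM
    if score == 0:
        return HAM
    return UNSURE


def body_text(msg: Message) -> str:
    """Collect the undecoded text parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload()
            if isinstance(payload, str):
                parts.append(payload)
    return "\n".join(parts)


def content_phase(msg: Message) -> str:
    """Phase 2: auxiliary content check, used only for unsure mail."""
    text = body_text(msg).lower()
    hits = sum(phrase in text for phrase in SUSPICIOUS_PHRASES)
    # Defaulting to ham when nothing suspicious is found is one way to
    # limit false positives (an assumption, not the paper's rule).
    return SPAM if hits >= 1 else HAM


def classify(raw: str) -> tuple:
    """Return (label, phase that decided)."""
    msg = message_from_string(raw)
    verdict = header_phase(msg)
    if verdict != UNSURE:
        return verdict, "header"
    return content_phase(msg), "content"


if __name__ == "__main__":
    sample = "From: a@example.com\nSubject: Hello\n\nClick here to claim a prize."
    print(classify(sample))
```

In this sketch most messages are settled by the header alone, so the costlier body scan runs only for the remainder. That ordering is the efficiency argument the abstract makes for header‐based filtering.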

Keywords