Effective spam filter based on a hybrid method of header checking and content parsing

Ko‐Tsung Chu; Hua‐Ting Hsu; Jyh‐Jian Sheu; Wei‐Pang Yang; Cheng‐Chi Lee

doi:10.1049/iet-net.2019.0191

IET Networks (Nov 2020)

Affiliations

Ko‐Tsung Chu: Department of Finance, Minghsin University of Science and Technology, Hsinchu 30401, Taiwan
Hua‐Ting Hsu: Department of Information Management, National Dong Hwa University, Hualien 97401, Taiwan
Jyh‐Jian Sheu: College of Communication, National Chengchi University, Taipei 11605, Taiwan
Wei‐Pang Yang: Department of Information Management, National Dong Hwa University, Hualien 97401, Taiwan
Cheng‐Chi Lee: Department of Library and Information Science, Research and Development Center for Physical Education, Health, and Information Technology, Fu Jen Catholic University, New Taipei City 24205, Taiwan

DOI: https://doi.org/10.1049/iet-net.2019.0191
Journal volume & issue: Vol. 9, no. 6, pp. 338–347

Abstract

In recent years, hazardous e‐mails have proliferated, such as e‐mails infected with ‘viruses’ or ‘worms’ that spread destructive programs and ‘phishing mails’ that defraud users of their e‐mail accounts. The volume of spam continues to grow, and as the related problems become more severe, spam has become a significant topic in various research domains. Common filtering methods include black/white lists, rule learning, and methods based on text classification, such as Naïve Bayes, support vector machines, boosting trees, multi‐agent approaches, and genetic algorithms. Among these, the methods based on text classification are the most widely applied. Moreover, some efficient methods consider only the e‐mail's header section, which can improve both operating efficiency and classification efficiency. By applying machine learning techniques and the decision tree data mining algorithm C4.5, this study proposes an efficient spam filtering method with the following features: (i) a two‐phase filtering mechanism that scans mainly the e‐mail's header, with the content as an auxiliary check; and (ii) a reduction of the false positive problem. The experimental results show that the authors’ method achieves an accuracy rate of 98.76%. Compared with some other methods that use the same spam data sets or are based on deep learning, their method performs favourably.
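
As a rough illustration of the ideas named in the abstract, the sketch below extracts a few binary features from an e‐mail's header and scores them with the gain ratio, the split criterion that C4.5 uses to choose decision tree attributes. The feature names, the two sample messages, and the function names are hypothetical and chosen only for illustration; they are not the authors' feature set, data, or implementation.

```python
import math
from collections import Counter
from email import message_from_string


def header_features(raw_message: str) -> dict:
    """Extract a few illustrative (hypothetical) binary header features."""
    msg = message_from_string(raw_message)
    return {
        "missing_subject": msg.get("Subject") is None,
        "missing_message_id": msg.get("Message-ID") is None,
        "reply_to_differs": (
            msg.get("Reply-To") is not None
            and msg.get("Reply-To") != msg.get("From")
        ),
    }


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, feature) -> float:
    """C4.5 split criterion: information gain divided by split information."""
    total = len(rows)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = -sum(
        (len(p) / total) * math.log2(len(p) / total) for p in partitions.values()
    )
    # A feature with a single value cannot split the data, so score it 0.
    return gain / split_info if split_info > 0 else 0.0


# Toy messages, made up for this sketch (not taken from any spam data set).
SPAM = "From: a@example.com\nReply-To: b@example.net\n\nWin a prize now\n"
HAM = (
    "From: c@example.org\nSubject: Meeting\n"
    "Message-ID: <1@example.org>\n\nSee you at 10.\n"
)

rows = [header_features(SPAM), header_features(HAM)]
labels = ["spam", "ham"]
scores = {name: gain_ratio(rows, labels, name) for name in rows[0]}
```

A C4.5‐style tree would split first on the feature with the highest gain ratio and then recurse on each partition. How the header phase and the content phase are combined is not described in the abstract, so that step is left out of this sketch.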

Published in IET Networks

ISSN: 2047-4954 (Print); 2047-4962 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Telecommunication
Website: https://ietresearch.onlinelibrary.wiley.com/journal/20474962
