The Development of the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS)

Isaac C. Ferreira; Marcelo V. C. Aragao; Edvard M. Oliveira; Bruno T. Kuehne; Edmilson M. Moreira; Otavio A. S. Carpinteiro

doi:10.1109/ACCESS.2021.3118901

IEEE Access (Jan 2021)

Affiliations

Isaac C. Ferreira: TRICOD Equipamentos Eletrônicos Indústria e Comércio LTDA, Itajubá, Brazil
Marcelo V. C. Aragao: National Institute of Telecommunications, Santa Rita do Sapucaí, Brazil
Edvard M. Oliveira: Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil
Bruno T. Kuehne: Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil
Edmilson M. Moreira: Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil
Otavio A. S. Carpinteiro: Research Group on Systems and Computer Engineering, Federal University of Itajubá, Itajubá, Brazil

DOI: https://doi.org/10.1109/ACCESS.2021.3118901
Journal volume & issue: Vol. 9
pp. 138618 – 138632

Abstract

Spam e-mails are unsolicited e-mails received by users of the e-mail service. They cause serious harm to organizations because they waste, among other things, computational and networking resources. To reduce this damage, organizations use anti-spams: software systems that classify e-mails in order to separate legitimate e-mails from spam. The best current commercial and open-source anti-spams, in particular the well-known commercial anti-spam CanIt-PRO, classify e-mails using various techniques, such as blacklists and/or SMTP extensions. Unfortunately, both blacklists and SMTP extensions have serious drawbacks, such as low scalability and high computational and network costs. This paper introduces the Open Machine-Learning-Based Anti-Spam (Open-MaLBAS). Unlike the best current anti-spams, Open-MaLBAS uses neither blacklists nor SMTP extensions; it relies only on machine learning models for e-mail classification. Open-MaLBAS was compared to CanIt-PRO in a series of experiments on a database of 862,227 real e-mails, collected over three months at the Federal University of Itajubá, Brazil, and previously classified by CanIt-PRO. In these experiments, Open-MaLBAS correctly classified 81.48% of the e-mails in the database with the Multi-Layer Perceptron model and 98.13% with the Random Forest model, the two models evaluated. In addition, it classified all e-mails in the database in times up to 88% shorter than those of CanIt-PRO. Open-MaLBAS is implemented in Java and released under a free software license for free use. It is available on GitHub.
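
Because the e-mails in the database were labeled beforehand by CanIt-PRO, the accuracy figures in the abstract can be read as agreement rates between a model's output and those reference labels. The following is a minimal sketch of that calculation in Java, not code from the Open-MaLBAS repository: the class name `AgreementSketch`, the method `agreement`, and the toy labels are all hypothetical and chosen only for illustration.

```java
import java.util.List;

// Hypothetical illustration; not part of the Open-MaLBAS code base.
final class AgreementSketch {

    /**
     * Returns the fraction of e-mails for which the predicted label
     * matches the reference label (true = spam, false = legitimate).
     */
    static double agreement(List<Boolean> reference, List<Boolean> predicted) {
        if (reference.isEmpty() || reference.size() != predicted.size()) {
            throw new IllegalArgumentException(
                "label lists must be non-empty and of equal length");
        }
        int matches = 0;
        for (int i = 0; i < reference.size(); i++) {
            if (reference.get(i).equals(predicted.get(i))) {
                matches++;
            }
        }
        return (double) matches / reference.size();
    }

    public static void main(String[] args) {
        // Toy labels invented for this example; they are not data from the paper.
        List<Boolean> reference = List.of(true, false, true, true);
        List<Boolean> predicted = List.of(true, false, false, true);
        System.out.println("agreement = " + agreement(reference, predicted));
    }
}
```

Under this reading, a reported value such as 98.13% would correspond to `agreement` returning roughly 0.9813 on the evaluated e-mails. Which subset of the database the paper scores is not stated in the abstract.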

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
