A Novel Approach for Spam Detection Using Natural Language Processing With AMALS Models

Ruchi Agarwal; Anshita Dhoot; Surya Kant; Vimal Singh Bisht; Hasmat Malik; Md. Fahim Ansari; Asyraf Afthanorhan; Mohammad Asef Hossaini

doi:10.1109/ACCESS.2024.3391023

IEEE Access (Jan 2024)

Affiliations

Ruchi Agarwal: Department of Computer Applications, JIMS Engineering Management Technical Campus, Greater Noida, India
Anshita Dhoot: Department Phystech, School of Radio Engineering and Computer Technology, Moscow Institute of Physics and Technology, Moscow, Russia
Surya Kant: Department of Electronics and Communication Engineering, Graphic Era Hill University, Bhimtal, India
Vimal Singh Bisht: Department of Electronics and Communication Engineering, Graphic Era Hill University, Bhimtal, India
Hasmat Malik: Department of Electrical Power Engineering, Faculty of Electrical Engineering, Universiti Teknologi Malaysia (UTM), Johor Bahru, Malaysia
Md. Fahim Ansari: Department of Electrical Engineering, Graphic Era (Deemed to be University), Dehradun, India
Asyraf Afthanorhan: Artificial Intelligence for Islamic Civilization and Sustainability, Universiti Sultan Zainal Abidin (UniSZA), Kuala Nerus, Terengganu, Malaysia
Mohammad Asef Hossaini: Department of Physics, Badghis University, Bala Murghab, Badghis, Afghanistan

DOI: https://doi.org/10.1109/ACCESS.2024.3391023
Journal volume & issue: Vol. 12
pp. 124298 – 124313

Abstract

Organizations use big data ecosystems to manage large volumes of information and improve their operations. Doing so requires analyzing textual data while safeguarding data integrity, and spam filters are a key measure for organizing and validating that data. Several text-representation methods can be employed, including Word2Vec, bag-of-words, BERT, and term frequency-inverse document frequency (TF-IDF). However, none of these methods effectively addresses data scarcity, which can leave information missing from the collected documents. Addressing this problem requires a strategy that categorizes each document by subject matter and uses statistical approaches to approximate the missing values. This paper presents a novel approach to spam detection using natural language processing. The proposed strategy uses a least-squares model to modify themes and incorporates gradient descent and alternating least-squares (i.e., AMALS) models to estimate missing data. The estimation is performed with TF-IDF and uniform-distribution methods. The performance evaluation shows that the proposed technique reaches a performance of 98%, outperforming the existing industry TF-IDF model at predicting spam within big data ecosystems. With this model, an organization can protect its environment from spamming and other attacks that could otherwise expose its data to unauthorized users.
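
The abstract does not spell out the AMALS formulation, so the following is only a minimal sketch of the general idea behind alternating least squares: estimating a missing entry of a document-term weight matrix from the observed ones. It assumes a rank-1 factorization, a small ridge penalty, and a toy matrix; the names `als_rank1`, `fill_missing`, `n_iters`, and `reg` are hypothetical and are not taken from the paper.

```python
def als_rank1(matrix, n_iters=50, reg=1e-3):
    """Fit matrix[i][j] ~ u[i] * v[j] using only observed (non-None) entries.

    Generic rank-1 alternating least squares with a ridge penalty `reg`.
    This is an illustrative assumption, not the authors' AMALS model.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iters):
        # Hold v fixed: each u[i] has a closed-form least-squares solution.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            num = sum(matrix[i][j] * v[j] for j in obs)
            den = reg + sum(v[j] ** 2 for j in obs)
            u[i] = num / den
        # Hold u fixed: each v[j] has a closed-form least-squares solution.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            num = sum(matrix[i][j] * u[i] for i in obs)
            den = reg + sum(u[i] ** 2 for i in obs)
            v[j] = num / den
    return u, v


def fill_missing(matrix, u, v):
    """Keep observed entries and replace None with the model estimate u[i] * v[j]."""
    return [
        [u[i] * v[j] if x is None else x for j, x in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


if __name__ == "__main__":
    # Toy document-term weights (rows: documents, columns: terms).
    # The entry at row 1, column 2 is treated as missing.
    observed = [
        [1.0, 2.0, 3.0],
        [2.0, 4.0, None],
        [3.0, 6.0, 9.0],
    ]
    u, v = als_rank1(observed)
    filled = fill_missing(observed, u, v)
    print(filled[1][2])
```

Because the toy rows are exact multiples of one another, the estimate for the missing entry lands close to 6. Real TF-IDF matrices are far from rank 1, so a practical variant would use a higher rank and tune the regularization.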

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
