IEEE Access (Jan 2024)
A Novel Approach for Spam Detection Using Natural Language Processing With AMALS Models
Abstract
To improve their operations, organizations across industry rely on big data ecosystems to manage vast volumes of information. Doing so requires analyzing textual data while safeguarding data integrity and applying robust measures, such as spam filters, to organize and validate that data. Various text-representation methods can be employed, including Word2Vec, bag-of-words, BERT, and term frequency-inverse document frequency (TF-IDF). However, none of these methods effectively addresses data scarcity, which can leave information missing from the collected documents. Addressing this problem requires a strategy that categorizes each document by its subject matter and uses statistical approaches to approximate the missing values. This paper presents a novel approach to spam detection using natural language processing. The proposed strategy uses a least-squares model to adjust themes and combines gradient descent with alternating least squares (i.e., the AMALS models) to estimate missing data. The estimation is performed with TF-IDF and uniform-distribution methods. The performance evaluation shows that the proposed technique reaches a performance of 98% in accurately predicting spam within big data ecosystems, outperforming the existing industry TF-IDF model. The model can help protect an organization's environment from spam and other attacks that could expose its data to unauthorized users.
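The abstract combines two standard ingredients: TF-IDF weighting of documents and alternating least squares (ALS) for estimating missing values. The sketch below illustrates those two ingredients in generic form only; it is not the authors' AMALS model, whose exact formulation (including its gradient-descent component and uniform-distribution step) is not given here. It assumes the unsmoothed IDF variant log(N/df), a rank-k factorization with a small ridge penalty `lam`, missing entries stored as `None`, and hypothetical helper names `tfidf_matrix`, `solve`, and `als_complete`.

```python
import math
import random
from collections import Counter


def tfidf_matrix(docs):
    """Dense TF-IDF matrix: rows are documents, columns are the sorted vocabulary.

    TF is count / document length; IDF is the unsmoothed log(N / df)
    (an assumption; many libraries smooth IDF instead).
    """
    vocab = sorted({w for doc in docs for w in doc})
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    idf = {w: math.log(n / df[w]) for w in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, rows


def solve(a, b):
    """Solve the small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def als_complete(matrix, rank=2, lam=1e-3, iters=200, seed=0):
    """Fill None entries with a rank-`rank` ALS reconstruction U V^T.

    Each half-step fixes one factor and solves a ridge-regularized
    least-squares problem per row (or column) over observed entries only.
    Observed entries are returned unchanged.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    v = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]

    def update(factor, other, observed):
        # observed[i] holds (index, value) pairs for the known entries of row/column i
        for i, obs in enumerate(observed):
            a = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
            b = [0.0] * rank
            for j, x in obs:
                for p in range(rank):
                    b[p] += x * other[j][p]
                    for q in range(rank):
                        a[p][q] += other[j][p] * other[j][q]
            factor[i] = solve(a, b)

    by_row = [[(j, x) for j, x in enumerate(row) if x is not None] for row in matrix]
    by_col = [[(i, matrix[i][j]) for i in range(n_rows) if matrix[i][j] is not None]
              for j in range(n_cols)]
    for _ in range(iters):
        update(u, v, by_row)
        update(v, u, by_col)
    return [[x if x is not None else sum(u[i][p] * v[j][p] for p in range(rank))
             for j, x in enumerate(row)] for i, row in enumerate(matrix)]


if __name__ == "__main__":
    docs = [["win", "free", "money"], ["free", "meeting", "today"], ["win", "money", "now"]]
    vocab, m = tfidf_matrix(docs)
    m[2][vocab.index("money")] = None  # pretend this weight was lost
    print(vocab)
    print(als_complete(m, rank=1))
```

The ridge term `lam` keeps every per-row system invertible even when a row has fewer observed entries than `rank`. A real spam pipeline would typically use a sparse matrix library and feed the completed matrix to a classifier; both are outside this sketch.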
Keywords