IEEE Access (Jan 2024)

Uncovering SMS Spam in Swahili Text Using Deep Learning Approaches

  • Iddi S. Mambina,
  • Jema D. Ndibwile,
  • Deo Uwimpuhwe,
  • Kisangiri F. Michael

DOI
https://doi.org/10.1109/ACCESS.2024.3365193
Journal volume & issue
Vol. 12
pp. 25164 – 25175

Abstract

In today’s communications, Short Message Service (SMS) and Internet Protocol-based messaging systems are the most widely used channels. Because of their rising popularity, these services are currently the target of an unprecedented number of threats. Most spam stems from businesses seeking to promote their products, but some spammers have more nefarious motives. In recent years, as new security threats such as smishing have emerged, the volume of SMS spam has grown substantially. The scientific community has paid less attention to SMS spam than to email spam, even though both channels are extensively used. SMS spam also presents extra processing hurdles: lexical variants, SMS-like contractions, and sophisticated obfuscations all degrade the performance of conventional filtering solutions. In this study, we investigate the efficacy of deep-learning models for filtering Swahili SMS spam based on linguistic and behavioral patterns, using a real-world dataset from telecommunications companies in Tanzania. To validate the effectiveness of our models, we also evaluated them on the UCI dataset of spam messages written in English. The models were trained and tested with 10-fold cross-validation. The experimental results show that the CNN-LSTM-LSTM hybrid model attained the highest accuracy, 99.98%, on the Swahili dataset, while CNN-BiLSTM performed better on the UCI dataset, with an accuracy of 98.38%.
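
The k-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `evaluate` callback are hypothetical and chosen only for illustration. The sketch assumes plain (unstratified) folds; the abstract does not say whether the study stratified by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper: shuffles the sample indices once, then uses each
    of the k folds exactly once as the held-out test set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average a per-fold score returned by evaluate(train_idx, test_idx).

    `evaluate` is an assumed callback that would train a model on the
    training indices and return its accuracy on the test indices.
    """
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

In such a setup, `evaluate` would fit a classifier on the messages selected by `train_idx` and report its accuracy on those selected by `test_idx`; the reported figure is then the mean over the 10 folds.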

Keywords