Classifying Swahili Smishing Attacks for Mobile Money Users: A Machine-Learning Approach

Iddi S. Mambina; Jema D. Ndibwile; Kisangiri F. Michael

doi:10.1109/ACCESS.2022.3196464

IEEE Access (Jan 2022)

Affiliations

Iddi S. Mambina: School of Computation and Communication Science and Engineering, The Nelson Mandela Institution of Science and Technology, Arusha, Tanzania
Jema D. Ndibwile: College of Engineering, Carnegie Mellon University Africa, Kigali, Rwanda
Kisangiri F. Michael: School of Computation and Communication Science and Engineering, The Nelson Mandela Institution of Science and Technology, Arusha, Tanzania

DOI: https://doi.org/10.1109/ACCESS.2022.3196464
Journal volume & issue: Vol. 10, pp. 83061–83074

Abstract

Due to the massive adoption of mobile money in Sub-Saharan countries, the global transaction value of mobile money exceeded $2 billion in 2021. Projections show transaction values will exceed $3 billion by the end of 2022, and Sub-Saharan Africa contributes half of the daily transactions. SMS (Short Message Service) phishing costs corporations and individuals millions of dollars annually. Spammers use Smishing (SMS phishing) messages to trick a mobile money user into sending electronic cash to an unintended mobile wallet. Although Smishing is a form of phishing, the two differ in the information available and in attack strategy, which makes Smishing difficult to detect. Numerous models and techniques to detect Smishing attacks have been introduced for high-resource languages, yet few target low-resource languages such as Swahili. This study proposes a machine-learning-based model to classify Swahili Smishing text messages targeting mobile money users. Experimental results show that a hybrid model combining Extra Trees classifier feature selection with a Random Forest, using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization, performs best, with an accuracy of 99.86%. Results are measured against a baseline Multinomial Naïve Bayes model and also compared with a set of other classic classifiers. The hybrid model returns the lowest false-positive and false-negative counts, 2 and 4 respectively, with a Log-Loss of 0.04. A Swahili dataset of 32,259 messages is used for performance evaluation.
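
The pipeline named in the abstract (TF-IDF vectorization, Extra Trees feature selection, a Random Forest classifier, and a Multinomial Naïve Bayes baseline) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-learn, the hyperparameters, the `evaluate` helper, and the handful of toy messages are all assumptions made for the example, and the toy messages are invented rather than drawn from the paper's dataset.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, confusion_matrix, log_loss
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy messages (NOT from the paper's dataset).
# Label 1 = smishing, label 0 = legitimate.
train_texts = [
    "Tuma pesa kwenye namba hii",
    "Ile pesa tuma kwenye namba hii",
    "Umeshinda zawadi tuma pesa kupokea",
    "Tuma pesa haraka kwenye namba hii",
    "Habari za asubuhi",
    "Tutaonana kesho",
    "Asante sana",
    "Nitakupigia simu baadaye",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["Tuma pesa kwenye namba hii sasa", "Asante sana tutaonana kesho"]
test_labels = [1, 0]


def evaluate(model):
    """Fit a pipeline and report the metrics mentioned in the abstract."""
    model.fit(train_texts, train_labels)
    pred = model.predict(test_texts)
    proba = model.predict_proba(test_texts)
    tn, fp, fn, tp = confusion_matrix(test_labels, pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(test_labels, pred),
        "false_positives": int(fp),
        "false_negatives": int(fn),
        "log_loss": log_loss(test_labels, proba, labels=[0, 1]),
    }


# Hybrid model: TF-IDF -> Extra Trees importance-based selection -> Random Forest.
# n_estimators and random_state are illustrative choices, not values from the paper.
hybrid = make_pipeline(
    TfidfVectorizer(),
    SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# Baseline: TF-IDF -> Multinomial Naive Bayes.
baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())

results = {"hybrid": evaluate(hybrid), "baseline": evaluate(baseline)}
for name, metrics in results.items():
    print(name, metrics)
```

On a toy sample this small the printed numbers carry no meaning; the sketch only shows how the pieces fit together and how accuracy, false positives, false negatives, and Log-Loss would be read off a held-out set.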

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
