AI-based disease category prediction model using symptoms from low-resource Ethiopian language: Afaan Oromo text

Etana Fikadu Dinsa; Mrinal Das; Teklu Urgessa Abebe

doi:10.1038/s41598-024-62278-7

Scientific Reports (May 2024)

Affiliations

Etana Fikadu Dinsa: Department of Computer Science and Engineering, Engineering and Technology, Wollega University
Mrinal Das: Department of Data Science, Indian Institute of Technology Palakkad (IIT Palakkad)
Teklu Urgessa Abebe: Department of Computer Science and Engineering, Adama Science and Technology University

DOI: https://doi.org/10.1038/s41598-024-62278-7
Journal volume & issue: Vol. 14, no. 1
pp. 1–15

Abstract

Automated disease diagnosis and prediction, powered by AI, play a crucial role in enabling medical professionals to deliver effective care to patients. While such predictive tools have been extensively explored in resource-rich languages like English, this manuscript focuses on automatically predicting disease categories from symptoms documented in the Afaan Oromo language, using various classification algorithms. The study covers machine learning techniques, namely support vector machines (SVM), random forests, logistic regression, and Naïve Bayes, as well as deep learning approaches, namely LSTM, GRU, and Bi-LSTM. Because no standard corpus is available, we prepared three datasets with different numbers of patient symptoms arranged into 10 categories. Two feature representations were employed: TF-IDF and word embeddings. The performance of the proposed methodology was evaluated using accuracy, recall, precision, and F1 score. The experimental results show that, among the machine learning models, the SVM model using TF-IDF achieved the highest accuracy and F1 score (94.7%), while among the deep learning models, the LSTM model using word2vec embeddings achieved an accuracy of 95.7% and an F1 score of 96.0%. Several hyper-parameter tuning settings were used to optimize the performance of each model. This study shows that the LSTM model performs best of all the models across the entire dataset.
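The abstract names TF-IDF features and accuracy, precision, recall, and F1 evaluation but gives no formulas. As a rough illustration only, the following standard-library Python sketch computes plain TF-IDF weights and macro-averaged metrics. It is not the authors' code. The function names `tfidf_vectors` and `classification_report` are hypothetical. The weighting tf(t, d) · ln(N / df(t)) and the macro averaging are assumptions, because the abstract does not say which variants were used. The SVM and LSTM classifiers themselves are not shown, and the example documents and labels are hypothetical English stand-ins, not data from the study.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each whitespace-tokenized document to a sparse {term: weight} dict.

    Assumed weighting (not stated in the abstract): tf(t, d) * ln(N / df(t)),
    where tf is the term's relative frequency in d, N the number of documents
    and df(t) the number of documents containing t.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # A term present in every document gets ln(1) = 0 weight.
        vectors.append({term: (count / total) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors


def classification_report(y_true, y_pred):
    """Accuracy plus macro-averaged (unweighted mean over classes) precision, recall, F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined ratios (zero denominators) are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {"accuracy": accuracy,
            "precision": sum(precisions) / k,
            "recall": sum(recalls) / k,
            "f1": sum(f1s) / k}


if __name__ == "__main__":
    # Hypothetical English stand-ins for preprocessed symptom descriptions.
    docs = ["fever cough", "fever headache", "rash itching"]
    print(tfidf_vectors(docs)[0])  # "cough" outweighs the shared term "fever"
    # Hypothetical gold and predicted disease-category labels.
    report = classification_report(["A", "A", "B", "B"], ["A", "B", "B", "B"])
    print(report["accuracy"])  # 0.75
```

In practice a library would usually handle both steps. For instance, scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes each row by default, so its weights differ from this plain formula. `sklearn.metrics.precision_recall_fscore_support(..., average="macro")` returns the macro-averaged scores.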

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
