Computers (Sep 2024)
Optimized Machine Learning Classifiers for Symptom-Based Disease Screening
Abstract
This work presents a disease detection classifier based on symptoms encoded by their severity. This model is presented as part of the solution to the saturation of the healthcare system, aiding in the initial screening stage. An open-source dataset is used, which undergoes pre-processing and serves as the data source to train and test various machine learning models, including SVM, RFs, KNN, and ANNs. A three-phase optimization process is developed to obtain the best classifier: first, the dataset is pre-processed; secondly, a grid search is performed with several hyperparameter variations to each classifier; and, finally, the best models obtained are subjected to additional filtering processes. The best-results model, selected based on the performance and the execution time, is a KNN with 2 neighbors, which achieves an accuracy and F1 score of over 98%. These results demonstrate the effectiveness and improvement of the evaluated models compared to previous studies, particularly in terms of accuracy. Although the ANN model has a longer execution time compared to KNN, it is retained in this work due to its potential to handle more complex datasets in a real clinical context.
Keywords