Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Aug 2024)
Increasing the Accuracy of Brain Stroke Classification using Random Forest Algorithm with Mutual Information Feature Selection
Abstract
Brain stroke is a leading cause of death, which sets it apart from common illnesses and motivates the use of machine learning techniques to identify its symptoms. Among these techniques, the Random Forest (RF) algorithm was chosen as the main candidate because of its high accuracy. RF is an ensemble learning method: it trains many decision trees (DT) on bootstrap samples and aggregates (bags) their outputs, which improves accuracy over a single tree. Feature selection, an important pre-processing step in data analysis, aims to retain influential features and discard less impactful ones. Mutual Information is used here as the feature selection method. The highest accuracy was obtained with 10-fold cross-validation: 0.7760 without feature selection and 0.7790 with Mutual Information. Most attributes in the brain stroke dataset are relevant to the stroke disease class, and the resulting decision tree shows age as a particularly important node. The results therefore show that feature selection with Mutual Information can increase the accuracy of brain stroke classification, although the gain is not significant: an increase of 0.0030 (0.30 percentage points). Because the difference is so small, it can be concluded that almost all attributes in the brain stroke dataset used are relevant to the stroke disease class.
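As an illustration of how Mutual Information can rank features against a class label, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the function names `mutual_information` and `rank_features`, the feature names, and the eight toy records are all hypothetical and were invented for this example, not taken from the brain stroke dataset.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Return I(X; Y) in nats for two equally long discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Sort feature names by mutual information with the labels, highest first."""
    scores = {name: mutual_information(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only (NOT from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical stroke / no-stroke class
columns = {
    # Hypothetical feature that matches the class exactly.
    "age_group": ["old", "old", "old", "old", "young", "young", "young", "young"],
    # Hypothetical feature that is statistically independent of the class.
    "coin": ["a", "b", "a", "b", "a", "b", "a", "b"],
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this toy case the perfectly aligned feature scores log 2 (about 0.6931 nats) and the independent feature scores 0, so a selector that keeps only high-scoring features would retain the first and drop the second. For real data with continuous attributes, scikit-learn's `mutual_info_classif` together with `SelectKBest`, `RandomForestClassifier`, and `cross_val_score` would be a natural way to assemble a comparable pipeline; whether the study used those tools is not stated in the abstract.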
Keywords