Jurnal Informatika (May 2024)
Optimization of Hyperparameter K in K-Nearest Neighbor Using Particle Swarm Optimization
Abstract
This study aims to improve the performance of the K-Nearest Neighbors (KNN) algorithm by optimizing the hyperparameter K with the Particle Swarm Optimization (PSO) algorithm. In contrast to prior research, which typically focuses on a single dataset, this study seeks to demonstrate that PSO can effectively optimize the KNN hyperparameter across diverse datasets. Three datasets from different domains are used: Iris, Wine, and Breast Cancer, each with a different classification type and number of classes. The study also seeks to establish that PSO performs well with both the Manhattan and Euclidean distance metrics. Prior to optimization, experiments with default K values (3, 5, and 7) were conducted to observe KNN behavior on each dataset. Initial results show stable accuracy on the Iris dataset, while the Wine and Breast Cancer datasets exhibit a decrease in accuracy at K=3, which is attributed to the complexity of their attributes. Optimizing the hyperparameter K with PSO yields a significant increase in accuracy, particularly on the Wine dataset, where accuracy improves by 6.28% with the Manhattan metric. The higher accuracy of the optimized KNN algorithm demonstrates the effectiveness of PSO in overcoming the limitations of KNN. Although the accuracy gain on the Iris dataset is less pronounced, the results indicate that optimizing the hyperparameter K can yield improvements even for datasets on which performance is already good. A recommendation for future research is to conduct similar experiments with other algorithms, such as Support Vector Machine or Random Forest, to further evaluate PSO's ability to optimize hyperparameters on the Iris, Wine, and Breast Cancer datasets.
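The procedure summarized in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a standard global-best PSO with illustrative settings (inertia `w`, coefficients `c1` and `c2`, swarm size, iteration count, and the search range for K are all assumptions), treats each particle as a continuous position that is rounded to an integer K, and uses validation accuracy as the fitness. To stay self-contained it runs on a small synthetic two-class dataset generated by a hypothetical `make_blobs` helper rather than on Iris, Wine, or Breast Cancer.

```python
import random
from collections import Counter


def distance(a, b, metric="euclidean"):
    """Manhattan or Euclidean distance between two feature vectors."""
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_X, train_y, query, k, metric):
    """Majority vote among the k training points nearest to the query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: distance(train_X[i], query, metric))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, val_X, val_y, k, metric):
    """Fraction of validation points classified correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k, metric) == y
               for x, y in zip(val_X, val_y))
    return hits / len(val_X)


def pso_optimize_k(fitness, k_min, k_max, n_particles=8, n_iters=15,
                   w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over one continuous dimension, rounded to an integer K.

    All default settings here are illustrative assumptions.
    """
    rng = random.Random(seed)

    def to_k(p):
        return min(k_max, max(k_min, int(round(p))))

    pos = [rng.uniform(k_min, k_max) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_fit = [fitness(to_k(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + cognitive pull + social pull.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            # Move the particle and keep it inside the search range.
            pos[i] = min(k_max, max(k_min, pos[i] + vel[i]))
            fit = fitness(to_k(pos[i]))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i], fit
    return to_k(gbest), gbest_fit


def make_blobs(n_per_class, rng):
    """Hypothetical helper: two overlapping 2-D Gaussian classes."""
    X, y = [], []
    for label, center in enumerate([(0.0, 0.0), (2.0, 2.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            y.append(label)
    return X, y


data_rng = random.Random(42)
train_X, train_y = make_blobs(40, data_rng)
val_X, val_y = make_blobs(20, data_rng)

results = {}
for metric in ("euclidean", "manhattan"):
    def fitness(k, m=metric):
        return accuracy(train_X, train_y, val_X, val_y, k, m)
    results[metric] = pso_optimize_k(fitness, k_min=1, k_max=25)
    print(metric, "best K:", results[metric][0],
          "validation accuracy:", results[metric][1])
```

Because K is a single integer in a small range, an exhaustive scan would also be feasible in this toy setting; the sketch only shows how the PSO update rule can be applied to K under each distance metric.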
Keywords