Healthcare Analytics (Nov 2022)
A clinical decision support system for polycystic ovarian syndrome using red deer algorithm and random forest classifier
Abstract
This study develops a clinical decision support system (CDSS) to assist physicians with monitoring Polycystic Ovarian Syndrome (PCOS). The proposed method builds a classification model that combines an optimization technique with machine learning algorithms. Irrelevant features can degrade the performance of any classifier. This work analyses the impact of such features on classification accuracy and automatically identifies and removes them from the dataset. A wrapper approach is employed for feature selection: the Red Deer Algorithm (RDA) searches for the optimal feature subset, and a random forest (RF) classifier evaluates each candidate subset. RDA is an optimization algorithm inspired by the mating and roaring behaviour of red deer, and its improved exploration and exploitation capabilities are the rationale for using it as the search method. Experiments were carried out on the PCOS dataset available in the Kaggle dataset repository. The dataset, reduced to the optimal features, is used to train and test an RF classifier with 50 estimators. The novelty of the proposed method lies in its fitness function, which considers both the classification accuracy and the number of selected features. The proposed CDSS achieves an accuracy of 89.81%. The results were compared with those of competing methods from the literature, namely Logistic Regression, k-Nearest Neighbour (k-NN), Decision Tree, Naïve Bayes, Support Vector Machine (SVM), RF-PSO, RF-Ant Colony Optimization, and RF-Genetic Algorithm, and were found to be superior in terms of accuracy, sensitivity, and specificity. The statistical significance of the proposed feature selection approach is confirmed using McNemar’s test.
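To make the wrapper idea concrete, the following is a minimal sketch of a fitness function that rewards both classification accuracy and a smaller feature subset. The abstract does not state the actual formula, so the weighted-sum form, the weight `alpha`, and the names `wrapper_fitness` and `toy_accuracy` are all assumptions made for illustration. In the setting described above, the accuracy callback would be an RF classifier with 50 estimators evaluated on the selected columns; here a toy stand-in keeps the example self-contained.

```python
from typing import Callable, List, Sequence


def wrapper_fitness(
    mask: Sequence[int],
    accuracy_of: Callable[[List[int]], float],
    alpha: float = 0.9,
) -> float:
    """Score a binary feature mask (1 = feature kept, 0 = feature dropped).

    Hypothetical weighted-sum form, NOT the paper's formula:
        alpha * accuracy + (1 - alpha) * (fraction of features removed)
    """
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        # An empty subset cannot be classified, so give it the worst score.
        return 0.0
    accuracy = accuracy_of(selected)
    reduction = 1.0 - len(selected) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * reduction


def toy_accuracy(selected: List[int]) -> float:
    """Stand-in for the classifier: pretends features 0 and 2 are informative."""
    informative = {0, 2}
    hits = len(informative & set(selected))
    return 0.5 + 0.2 * hits


# Compare the full feature set with a compact subset of equal toy accuracy.
full = wrapper_fitness([1, 1, 1, 1], toy_accuracy)
compact = wrapper_fitness([1, 0, 1, 0], toy_accuracy)
print(compact > full)
```

With equal accuracy, the compact mask scores higher because of the reduction term, which is the behaviour a search method such as RDA would exploit when comparing candidate subsets.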