Heliyon (Oct 2024)
SmartScanPCOS: A feature-driven approach to cutting-edge prediction of Polycystic Ovary Syndrome using Machine Learning and Explainable Artificial Intelligence
Abstract
Polycystic Ovary Syndrome (PCOS) poses significant challenges to women's reproductive health. It is difficult to diagnose because it presents with a variety of symptoms, including hirsutism, anovulation, pain, obesity, hyperandrogenism, and oligomenorrhea, and therefore requires multiple clinical tests. Artificial Intelligence (AI) in healthcare can improve patient care, streamline operations, and improve medical outcomes overall. This study presents an Explainable Artificial Intelligence (XAI)-driven PCOS smart predictor, structured as a hierarchical ensemble of two tiers of Random Forest classifiers and chosen after extensive analysis of seven conventional classifiers and two additional stacking ensemble classifiers. The classifiers were trained on an open-source dataset of numerical parametric features linked to PCOS. To identify the essential features for PCOS prediction, three feature selection methods were fine-tuned through thresholding and improvisation to detect diverse attribute sets with varying numbers and combinations of features: Threshold-driven Optimized Principal Component Analysis (TOPCA), Optimized Salp Swarm (OSSM), and the Threshold-driven Optimized Mutual Information Method (TOMIM). The two-level Random Forest classifier model outperformed the others, reaching 99.31% accuracy with the top 17 features selected by TOMIM and an overall accuracy of 99.32% under 8-fold cross-validation over 25 runs. The smart predictor, built with Shapash, a Python library for Explainable Artificial Intelligence, deploys the two-level Random Forest classifier model. To ensure transparency and reliable results, visualizations from Explainable AI libraries were employed at different prediction stages for all classifiers considered in this study.
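The general idea behind threshold-driven mutual information feature selection can be illustrated with a minimal sketch. This is not the authors' TOMIM implementation: the equal-width binning, the threshold value of 0.1, the synthetic data, and the names `mutual_information`, `discretize`, `select_features`, `feature_informative`, and `feature_noise` are all assumptions made for illustration. Only the cap of 17 features echoes the abstract.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def discretize(values, bins=4):
    """Equal-width binning of a numeric column into integer bin ids."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def select_features(columns, labels, threshold, top_k=None):
    """Keep features whose MI with the label is >= threshold, best first.

    `threshold` and `top_k` are illustrative knobs, not values from the paper.
    """
    scores = {
        name: mutual_information(discretize(col), labels)
        for name, col in columns.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    if top_k is not None:
        kept = kept[:top_k]
    return kept, scores


# Synthetic stand-in data (hypothetical, not the PCOS dataset).
random.seed(0)
n = 400
labels = [random.randint(0, 1) for _ in range(n)]
columns = {
    # Shifted by the label, so it carries information about it.
    "feature_informative": [3.0 * y + random.gauss(0.0, 1.0) for y in labels],
    # Independent of the label.
    "feature_noise": [random.gauss(0.0, 1.0) for _ in labels],
}

kept, scores = select_features(columns, labels, threshold=0.1, top_k=17)
print(kept)
```

In this sketch the threshold discards features that share little information with the label, and `top_k` caps how many survivors are passed on to a downstream classifier. The classifier itself is omitted here.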