Heliyon (Aug 2024)
Predicting TCM patterns in PCOS patients: An exploration of feature selection methods and multi-label machine learning models
Abstract
Background: Traditional Chinese Medicine (TCM) offers individualized treatment for Polycystic Ovary Syndrome (PCOS) through pattern differentiation, but the subjectivity of TCM diagnoses can lead to inconsistent outcomes. Machine learning (ML) can provide a more objective basis to support these diagnoses. This study evaluates several feature selection techniques and multi-label ML algorithms in order to develop an effective predictive model for classifying TCM patterns in PCOS patients, with the goal of improving diagnostic standardization and treatment personalization.

Methods: The dataset comprised 432 patients with PCOS, each exhibiting one or more of five TCM patterns. Feature selection began with Variance Thresholding (VT), followed by a comparison of five advanced techniques: Statistical Analysis Test, Recursive Feature Elimination with Cross-Validation (RFECV), Least Absolute Shrinkage and Selection Operator Regression, BorutaShap, and ReliefF. To identify the most effective model for predicting PCOS TCM patterns, four ML algorithms (Support Vector Machine, Logistic Regression, Extreme Gradient Boosting (XGBoost), and Artificial Neural Networks) were evaluated on the identified feature set.

Results: VT reduced the feature count from 224 to 174. RFECV was the most effective feature selection method, identifying 67 key features. With the RFECV-selected features, XGBoost was the top-performing model, achieving the highest testing accuracy (0.7870) and F1 score (0.9519) and the lowest Hamming loss (0.0481).

Conclusions: The RFECV-XGBoost model proved effective for classifying TCM patterns in PCOS. These findings underscore the importance of precise feature selection and the potential of ML to advance TCM pattern diagnostics, a step toward more precise and personalized healthcare.
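To make the variance-thresholding step and the multi-label metrics concrete, the following is a minimal sketch in plain Python, not the study's implementation. The function names, the toy feature matrix, the toy label vectors, and the threshold of 0.0 are all illustrative assumptions; the abstract does not state the threshold used. It also does not state how "testing accuracy" was computed, so the exact-match (subset) definition shown here is one common reading, not a confirmed one.

```python
from statistics import pvariance


def variance_threshold(rows, threshold=0.0):
    """Return indices of feature columns whose variance exceeds `threshold`.

    `threshold=0.0` is an assumed value: it drops only constant columns.
    """
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


def hamming_loss(y_true, y_pred):
    """Fraction of individual (sample, label) cells that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose entire label vector is predicted exactly."""
    exact = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


# Toy feature matrix (hypothetical): column 0 is constant and carries no signal.
X = [
    [1, 0, 5.0],
    [1, 1, 3.0],
    [1, 0, 4.0],
    [1, 1, 6.0],
]
kept = variance_threshold(X)

# Toy multi-label targets (hypothetical): 4 patients, 5 binary pattern labels.
y_true = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 1, 0, 1]]
y_pred = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 1], [1, 1, 0, 0, 0], [0, 0, 1, 0, 0]]

print("kept feature indices:", kept)
print("Hamming loss:", hamming_loss(y_true, y_pred))
print("subset accuracy:", subset_accuracy(y_true, y_pred))
```

In this toy case, two of the twenty label cells are wrong while only two of the four patients have every label correct. This shows how a low Hamming loss can coexist with a noticeably lower exact-match accuracy in a multi-label setting.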