Heliyon (Oct 2024)
Machine-learning diagnostic models for ovarian tumors
Abstract
Purpose: To create a diagnostic framework for the clinical behavior and pathological tissue type of ovarian tumors by using machine-learning (ML) methods based on multiple biomarkers.
Experimental design: Overall, 713 patients with ovarian tumors at Sun Yat Sen Memorial Hospital were randomized into training and test cohorts. Four supervised ML classifiers, namely support vector machine, random forest (RF), k-nearest neighbor, and logistic regression, were used to derive diagnostic and prognostic information from 10 parameters commonly available from pretreatment peripheral blood tests and age. The best prediction model was selected by comparing the accuracy and the area under the ROC curve (AUC) of each model, and was then validated on external data from Guangdong Maternal and Child Health Center.
Results: ML techniques were superior to conventional regression-based analyses in predicting multiple clinical parameters pertaining to ovarian tumors. Ensemble methods combining weak decision trees, such as RF, showed the best diagnostic performance, especially for malignant ovarian cancer. With RF, the highest accuracy and AUC for distinguishing malignant ovarian cancer from benign or borderline ovarian tumors were 99.82 % and 0.86 (micro-average ROC curve), respectively. With logistic regression, the highest accuracy and AUC for the diagnosis of pathological tissue were 78.0 % and 0.95 (micro-average ROC curve), respectively. In external validation, the RF model had an accuracy of 0.789 for classifying tumors from the external center as benign or malignant, and the logistic regression model had an accuracy of 0.719 for predicting the nature of the tumor.
Conclusions: ML systems can diagnose and characterize an ovarian tumor before initial treatment, providing critical diagnostic and prognostic information.
The use of predictive algorithms can facilitate customized treatment options through pretreatment patient stratification.
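
The abstract reports both an accuracy and a micro-average area under the ROC curve for each task. As a minimal illustration of how these two metrics can be computed for a multi-class problem, the Python sketch below pools every (sample, class) pair into a single binary problem before scoring it. The class coding (0 = benign, 1 = borderline, 2 = malignant), the function names, and the probability values are hypothetical. They are not taken from the study and do not reproduce its models or data.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Tied scores share the average of the ranks they span.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def micro_average_auc(
    y_true: Sequence[int], proba: Sequence[Sequence[float]], n_classes: int
) -> float:
    """Pool all one-vs-rest (sample, class) pairs, then score them as one binary task."""
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for true_class, row in zip(y_true, proba):
        for c in range(n_classes):
            flat_labels.append(1 if c == true_class else 0)
            flat_scores.append(row[c])
    return binary_auc(flat_labels, flat_scores)


def accuracy(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    """Fraction of samples whose highest-probability class matches the true class."""
    predicted = [max(range(len(row)), key=row.__getitem__) for row in proba]
    return sum(p == t for p, t in zip(predicted, y_true)) / len(y_true)


# Hypothetical toy data: 0 = benign, 1 = borderline, 2 = malignant (assumed coding).
y_true = [0, 0, 1, 2, 2, 1]
proba = [
    [0.70, 0.20, 0.10],
    [0.50, 0.30, 0.20],
    [0.20, 0.50, 0.30],
    [0.10, 0.20, 0.70],
    [0.30, 0.40, 0.30],  # true class 2, top-ranked class is 1
    [0.40, 0.35, 0.25],  # true class 1, top-ranked class is 0
]

print("accuracy:", round(accuracy(y_true, proba), 3))
print("micro-average AUC:", round(micro_average_auc(y_true, proba, 3), 3))
```

Accuracy considers only the top-ranked class for each patient, whereas the micro-average AUC scores how well all pooled class probabilities are ranked, so the two metrics can differ noticeably on the same predictions.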