Data Science and Management (Mar 2023)
Application of support vector machine algorithm for early differential diagnosis of prostate cancer
Abstract
Prostate cancer (PCa) symptoms are commonly confused with those of benign prostatic hyperplasia (BPH), particularly in the early stages, because the two conditions present similarly; in some instances this leads to underdiagnosis. Clinical methods have been used to diagnose PCa; however, at the full-blown stage, they usually carry a high risk of complicated side effects. We therefore propose the use of a support vector machine for early differential diagnosis of PCa (SVM-PCa-EDD), in which an SVM classifies persons with and without PCa. We used the PCa dataset from the Kaggle Healthcare repository to develop and validate the SVM model. The PCa dataset consisted of 250 features and one class of features. Attributes considered in this study were age, body mass index (BMI), race, family history, obesity, trouble urinating, urine stream force, blood in semen, bone pain, and erectile dysfunction. The SVM-PCa-EDD was used to preprocess the PCa dataset, specifically to deal with class imbalance, and to perform dimensionality reduction. After class imbalance was eliminated by downsampling, the area under the receiver operating characteristic curve (AUC-ROC) of a logistic regression (LR) model trained on the downsampled dataset was 58.4%, whereas that of an LR model trained on the class-imbalanced dataset was 54.3%. The SVM-PCa-EDD achieved 90% accuracy, 80% sensitivity, and 80% specificity. Validation against random forest and LR models showed that SVM-PCa-EDD performed better in the early differential diagnosis of PCa. The proposed model can assist medical experts in the early diagnosis of PCa, particularly in resource-constrained healthcare settings, and in making further recommendations for PCa testing and treatment.
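The abstract mentions two steps that a short example may make more concrete: downsampling to remove class imbalance, and reporting accuracy, sensitivity, and specificity. The following is a minimal sketch using only the Python standard library. It is not the authors' implementation; the function names `downsample` and `binary_metrics`, the random seed, and the toy labels are illustrative assumptions and do not come from the study's dataset.

```python
import random
from collections import Counter


def downsample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps only as many rows as the
    smallest class (hypothetical helper, not the paper's exact procedure)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [l for _, l in balanced]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate), and specificity
    (true-negative rate) computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy data (made up for illustration): 3 positive and 7 negative cases.
toy_rows = [[i] for i in range(10)]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
balanced_rows, balanced_labels = downsample(toy_rows, toy_labels)
class_counts = Counter(balanced_labels)

# Toy predictions (made up): one missed positive and one false alarm.
toy_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

In practice, the balanced rows would be passed to a classifier such as an SVM, and the metrics would be computed on held-out predictions rather than on hand-written labels.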