Application of support vector machine algorithm for early differential diagnosis of prostate cancer

Boluwaji A. Akinnuwesi; Kehinde A. Olayanju; Benjamin S. Aribisala; Stephen G. Fashoto; Elliot Mbunge; Moses Okpeku; Patrick Owate

Data Science and Management (Mar 2023)

Affiliations

Boluwaji A. Akinnuwesi: Department of Computer Science, Faculty of Science and Engineering, University of Eswatini, Kwaluseni, M201, Swaziland; Corresponding author.
Kehinde A. Olayanju: Department of Computer Science Education, Federal College of Education (Technology), Akoka, Lagos State, 100213, Nigeria
Benjamin S. Aribisala: Department of Computer Science, Faculty of Science, Lagos State University, Ojo, Lagos State, 102101, Nigeria
Stephen G. Fashoto: Department of Computer Science, Faculty of Science and Engineering, University of Eswatini, Kwaluseni, M201, Swaziland
Elliot Mbunge: Department of Computer Science, Faculty of Science and Engineering, University of Eswatini, Kwaluseni, M201, Swaziland
Moses Okpeku: Department of Genetics, University of KwaZulu-Natal, Durban, 4041, South Africa
Patrick Owate: Department of Computer Science, Faculty of Science, Lagos State University, Ojo, Lagos State, 102101, Nigeria

Journal volume & issue: Vol. 6, no. 1
pp. 1–12

Abstract

Prostate cancer (PCa) symptoms are commonly confused with those of benign prostatic hyperplasia (BPH), particularly in the early stages, because the two conditions present similar symptoms; in some instances, this leads to underdiagnosis. Clinical methods have been used to diagnose PCa; however, at the advanced stage, these methods usually carry a high risk of complicated side effects. Therefore, we propose the use of a support vector machine (SVM) for early differential diagnosis of PCa (SVM-PCa-EDD). The SVM was used to classify persons with and without PCa. We used the PCa dataset from the Kaggle Healthcare repository to develop and validate the SVM model. The PCa dataset consisted of 250 features and one class attribute. The attributes considered in this study were age, body mass index (BMI), race, family history, obesity, trouble urinating, urine stream force, blood in semen, bone pain, and erectile dysfunction. The SVM-PCa-EDD approach included preprocessing of the PCa dataset, specifically handling class imbalance, and dimensionality reduction. After class imbalance was eliminated, the area under the receiver operating characteristic (ROC) curve (AUC) of a logistic regression (LR) model trained on the downsampled dataset was 58.4%, compared with 54.3% for an LR model trained on the imbalanced dataset. The SVM-PCa-EDD achieved 90% accuracy, 80% sensitivity, and 80% specificity. Validation of SVM-PCa-EDD against random forest (RF) and LR models showed that SVM-PCa-EDD performed better in early differential diagnosis of PCa. The proposed model can assist medical experts in the early diagnosis of PCa, particularly in resource-constrained healthcare settings, and in making further recommendations for PCa testing and treatment.
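The abstract names several concrete steps. These are balancing the classes by downsampling, training an SVM to separate persons with and without PCa, and scoring the classifier by accuracy, sensitivity, specificity, and ROC AUC. The following is a minimal, dependency-free Python sketch of those steps under stated assumptions; it is not the authors' implementation.

The assumptions are as follows:

- Labels are 1 (PCa) and 0 (no PCa).
- Downsampling means random removal of majority-class rows.
- The SVM is linear and trained with the Pegasos stochastic sub-gradient method. The abstract does not state the kernel, solver, or hyperparameters.
- The dimensionality-reduction step and the LR and RF baselines are omitted.

All function names and the toy data are hypothetical. The toy data are not the Kaggle dataset.

```python
# Hypothetical sketch: random downsampling + linear SVM (Pegasos, an assumed
# solver) + sensitivity/specificity/accuracy + ROC AUC. Labels: 1 = PCa, 0 = no PCa.
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def downsample(X: Sequence[Vector], y: Sequence[int], seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Randomly drop majority-class rows so both classes have the same count."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [list(X[i]) for i in keep], [y[i] for i in keep]


def _augment(x: Sequence[float]) -> Vector:
    # Append a constant 1 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam: float = 0.01, epochs: int = 100, seed: int = 0) -> Vector:
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels 0/1 are mapped to -1/+1 internally.
    """
    rng = random.Random(seed)
    data = [(_augment(x), 1.0 if label == 1 else -1.0) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, target in data:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = target * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the regulariser
            if margin < 1.0:  # hinge loss active: step towards this example
                w = [wj + eta * target * xj for wj, xj in zip(w, x)]
    return w


def decision_scores(w: Vector, X) -> List[float]:
    """Signed distance-like score w . [x, 1] for each row."""
    return [sum(wj * xj for wj, xj in zip(w, _augment(x))) for x in X]


def predict(w: Vector, X) -> List[int]:
    return [1 if s >= 0.0 else 0 for s in decision_scores(w, X)]


def sens_spec_acc(y_true, y_pred) -> Tuple[float, float, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = (TP+TN)/N."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec, (tp + tn) / len(pairs)


def roc_auc(y_true, scores) -> float:
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, imbalanced data (3 positives, 6 negatives); not the Kaggle PCa dataset.
    X = [[2, 2], [3, 3], [2.5, 3], [-2, -2], [-3, -3], [-2.5, -3], [-3, -2], [-4, -4], [-2, -4]]
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
    Xb, yb = downsample(X, y)
    w = train_linear_svm(Xb, yb)
    print(sens_spec_acc(y, predict(w, X)))  # (1.0, 1.0, 1.0) on this separable toy set
    print(roc_auc(y, decision_scores(w, X)))
```

Computing AUC as the fraction of correctly ranked positive/negative pairs gives the same value as the area under the empirical ROC curve (the Mann-Whitney formulation). The pairwise loop is quadratic in the number of rows, which is harmless at this scale; a rank-based formula scales better for larger datasets.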

Published in Data Science and Management

ISSN: 2666-7649 (Online)
Publisher: KeAi Communications Co. Ltd.
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.keaipublishing.com/en/journals/data-science-and-management/
