IEEE Access (Jan 2021)

Detection of Breast Cancer Through Clinical Data Using Supervised and Unsupervised Feature Selection Techniques

  • Amin Ul Haq,
  • Jian Ping Li,
  • Abdus Saboor,
  • Jalaluddin Khan,
  • Samad Wali,
  • Sultan Ahmad,
  • Amjad Ali,
  • Ghufran Ahmad Khan,
  • Wang Zhou

DOI
https://doi.org/10.1109/ACCESS.2021.3055806
Journal volume & issue
Vol. 9
pp. 22090 – 22105

Abstract

Breast cancer is one of the most critical diseases and affects many people around the world. Although researchers have proposed various diagnostic methods, existing methods still need improvement to detect the disease accurately and efficiently. In this study, we propose a new breast cancer identification method that uses machine learning algorithms and clinical data. The proposed method uses a supervised technique (the Relief algorithm) and unsupervised techniques (Autoencoder and PCA) to select relevant features from the data set; the selected features are then used to train and test a support vector machine classifier for accurate and timely detection of breast cancer. K-fold cross-validation is used for model validation and for selecting the best hyperparameters. Model performance evaluation metrics are used to assess the model, and breast cancer (BC) data sets are used to test the proposed method. The experimental results show that the features selected by the Relief algorithm are more relevant for accurate detection of breast cancer than those selected by the Autoencoder and PCA algorithms. The proposed method achieved its best result, 99.91% accuracy, on the features selected by the Relief algorithm. We employed McNemar's statistical test to compare the performance of our different models. Further, we compared the proposed method with baseline methods in the literature, and it outperforms them. Given its high performance, we highly recommend the proposed method (Relief-Support vector machine) for the diagnosis of breast cancer. In addition, the proposed method can be easily incorporated into healthcare systems for reliable diagnosis of breast cancer.
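
The abstract mentions McNemar's statistical test for comparing models. As a rough illustration of how such a comparison can work, the sketch below computes the chi-square form of McNemar's test with continuity correction from two classifiers' predictions on the same test samples. The function name `mcnemar_test` and the toy labels are hypothetical, and the abstract does not state which variant of the test the authors used, so this is an assumption rather than their implementation.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Return (statistic, p_value) for McNemar's test on paired predictions.

    Assumption: chi-square form with continuity correction, 1 degree of freedom.
    """
    # b: samples model A gets right and model B gets wrong.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    # c: samples model A gets wrong and model B gets right.
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)

    # With no discordant pairs the models are indistinguishable on this data.
    if b + c == 0:
        return 0.0, 1.0

    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    model_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    model_b = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
    print(mcnemar_test(y_true, model_a, model_b))
```

Only the samples on which the two models disagree enter the statistic, which is why the test suits comparing classifiers evaluated on the same test set. A small p-value suggests the difference in error rates is unlikely to be due to chance.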

Keywords