Радіоелектронні і комп'ютерні системи (Dec 2023)

Breast tumor prediction and feature importance score finding using machine learning algorithms

  • Sk. Shalauddin Kabir,
  • Md. Sabbir Ahmmed,
  • Md. Moradul Siddique,
  • Romana Rahman Ema,
  • Motiur Rahman,
  • Syed Md. Galib

DOI
https://doi.org/10.32620/reks.2023.4.03
Journal volume & issue
Vol. 0, no. 4
pp. 32 – 42

Abstract

The subject matter of this study is breast tumor prediction and feature importance scoring using machine learning algorithms. The goal of this study was to develop an accurate predictive model for identifying breast tumors and to determine the importance of individual features in the prediction process. The tasks undertaken included collecting and preprocessing the Wisconsin Breast Cancer original dataset (WBCD); dividing the dataset into training and testing sets; training models with machine learning algorithms, namely Random Forest, Decision Tree (DT), Logistic Regression, Multi-Layer Perceptron, Gradient Boosting Classifier (GBC), and K-Nearest Neighbors; evaluating the models using performance metrics; and calculating feature importance scores. The methods used involve data collection, preprocessing, model training, and evaluation. The outcomes showed that the Random Forest model is the most reliable predictor, with 98.56% accuracy. The dataset initially contained 699 instances, which were reduced to 461 by data optimization methods. In addition, we ranked the top features of the dataset by feature importance score to determine how they affect the classification models. Furthermore, the models were subjected to 10-fold cross-validation for performance analysis and comparison. The conclusions drawn from this study highlight the effectiveness of machine learning algorithms in breast tumor prediction, which achieved high accuracy and robust performance metrics. In addition, the analysis of feature importance scores provides valuable insight into the key indicators of breast cancer development. These findings contribute to the field of breast cancer diagnosis and prediction by supporting early detection, personalized treatment strategies, and improved patient outcomes.
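
To make the 10-fold cross-validation step concrete, the following is a minimal sketch of how 461 instances could be partitioned into ten folds. It is an illustration only, not the authors' code: the function names `k_fold_indices` and `accuracy` are hypothetical, the random shuffle and seed are assumptions, and the abstract does not state whether the folds were stratified or whether cross-validation was run on exactly the 461 retained instances.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train, test) index pairs that partition range(n_samples)."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are random but reproducible
    # (the seed is an assumption made for this sketch).
    random.Random(seed).shuffle(indices)

    # When n_samples is not divisible by k, the first (n_samples % k) folds
    # receive one extra sample each.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 461 is the instance count reported in the abstract after data optimization.
folds = k_fold_indices(461, k=10)
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)  # one fold of 47 followed by nine folds of 46
```

Because 461 = 10 × 46 + 1, each model is trained on 414 or 415 instances and tested on the remaining 47 or 46 in every round. The per-fold accuracies would then typically be averaged to compare classifiers.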

Keywords