Frontiers in Health Informatics (Mar 2023)
Improvement of the Performance of Machine Learning Algorithms in Predicting Breast Cancer
Abstract
Introduction: Breast cancer is one of the most common cancers among women. Machine learning techniques can contribute substantially to the prediction and early diagnosis of breast cancer. They have become a research hotspot and have proved to be a powerful tool. Using machine learning models applied to a multidimensional dataset, this article aims to identify the most efficient and accurate models for predicting tumor class.
Materials & Methods: Several supervised machine learning algorithms, including Logistic Regression, Decision Tree, Random Forest, and K-Nearest Neighbors (KNN), were used to diagnose and predict cancer tumors. The algorithms were applied to a breast cancer dataset from the UCI repository containing 699 samples. To enhance the algorithms' performance, the features were analyzed using feature importance scores and cross-validation. In this paper, the ML algorithms were improved by restricting them to a limited set of selected features, which produced high accuracy in tumor classification.
Results: In the evaluation, Logistic Regression (accuracy 99.14%, AUC ROC 99.6%) and Extra Trees (accuracy 99.14%, AUC ROC 99.1%) performed better than the other algorithms. Therefore, these techniques can be useful for diagnosing and predicting cancer tumors and for prescribing treatment correctly.
Conclusions: Machine learning techniques can be used in medicine to analyze data collections related to a disease and to predict it. Several machine learning classification algorithms for the diagnosis and prediction of breast cancer were compared using the area under the ROC curve and other evaluation criteria to determine the most appropriate classifier.
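The evaluation protocol summarized above (cross-validation scored by accuracy and AUC ROC) can be illustrated with a minimal plain-Python sketch. This is not the authors' code. It assumes binary labels coded 1 = malignant and 0 = benign, uses a hand-written KNN (one of the listed algorithms) rather than any library implementation, and uses a stratified 5-fold split, which is an assumption because the abstract does not state the cross-validation settings. All function names (`accuracy`, `roc_auc`, `knn_score`, `stratified_k_fold`, `cross_validate_knn`) are hypothetical, and the demo runs on a tiny synthetic dataset, not the UCI data. Feature-importance-based selection is not shown.

```python
import math
import random

# Sketch only: binary labels are assumed to be coded 1 = malignant, 0 = benign.
# All function names below are hypothetical illustrations, not the paper's code.


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC ROC computed as the probability that a random positive sample
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def knn_score(train_X, train_y, x, k=3):
    """Share of the k nearest training points (Euclidean distance) labelled 1."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return sum(train_y[i] for i in nearest) / k


def stratified_k_fold(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is dealt round-robin
    across the k folds so every fold keeps roughly the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def cross_validate_knn(X, y, k_folds=5, k_neighbors=3, seed=0):
    """Mean accuracy and mean AUC ROC of KNN over stratified k-fold CV."""
    accs, aucs = [], []
    for train, test in stratified_k_fold(y, k_folds, seed):
        tX = [X[i] for i in train]
        ty = [y[i] for i in train]
        scores = [knn_score(tX, ty, X[i], k_neighbors) for i in test]
        preds = [1 if s >= 0.5 else 0 for s in scores]  # majority vote
        truth = [y[i] for i in test]
        accs.append(accuracy(truth, preds))
        aucs.append(roc_auc(truth, scores))
    return sum(accs) / len(accs), sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Tiny synthetic, perfectly separable toy data (not the UCI dataset).
    X = [(float(i), 0.0) for i in range(10)] + [(100.0 + i, 0.0) for i in range(10)]
    y = [0] * 10 + [1] * 10
    print(cross_validate_knn(X, y))  # → (1.0, 1.0)
```

The split is stratified so that every test fold contains both classes, which the pairwise AUC calculation requires. The KNN score (share of positive neighbours) serves as the ranking score for AUC, and a majority vote turns it into a class prediction for accuracy.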
Keywords