Machine Learning with Applications (Dec 2021)
An investigation of XGBoost-based algorithm for breast cancer classification
Abstract
Breast cancer is one of the leading cancers affecting women around the world. A Computer-Aided Diagnosis (CAD) system is a powerful tool that assists pathologists in diagnosing cancer by identifying the presence of cancerous cells. A standard CAD system comprises pre-processing, feature extraction, feature selection and classification. In this paper, we propose an enhanced breast cancer classification technique, called Deep Learning and eXtreme Gradient Boosting (DLXGB), for histopathology breast cancer images, using the BreaKHis dataset. The method first applies data augmentation and stain normalization for pre-processing; a pre-trained DenseNet201 then automatically learns features from each image, and these features are combined with a powerful gradient boosting classifier. The proposed technique is designed to classify breast cancer histology images as benign or malignant (binary classification) and, additionally, into one of eight non-overlapping/overlapping categories: Adenosis (A), Fibroadenoma (F), Phyllodes Tumour (PT), Tubular Adenoma (TA), Ductal Carcinoma (DC), Lobular Carcinoma (LC), Mucinous Carcinoma (MC) and Papillary Carcinoma (PC). With DLXGB, we obtained an accuracy of 97% for both binary and multi-class classification, improving on existing work by researchers using the BreaKHis dataset. The results indicate that this method can produce strong predictions for breast cancer image classification.
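The final stage described above, a gradient boosting classifier trained on per-image feature vectors, can be illustrated with a minimal sketch. This is not the DLXGB implementation. Random 4-dimensional vectors stand in for DenseNet201 features, and plain gradient boosting over decision stumps stands in for XGBoost, which additionally uses second-order gradient information and regularization. All function names and parameter values here are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            err = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=15, learning_rate=0.3):
    """Binary gradient boosting with log-loss; each round fits one stump."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss w.r.t. the raw score is y - p.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, left_mean, right_mean = fit_stump(X, residuals)
        # Store leaf values already scaled by the learning rate.
        stump = (j, thr, learning_rate * left_mean, learning_rate * right_mean)
        stumps.append(stump)
        scores = [s + (stump[2] if row[j] <= thr else stump[3])
                  for s, row in zip(scores, X)]
    return stumps


def predict(stumps, row):
    """Sum the stump outputs and threshold the raw score at zero."""
    score = sum(lv if row[j] <= thr else rv for j, thr, lv, rv in stumps)
    return 1 if score > 0 else 0


def make_features(n_per_class, rng):
    """Synthetic stand-in for deep features; label 0 = benign, 1 = malignant."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(2.0 * label, 1.0),
                      rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_features(60, rng)
X_test, y_test = make_features(30, rng)

model = fit_boosted_stumps(X_train, y_train)
predictions = [predict(model, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic features: {accuracy:.2f}")
```

In a setting closer to the one described, the synthetic vectors would presumably be replaced by features extracted from stain-normalized, augmented images with a pre-trained DenseNet201, and the hand-written booster by the XGBoost library. The accuracy printed here reflects only the synthetic data and says nothing about performance on BreaKHis.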
Keywords