PLoS ONE (Jan 2021)
BCDnet: Parallel heterogeneous eight-class classification model of breast pathology.
Abstract
Breast cancer is the cancer with the highest incidence of malignant tumors in women, which seriously endangers women's health. With the help of computer vision technology, it has important application value to automatically classify pathological tissue images to assist doctors in rapid and accurate diagnosis. Breast pathological tissue images have complex and diverse characteristics, and the medical data set of breast pathological tissue images is small, which makes it difficult to automatically classify breast pathological tissues. In recent years, most of the researches have focused on the simple binary classification of benign and malignant, which cannot meet the actual needs for classification of pathological tissues. Therefore, based on deep convolutional neural network, model ensembleing, transfer learning, feature fusion technology, this paper designs an eight-class classification breast pathology diagnosis model BCDnet. A user inputs the patient's breast pathological tissue image, and the model can automatically determine what the disease is (Adenosis, Fibroadenoma, Tubular Adenoma, Phyllodes Tumor, Ductal Carcinoma, Lobular Carcinoma, Mucinous Carcinoma or Papillary Carcinoma). The model uses the VGG16 convolution base and Resnet50 convolution base as the parallel convolution base of the model. Two convolutional bases (VGG16 convolutional base and Resnet50 convolutional base) obtain breast tissue image features from different fields of view. After the information output by the fully connected layer of the two convolutional bases is fused, it is classified and output by the SoftMax function. The model experiment uses the publicly available BreaKHis data set. The number of samples of each class in the data set is extremely unevenly distributed. Compared with the binary classification, the number of samples in each class of the eight-class classification is also smaller. Therefore, the image segmentation method is used to expand the data set and the non-repeated random cropping method is used to balance the data set. Based on the balanced data set and the unbalanced data set, the BCDnet model, the pre-trained model Resnet50+ fine-tuning, and the pre-trained model VGG16+ fine-tuning are used for multiple comparison experiments. In the comparison experiment, the BCDnet model performed outstandingly, and the correct recognition rate of the eight-class classification model is higher than 98%. The results show that the model proposed in this paper and the method of improving the data set are reasonable and effective.