The Journal of Engineering (Dec 2019)

Deep learning-based research on the influence of training data size for breast cancer pathology detection

  • Chongyang Cui
  • Shangchun Fan
  • Han Lei
  • Xiaolei Qu
  • Dezhi Zheng

DOI
https://doi.org/10.1049/joe.2018.9093

Abstract

Pathological diagnosis of breast cancer is hampered by a shortage of pathologists, the difficulty of labeling samples, and the heavy workload of manual diagnosis. Deep learning-based computer-assisted pathology analysis systems have therefore been developed to diagnose breast cancer and have achieved impressive results. However, large training sets are difficult to obtain because pathological images are scarce and labeling them is costly, so the size of the training set should be planned before such a system is built. Here, the authors present a study to determine the optimal training set size needed to achieve high classification accuracy when developing a computer-assisted breast cancer pathology analysis system. The authors trained two kinds of convolutional neural networks (CNNs) on six different training set sizes and then tested the resulting systems on a total of 10,000 images. All images were acquired from the Camelyon17 challenge. From these experiments, the authors propose a scheme for determining the size of the training set and the size of the model when developing such systems, which can readily be applied to systems for other types of pathological images.
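
The experimental design described in the abstract (train on several training set sizes, evaluate each on one fixed test set) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the abstract does not give the six sizes, so the values below are hypothetical; a one-dimensional synthetic feature stands in for Camelyon17 image patches; and a simple threshold classifier stands in for the CNNs. The names `make_synthetic`, `fit_threshold`, `accuracy`, and `size_sweep` are invented for this sketch.

```python
import random
from typing import Dict, List, Sequence, Tuple

# (feature, label): a hypothetical stand-in for an image patch and its
# normal (0) / tumour (1) label. Real inputs would be pathology images.
Sample = Tuple[float, int]


def make_synthetic(n: int, rng: random.Random) -> List[Sample]:
    """Generate n synthetic samples, half per class (assumption: not real data)."""
    return [(rng.gauss(2.0 * label, 1.0), label)
            for label in (0, 1) for _ in range(n // 2)]


def fit_threshold(train: Sequence[Sample]) -> float:
    """Toy stand-in for CNN training: midpoint between the two class means."""
    means = []
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        means.append(sum(values) / len(values))
    return (means[0] + means[1]) / 2.0


def accuracy(threshold: float, test: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in test if (x > threshold) == (y == 1))
    return correct / len(test)


def size_sweep(pool: Sequence[Sample], test: Sequence[Sample],
               sizes: Sequence[int], seed: int = 0) -> Dict[int, float]:
    """Train on a class-balanced random subset of each size; test on a fixed set."""
    rng = random.Random(seed)
    by_class = {label: [s for s in pool if s[1] == label] for label in (0, 1)}
    results: Dict[int, float] = {}
    for n in sizes:
        subset = (rng.sample(by_class[0], n // 2)
                  + rng.sample(by_class[1], n // 2))
        results[n] = accuracy(fit_threshold(subset), test)
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    pool = make_synthetic(2000, rng)        # training pool (hypothetical size)
    test_set = make_synthetic(10_000, rng)  # fixed test set of 10,000 samples
    hypothetical_sizes = [10, 50, 100, 500, 1000, 2000]  # six sizes, assumed
    for n, acc in size_sweep(pool, test_set, hypothetical_sizes).items():
        print(f"training size {n:>5}: test accuracy {acc:.3f}")
```

Keeping the test set fixed across all runs means that differences in accuracy reflect the training set size rather than variation in the evaluation data. Repeating the sweep for a second model would give the kind of model-size comparison the abstract mentions.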

Keywords