Frontiers in Medicine (May 2020)
Effects of Label Noise on Deep Learning-Based Skin Cancer Classification
- Achim Hekler,
- Jakob N. Kather,
- Eva Krieghoff-Henning,
- Jochen S. Utikal,
- Friedegund Meier,
- Frank F. Gellrich,
- Julius Upmeier zu Belzen,
- Lars French,
- Justin G. Schlager,
- Kamran Ghoreschi,
- Tabea Wilhelm,
- Heinz Kutzner,
- Carola Berking,
- Markus V. Heppt,
- Sebastian Haferkamp,
- Wiebke Sondermann,
- Dirk Schadendorf,
- Bastian Schilling,
- Benjamin Izar,
- Roman Maron,
- Max Schmitt,
- Stefan Fröhling,
- Daniel B. Lipka,
- Titus J. Brinker
Affiliations
- Achim Hekler
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- Jakob N. Kather
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- Jakob N. Kather
- Department of Medicine III, RWTH University Hospital Aachen, Aachen, Germany
- Eva Krieghoff-Henning
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- Jochen S. Utikal
- Department of Dermatology, Heidelberg University, Mannheim, Germany
- Jochen S. Utikal
- Skin Cancer Unit, German Cancer Research Center, Heidelberg, Germany
- Friedegund Meier
- Skin Cancer Center at the University Cancer Centre and National Center for Tumor Diseases Dresden, Dresden, Germany
- Friedegund Meier
- Department of Dermatology, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Frank F. Gellrich
- Skin Cancer Center at the University Cancer Centre and National Center for Tumor Diseases Dresden, Dresden, Germany
- Frank F. Gellrich
- Department of Dermatology, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Julius Upmeier zu Belzen
- Berlin Institute of Health (BIH), Charité, Berlin, Germany
- Lars French
- Department of Dermatology and Allergology, Ludwig Maximilian University of Munich, Munich, Germany
- Justin G. Schlager
- Department of Dermatology and Allergology, Ludwig Maximilian University of Munich, Munich, Germany
- Kamran Ghoreschi
- Department of Dermatology, Venereology and Allergology, Charité–Universitätsmedizin Berlin, Berlin, Germany
- Tabea Wilhelm
- Department of Dermatology, Venereology and Allergology, Charité–Universitätsmedizin Berlin, Berlin, Germany
- Heinz Kutzner
- Dermatopathology Laboratory, Friedrichshafen, Germany
- Carola Berking
- Department of Dermatology, University Hospital Erlangen, Erlangen, Germany
- Markus V. Heppt
- Department of Dermatology, University Hospital Erlangen, Erlangen, Germany
- Sebastian Haferkamp
- Department of Dermatology, University Hospital Regensburg, Regensburg, Germany
- Wiebke Sondermann
- Department of Dermatology, University Hospital Essen, Essen, Germany
- Dirk Schadendorf
- Department of Dermatology, University Hospital Essen, Essen, Germany
- Bastian Schilling
- Department of Dermatology, University Hospital Würzburg, Würzburg, Germany
- Benjamin Izar
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, United States
- Roman Maron
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- Max Schmitt
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- Stefan Fröhling
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- Stefan Fröhling
- Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel B. Lipka
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- Daniel B. Lipka
- Translational Cancer Epigenomics, Division of Translational Medical Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Daniel B. Lipka
- Faculty of Medicine, Medical Center, Otto-von-Guericke-University, Magdeburg, Germany
- Titus J. Brinker
- National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
- DOI
- https://doi.org/10.3389/fmed.2020.00177
- Journal volume & issue
- Vol. 7
Abstract
Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross-validation, comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths for training and test labels, high accuracies of 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12–65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66–65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise, and future work should use biopsy-verified training images to mitigate this problem.
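The comparison described in the abstract, scoring the same classifier outputs against two different ground truths, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy label vectors, the function names `accuracy` and `mean_with_ci`, and the normal-approximation confidence interval are hypothetical and are not taken from the study, which does not state its interval method in the abstract.

```python
import math
from statistics import mean, stdev


def accuracy(predictions, labels):
    """Fraction of predictions that match the given ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_with_ci(values, z=1.96):
    """Mean and a normal-approximation 95% interval over repeated runs.

    Assumption: the interval method is chosen for illustration only.
    """
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical toy data: 1 = melanoma, 0 = nevus.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
dermatologist_labels = [1, 0, 1, 1, 0, 0, 0, 0]
biopsy_labels = [1, 0, 0, 1, 1, 0, 0, 0]

# The same predictions yield different accuracies under each ground truth.
acc_dermatologist = accuracy(predictions, dermatologist_labels)
acc_biopsy = accuracy(predictions, biopsy_labels)
print(f"vs. dermatologist labels: {acc_dermatologist:.3f}")
print(f"vs. biopsy labels:        {acc_biopsy:.3f}")

# Aggregating accuracies from several hypothetical runs or folds.
run_accuracies = [0.74, 0.75, 0.76]
m, lo, hi = mean_with_ci(run_accuracies)
print(f"mean accuracy {m:.4f} (95% CI: {lo:.4f} to {hi:.4f})")
```

In this toy case the predictions agree with the dermatologist labels more often than with the biopsy labels, which is the kind of gap the study quantifies when training and test ground truths differ.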
Keywords