Skin Health and Disease (Jun 2021)

Quantifying acceptable artefact ranges for dermatologic classification algorithms

  • T.C. Petrie,
  • C. Larson,
  • M. Heath,
  • R. Samatham,
  • A. Davis,
  • E.G. Berry,
  • S.A. Leachman

DOI
https://doi.org/10.1002/ski2.19
Journal volume & issue
Vol. 1, no. 2

Abstract

Background: Many classifiers have been developed that can distinguish different types of skin lesions (e.g., benign nevi, melanoma) with varying degrees of success [1–5]. However, even successfully trained classifiers may perform poorly on images that include artefacts. While problems created by hair and ink markings have been published, quantitative measurements of the effect of blur, colour and lighting variations on classification accuracy have not yet been reported, to our knowledge.

Objectives: We created a system that measures the impact of various artefacts on machine learning accuracy. Our objectives were to (1) quantitatively identify the most egregious artefacts and (2) demonstrate how to assess a classification algorithm's accuracy when input images include artefacts.

Methods: We injected artefacts into dermatologic images using techniques that could each be controlled with a single variable, which allowed us to quantitatively evaluate the impact on accuracy. We trained two convolutional neural networks on two different binary classification tasks and measured the impact on dermoscopy images over a range of parameter values. The area under the curve and specificity-at-a-given-sensitivity values were measured for each artefact induced at each parameter value.

Results: General blur had the strongest negative effect on the melanoma versus other task. Conversely, shifting the hue towards blue had a more pronounced effect on the suspicious versus follow task.

Conclusions: Classifiers should either mitigate artefacts or detect them. Images should be excluded from diagnosis/recommendation when artefacts are present in amounts outside the machine-perceived quality range. Failure to do so will reduce accuracy and impede approval from regulatory agencies.
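The approach described in the abstract (one artefact, one controlling variable, one accuracy metric per setting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `shift_hue`, `box_blur` and `specificity_at_sensitivity` are hypothetical, a square mean filter is assumed as a stand-in for "general blur", and images are represented as plain Python lists so that the example needs only the standard library.

```python
import colorsys


def shift_hue(pixels, delta):
    """Rotate the hue of each RGB pixel (floats in 0..1) by `delta`.

    `delta` is the single controlling variable, expressed as a fraction
    of the full colour wheel (an assumed parameterisation).
    """
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        out.append(colorsys.hsv_to_rgb((h + delta) % 1.0, s, v))
    return out


def box_blur(gray, radius):
    """Blur a 2-D grayscale image (list of rows) with a square mean filter.

    `radius` is the single controlling variable; radius 0 leaves the
    image unchanged. Windows are clipped at the image border.
    """
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            vals = [gray[j][i] for j in ys for i in xs]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def specificity_at_sensitivity(labels, scores, target_sensitivity):
    """Best specificity among thresholds whose sensitivity meets the target.

    `labels` are 1 (positive) or 0 (negative); a case is called positive
    when its score is >= the threshold.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    best = 0.0
    for t in sorted(set(scores)):
        sensitivity = sum(s >= t for s in pos) / len(pos)
        specificity = sum(s < t for s in neg) / len(neg)
        if sensitivity >= target_sensitivity:
            best = max(best, specificity)
    return best
```

In an evaluation loop of this kind, one would apply an artefact at each parameter value, score the perturbed images with the trained classifier, and record the metric at each setting, so that the parameter range over which accuracy remains acceptable can be read off directly.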