Quantifying acceptable artefact ranges for dermatologic classification algorithms

T.C. Petrie; C. Larson; M. Heath; R. Samatham; A. Davis; E.G. Berry; S.A. Leachman

doi:10.1002/ski2.19

Skin Health and Disease (Jun 2021)

Affiliations

All authors: Department of Dermatology, Oregon Health & Science University, Portland, Oregon, USA

Journal volume & issue: Vol. 1, no. 2

Abstract

Background: Many classifiers have been developed that can distinguish different types of skin lesions (e.g., benign nevi, melanoma) with varying degrees of success.1–5 However, even successfully trained classifiers may perform poorly on images that include artefacts. While problems created by hair and ink markings have been published, to our knowledge the quantitative effects of blur, colour and lighting variations on classification accuracy have not yet been reported.

Objectives: We created a system that measures the impact of various artefacts on machine learning accuracy. Our objectives were to (1) quantitatively identify the most egregious artefacts and (2) demonstrate how to assess a classification algorithm's accuracy when input images include artefacts.

Methods: We injected artefacts into dermatologic images using techniques that could each be controlled with a single variable, which allowed us to quantify their impact on accuracy. We trained two convolutional neural networks on two different binary classification tasks (melanoma versus other; suspicious versus follow) and measured the impact of each artefact on dermoscopy images over a range of parameter values. The area under the curve (AUC) and the specificity at a given sensitivity were measured for each artefact at each parameter value.

Results: General blur had the strongest negative effect on the melanoma versus other task. Conversely, shifting the hue towards blue had a more pronounced effect on the suspicious versus follow task.

Conclusions: Classifiers should either mitigate artefacts or detect them. Images should be excluded from diagnosis or recommendation when artefacts are present in amounts outside the machine-perceived quality range. Failure to do so will reduce accuracy and impede approval from regulatory agencies.
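The Methods rely on artefacts whose strength is set by one number. A minimal sketch of what such single-variable transforms could look like for blur, hue and lighting follows. The abstract does not describe the authors' actual transforms, so everything here is an illustrative assumption: the function names (`box_blur`, `shift_hue`, `scale_brightness`), the box-filter blur, the use of a plain hue rotation as a stand-in for "shifting the hue towards blue", and the image representation (rows of RGB tuples in [0, 1]). It is written in plain Python so it runs without imaging libraries.

```python
import colorsys

# Hypothetical image format (an assumption, not the authors' data format):
# a list of rows, each row a list of (r, g, b) tuples with channels in [0, 1].


def shift_hue(image, degrees):
    """Rotate every pixel's hue by `degrees`, the single control variable.

    A rotation of 240 degrees maps pure red to pure blue. Grey pixels have
    no hue and are left unchanged.
    """
    offset = (degrees % 360) / 360.0
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            new_row.append(colorsys.hsv_to_rgb((h + offset) % 1.0, s, v))
        out.append(new_row)
    return out


def scale_brightness(image, factor):
    """Multiply every channel by `factor`, clipping the result to [0, 1]."""
    return [
        [tuple(min(1.0, max(0.0, c * factor)) for c in px) for px in row]
        for row in image
    ]


def box_blur(image, radius):
    """Replace each pixel by the mean of its (2*radius + 1)^2 neighbourhood.

    radius == 0 returns an unchanged copy. At the borders only in-bounds
    pixels are averaged.
    """
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        new_row = []
        for x in range(width):
            acc = [0.0, 0.0, 0.0]
            count = 0
            for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                    for c in range(3):
                        acc[c] += image[yy][xx][c]
                    count += 1
            new_row.append(tuple(a / count for a in acc))
        out.append(new_row)
    return out
```

Sweeping the single parameter produces one degraded copy of the test set per value, for example `for radius in range(4): degraded = box_blur(image, radius)`. Re-scoring each copy with the trained classifier then gives one performance curve per artefact. The nested loops keep the sketch dependency-free but are slow; a real pipeline would use an imaging library.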
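The two metrics named in the Methods can be computed from a classifier's output scores. The following is a hypothetical helper, not the authors' code, and it makes three assumptions. First, "area under the curve" is taken to mean the ROC curve, computed here through its equivalence with the Mann-Whitney statistic. Second, a case is called positive when its score is at or above the threshold. Third, the target sensitivity is left as a free parameter, because the abstract does not state the operating point used.

```python
def _split_by_label(labels, scores):
    """Separate scores into positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    return pos, neg


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive scores higher than a random negative, counting ties as half."""
    pos, neg = _split_by_label(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(labels, scores, target_sensitivity):
    """Best specificity over thresholds whose sensitivity meets the target.

    A case is called positive when its score is >= the threshold; only the
    observed scores are tried as thresholds.
    """
    pos, neg = _split_by_label(labels, scores)
    best = 0.0
    for threshold in sorted(set(scores)):
        sensitivity = sum(p >= threshold for p in pos) / len(pos)
        if sensitivity < target_sensitivity:
            continue
        specificity = sum(n < threshold for n in neg) / len(neg)
        best = max(best, specificity)
    return best


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))                          # 0.75
print(specificity_at_sensitivity(labels, scores, 1.0))  # 0.5
```

You can compute both metrics once on clean images and again at each artefact strength. The difference between the two runs then isolates the drop in performance attributable to that artefact.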

Published in Skin Health and Disease

ISSN: 2690-442X (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Dermatology
Website: https://academic.oup.com/skinhd?login=false