Journal of Medical Internet Research (Jan 2020)

Analysis of Collective Human Intelligence for Diagnosis of Pigmented Skin Lesions Harnessed by Gamification Via a Web-Based Training Platform: Simulation Reader Study

  • Rinner, Christoph,
  • Kittler, Harald,
  • Rosendahl, Cliff,
  • Tschandl, Philipp

DOI
https://doi.org/10.2196/15597
Journal volume & issue
Vol. 22, no. 1
p. e15597

Abstract


Background: The diagnosis of pigmented skin lesions is error-prone and requires domain-specific expertise, which is not readily available in many parts of the world. Collective intelligence could potentially decrease the error rates of nonexperts.

Objective: The aim of this study was to evaluate the feasibility and impact of collective intelligence for the detection of skin cancer.

Methods: We created a gamified study platform on a stack of established Web technologies and presented 4216 dermatoscopic images of the most common benign and malignant pigmented skin lesions to 1245 human raters with different levels of experience. Raters were recruited via scientific meetings, mailing lists, and social media posts. Education was self-declared, and domain-specific experience was tested by screening tests. In the target test, the readers had to assign 30 dermatoscopic images to 1 of 7 disease categories. The readers could repeat the test with different lesions at their own discretion. Collective human intelligence was achieved by sampling answers from multiple readers; the disease category with the most votes was regarded as the collective vote per image.

Results: We collected 111,019 single ratings, with a mean of 25.2 (SD 18.5) ratings per image. As single raters, nonexperts achieved a lower mean accuracy (58.6%) than experts (68.4%; mean difference=−9.4%; 95% CI −10.74% to −8.1%; P<.001). Collectives of nonexperts achieved higher accuracies than single raters, and the improvement increased with the size of the collective. A collective of 4 nonexperts surpassed single nonexperts in accuracy by 6.3% (95% CI 6.1% to 6.6%; P<.001). The accuracy of a collective of 8 nonexperts was 9.7% higher (95% CI 9.5% to 10.29%; P<.001) than that of single nonexperts, an improvement similar to that of single experts (P=.73). The sensitivity for malignant images increased for nonexperts (66.3% to 77.6%) and experts (64.6% to 79.4%) for answers given faster than the intrarater mean.

Conclusions: A high number of raters can be attracted by elements of gamification and Web-based marketing via mailing lists and social media. Nonexperts increase their accuracy to expert level when acting as a collective, and faster answers correspond to higher accuracy. This information could be useful in a teledermatology setting.
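The aggregation scheme described in the Methods is a plurality vote: each image is labeled by several raters, and the category named most often becomes the collective diagnosis. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' code. The seven category names are assumed (HAM10000-style labels), the single-rater accuracy is the 58.6% nonexpert mean reported in the Results, and rater errors are simulated as independent, so the simulated gains will generally exceed the empirical ones.

```python
import random
from collections import Counter

# Assumed 7 diagnostic categories (HAM10000-style labels, not stated in the abstract)
CATEGORIES = ["nv", "mel", "bcc", "akiec", "bkl", "df", "vasc"]
SINGLE_RATER_ACCURACY = 0.586  # mean single-nonexpert accuracy from the abstract

def simulate_rating(true_label: str) -> str:
    """One simulated nonexpert rating: correct with probability
    SINGLE_RATER_ACCURACY, otherwise a uniformly chosen wrong category."""
    if random.random() < SINGLE_RATER_ACCURACY:
        return true_label
    return random.choice([c for c in CATEGORIES if c != true_label])

def collective_vote(ratings: list[str]) -> str:
    """Plurality vote: the category named most often wins (ties broken arbitrarily)."""
    return Counter(ratings).most_common(1)[0][0]

def collective_accuracy(n_raters: int, n_images: int = 5000) -> float:
    """Accuracy of a collective of n_raters on simulated images."""
    correct = 0
    for _ in range(n_images):
        true_label = random.choice(CATEGORIES)
        ratings = [simulate_rating(true_label) for _ in range(n_raters)]
        correct += collective_vote(ratings) == true_label
    return correct / n_images

if __name__ == "__main__":
    for n in (1, 4, 8):
        print(f"collective of {n} raters: accuracy ≈ {collective_accuracy(n):.3f}")
```

Because real raters share biases (their errors are correlated), the empirical improvements reported in the Results (+6.3% for collectives of 4, +9.7% for collectives of 8) are smaller than what an independent-error simulation like this one would predict.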