npj Digital Medicine (Feb 2024)

Optimizing skin disease diagnosis: harnessing online community data with contrastive learning and clustering techniques

  • Yue Shen,
  • Huanyu Li,
  • Can Sun,
  • Hongtao Ji,
  • Daojun Zhang,
  • Kun Hu,
  • Yiqi Tang,
  • Yu Chen,
  • Zikun Wei,
  • Junwei Lv

DOI
https://doi.org/10.1038/s41746-024-01014-x
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 14

Abstract

Read online

Abstract Skin diseases pose significant challenges in China. Internet health forums offer a platform for millions of users to discuss skin diseases and share images for early intervention, leaving large amount of valuable dermatology images. However, data quality and annotation challenges limit the potential of these resources for developing diagnostic models. In this study, we proposed a deep-learning model that utilized unannotated dermatology images from diverse online sources. We adopted a contrastive learning approach to learn general representations from unlabeled images and fine-tuned the model on coarsely annotated images from Internet forums. Our model classified 22 common skin diseases. To improve annotation quality, we used a clustering method with a small set of standardized validation images. We tested the model on images collected by 33 experienced dermatologists from 15 tertiary hospitals and achieved a 45.05% top-1 accuracy, outperforming the published baseline model by 3%. Accuracy increased with additional validation images, reaching 49.64% with 50 images per category. Our model also demonstrated transferability to new tasks, such as detecting monkeypox, with a 61.76% top-1 accuracy using only 50 additional images in the training process. We also tested our model on benchmark datasets to show the generalization ability. Our findings highlight the potential of unannotated images from online forums for future dermatology applications and demonstrate the effectiveness of our model for early diagnosis and potential outbreak mitigation.