IEEE Access (Jan 2023)

DeepMetaForge: A Deep Vision-Transformer Metadata-Fusion Network for Automatic Skin Lesion Classification

  • Sirawich Vachmanus,
  • Thanapon Noraset,
  • Waritsara Piyanonpong,
  • Teerapong Rattananukrom,
  • Suppawong Tuarob

DOI
https://doi.org/10.1109/ACCESS.2023.3345225
Journal volume & issue
Vol. 11
pp. 145467–145484

Abstract

Skin cancer is a dangerous form of cancer that develops slowly in skin cells. Delays in diagnosing and treating these malignant skin conditions may have serious repercussions. Conversely, early skin cancer detection has been shown to improve treatment outcomes. This paper proposes DeepMetaForge, a deep-learning framework for skin cancer detection from metadata-accompanied images. The proposed framework utilizes BEiT, a vision transformer pre-trained via masked image modeling, as the image-encoding backbone. We further propose merging the encoded metadata with the derived visual features while simultaneously compressing the aggregate information, simulating how photos with metadata are interpreted. Experimental results on four public datasets of dermoscopic and smartphone skin lesion images reveal that the best configuration of our proposed framework yields 87.1% macro-average F1 on average. An empirical scalability analysis further shows that the proposed framework can be deployed across a variety of machine-learning paradigms, including on low-resource devices and as services. The findings not only shed light on the possibility of implementing nationwide telemedicine solutions for skin cancer that could benefit those in need of quality healthcare, but also open doors to many intelligent applications in medicine where images and metadata are collected together, such as disease detection from CT-scan images and patients' expression recognition from facial images.
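The fusion scheme described in the abstract — encoding the metadata, merging it with the image features, and compressing the joint representation before classification — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all dimensions, weight shapes, and the single-layer encoders are hypothetical stand-ins, and the random weights substitute for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical dimensions -- the paper does not fix these values here.
IMG_DIM, META_DIM, META_ENC_DIM, FUSED_DIM, NUM_CLASSES = 768, 16, 64, 256, 2

# Random stand-ins for learned weights (for illustration only).
W_meta = rng.normal(size=(META_DIM, META_ENC_DIM)) * 0.1       # metadata encoder
W_fuse = rng.normal(size=(IMG_DIM + META_ENC_DIM, FUSED_DIM)) * 0.05  # joint compression
W_cls = rng.normal(size=(FUSED_DIM, NUM_CLASSES)) * 0.1        # classifier head

def classify(img_embedding, metadata):
    """Fuse a BEiT-style image embedding with encoded patient metadata,
    compress the concatenated representation, and score lesion classes."""
    z_meta = relu(metadata @ W_meta)                 # encode the metadata
    z = np.concatenate([img_embedding, z_meta])      # merge the two modalities
    z = relu(z @ W_fuse)                             # compress the aggregate
    logits = z @ W_cls
    return int(np.argmax(logits))

img = rng.normal(size=IMG_DIM)    # placeholder for a BEiT image embedding
meta = rng.normal(size=META_DIM)  # placeholder for encoded patient metadata
prediction = classify(img, meta)  # class index, e.g. benign vs. malignant
```

In practice the image embedding would come from the pre-trained BEiT backbone and all weights would be trained end-to-end; the sketch only shows the shape of the concatenate-then-compress fusion.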

Keywords