Scientific Reports (Sep 2024)
Leveraging AI and patient metadata to develop a novel risk score for skin cancer detection
Abstract
Melanoma of the skin is the 17th most common cancer worldwide. Early detection of suspicious skin lesions (melanoma) can increase 5-year survival rates by 20%. The 7-point checklist (7PCL) has been used extensively to suggest urgent referrals for patients with possible melanoma. However, the 7PCL method considers only seven meta-features when calculating a risk score and is relevant only for patients with suspected melanoma. Few studies have made extensive use of patient metadata to detect all skin cancer subtypes. This study investigates artificial intelligence (AI) models that use patient metadata comprising 23 attributes to detect suspicious skin lesions. We identify a new set of most important risk factors, the “C4C risk factors”, which apply not only to melanoma but to all types of skin cancer. We compare the performance of the C4C risk factors for suspicious skin lesion detection with that of the 7PCL and of the Williams risk factors, which predict the lifetime risk of melanoma. Our proposed AI framework ensembles five machine learning models and identifies seven new skin cancer risk factors: lesion pink, lesion size, lesion colour, lesion inflamed, lesion shape, lesion age, and natural hair colour. Evaluated on the metadata of 53,601 skin lesions collected from different skin cancer diagnostic clinics across the UK, these factors achieved a sensitivity of 80.46 ± 2.50% and a specificity of 62.09 ± 1.90% in detecting suspicious skin lesions, significantly outperforming the 7PCL-based method (sensitivity 68.09 ± 2.10%, specificity 61.07 ± 0.90%) and the Williams risk factors (sensitivity 66.32 ± 1.90%, specificity 61.71 ± 0.6%).
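For readers unfamiliar with the 7PCL, the sketch below illustrates the weighted version commonly used in UK referral practice: three major features (change in size, irregular shape, irregular colour) score 2 points each, four minor features (largest diameter of 7 mm or more, inflammation, oozing, change in sensation) score 1 point each, and a total of 3 or more prompts urgent referral. This scoring scheme is stated here as an assumption for illustration, not as the exact 7PCL implementation used in this study. The sketch also shows how sensitivity (TP / (TP + FN)) and specificity (TN / (TN + FP)), the metrics reported above, follow from binary predictions. The `Lesion` class, its field names, and the helper functions are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Sequence, Tuple


# Hypothetical record of the seven 7PCL observations for one lesion.
@dataclass
class Lesion:
    # Major criteria: 2 points each in the weighted checklist (assumed scheme).
    change_in_size: bool = False
    irregular_shape: bool = False
    irregular_colour: bool = False
    # Minor criteria: 1 point each.
    diameter_7mm_or_more: bool = False
    inflammation: bool = False
    oozing: bool = False
    change_in_sensation: bool = False


MAJOR = {"change_in_size", "irregular_shape", "irregular_colour"}


def seven_point_score(lesion: Lesion) -> int:
    """Weighted 7PCL score: 2 points per major feature, 1 per minor feature."""
    score = 0
    for f in fields(lesion):
        if getattr(lesion, f.name):
            score += 2 if f.name in MAJOR else 1
    return score


def refer(lesion: Lesion, threshold: int = 3) -> bool:
    """Flag the lesion for urgent referral when the score reaches the threshold."""
    return seven_point_score(lesion) >= threshold


def sensitivity_specificity(y_true: Sequence[bool],
                            y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    # Assumes at least one positive and one negative case in y_true.
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    lesions = [
        Lesion(change_in_size=True, inflammation=True),  # 2 + 1 = 3
        Lesion(oozing=True),                             # 1
        Lesion(irregular_colour=True),                   # 2
        Lesion(),                                        # 0
    ]
    y_true = [True, False, True, False]  # hypothetical ground truth
    y_pred = [refer(les) for les in lesions]
    print([seven_point_score(les) for les in lesions])  # [3, 1, 2, 0]
    print(sensitivity_specificity(y_true, y_pred))      # (0.5, 1.0)
```

Only the first example lesion reaches the threshold of 3, so one of the two hypothetical positives is missed (sensitivity 0.5) while both negatives are correctly left unflagged (specificity 1.0).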
Furthermore, by weighting the seven new risk factors, we derived a new risk score, the “C4C risk score”, which alone achieved a sensitivity of 76.09 ± 1.20% and a specificity of 61.71 ± 0.50%, significantly outperforming the 7PCL-based risk score (sensitivity 73.91 ± 1.10%, specificity 49.49 ± 0.50%) and the Williams risk score (sensitivity 60.68 ± 1.30%, specificity 60.87 ± 0.80%). Finally, fusing the C4C risk factors with the 7PCL and Williams risk factors achieved the best performance, with a sensitivity of 85.24 ± 2.20% and a specificity of 61.12 ± 0.90%. We believe that fusing these newly found risk factors and the new risk score with image data will further improve AI model performance for suspicious skin lesion detection. Hence, the new set of skin cancer risk factors could be used to revise current skin cancer referral guidelines for all skin cancer subtypes, including melanoma.
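The abstract does not give the C4C weights, so the sketch below only illustrates the general shape of a weighted risk score over the seven C4C factors: each factor is encoded as a number and multiplied by a weight supplied by the caller. The factor encodings, the equal weights in the usage example, and the helper names are hypothetical placeholders, not the published C4C risk score. The sketch also shows one way to summarise per-fold results in the "mean ± SD" form used above; treating the reported ± values as sample standard deviations across evaluation folds is an assumption.

```python
import statistics
from typing import Mapping, Sequence

# The seven C4C risk factors named in the abstract (identifier spelling is hypothetical).
C4C_FACTORS = (
    "lesion_pink", "lesion_size", "lesion_colour", "lesion_inflamed",
    "lesion_shape", "lesion_age", "natural_hair_colour",
)


def weighted_risk_score(features: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Linear score: sum of weight * encoded value over the seven factors.

    Factors missing from `features` count as 0 (absent). Weights come from the
    caller; no published C4C weights are assumed here.
    """
    missing = [name for name in C4C_FACTORS if name not in weights]
    if missing:
        raise ValueError(f"no weight given for: {missing}")
    return sum(weights[name] * features.get(name, 0.0) for name in C4C_FACTORS)


def mean_sd_percent(per_fold: Sequence[float]) -> str:
    """Format per-fold proportions as 'mean ± sample SD' in percent."""
    pct = [100.0 * v for v in per_fold]
    return f"{statistics.mean(pct):.2f} ± {statistics.stdev(pct):.2f}%"


if __name__ == "__main__":
    # Placeholder: equal weights, NOT the study's weights.
    weights = {name: 1.0 for name in C4C_FACTORS}
    lesion = {"lesion_pink": 1, "lesion_size": 0.5, "lesion_inflamed": 1}
    print(weighted_risk_score(lesion, weights))  # 2.5
    print(mean_sd_percent([0.80, 0.82, 0.78]))   # 80.00 ± 2.00%
```

In practice the weights would be learned or derived from data and the referral threshold chosen to trade sensitivity against specificity; a linear form is used here only because it is the simplest reading of "weighting the seven new risk factors".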