IEEE Access (Jan 2024)
Multimodal Deep Convolutional Neural Network Pipeline for AI-Assisted Early Detection of Oral Cancer
Abstract
Oral Squamous Cell Carcinoma (OSCC) poses a significant health challenge, and early detection is crucial for effective treatment and improved survival rates. Previous studies have examined standard photographs, such as those taken with smartphones, for oral lesion classification, but they typically rely on images alone and overlook the potential benefits of incorporating additional modalities. This study addresses this gap by proposing a multimodal deep-learning pipeline that combines diverse data sources, including patient metadata, thereby mimicking the diagnostic approach clinicians use in the early detection of oral cancer. The pipeline leverages state-of-the-art image encoders to classify oral lesions as benign or potentially malignant, and the performance of six pre-trained deep-learning models (MobileNetV3-Large, MixNet-S, ResNet-50, HRNet-W18-C, DenseNet-121, and Inception_v3) is compared. With the MobileNetV3-Large image encoder, the proposed pipeline achieved an overall accuracy of 81%, precision of 79%, recall of 79%, F1-score of 78%, and a Matthews Correlation Coefficient (MCC) of 0.57. These findings indicate that integrating multiple data modalities enables more accurate early detection of potential malignancies than using image data alone. The outcomes could pave the way for improved clinical decision-making and patient outcomes.
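The abstract reports accuracy, precision, recall, F1-score, and MCC side by side. The Python sketch below is illustrative only and shows how all five follow from the four counts of a binary confusion matrix (benign vs. potentially malignant). The names `Confusion` and `binary_metrics` are hypothetical. Treating "potentially malignant" as the positive class is an assumption. The abstract does not state how precision, recall, and F1 were averaged, so the macro averaging used here is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper).

    Assumption: 'positive' means potentially malignant, 'negative' means benign.
    """
    tp: int
    fp: int
    fn: int
    tn: int


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 when the denominator is zero instead of raising.
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> dict:
    """Accuracy, macro-averaged precision/recall/F1, and MCC from binary counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = _safe_div(c.tp + c.tn, total)

    # Per-class scores: the positive class uses (tp, fp, fn); for the negative
    # class the roles swap, so its true positives are tn, its false positives
    # are fn, and its false negatives are fp.
    per_class = []
    for tp, fp, fn in ((c.tp, c.fp, c.fn), (c.tn, c.fn, c.fp)):
        p = _safe_div(tp, tp + fp)
        r = _safe_div(tp, tp + fn)
        f1 = _safe_div(2 * p * r, p + r)
        per_class.append((p, r, f1))

    # Macro average (assumed): unweighted mean over the two classes.
    precision, recall, f1 = (sum(vals) / 2 for vals in zip(*per_class))

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    # defined here as 0.0 when any row or column of the matrix is empty.
    den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = _safe_div(c.tp * c.tn - c.fp * c.fn, den)

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}
```

Unlike accuracy, MCC uses all four cells of the matrix, so it stays informative when classes are imbalanced. For example, the made-up counts tp=5, fp=5, fn=5, tn=85 give an accuracy of 0.90 but an MCC of about 0.44. This is one reason a study may report an MCC (here 0.57) alongside an accuracy figure.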
Keywords