IEEE Access (Jan 2024)
Efficient Tumor Detection and Classification Model Based on ViT in an End-to-End Architecture
Abstract
Accurate tumor detection and classification are crucial for cancer diagnosis and treatment. Traditional medical image analysis methods struggle with highly heterogeneous tumor images, which vary widely in image quality and often show unclear or complex tumor features. Although deep learning has brought breakthroughs in image processing, current approaches remain limited in identifying small or irregular tumors. Existing tumor detection models often rely on local feature extraction and neglect global information and subtle differences in the images, which limits their accuracy and robustness in practical applications. To address these issues, this paper proposes a deep learning model that integrates a Feature Pyramid Network (FPN) and a Vision Transformer (ViT) within an end-to-end architecture. First, the model extracts rich features at multiple scales through the FPN, ranging from cellular structures to tissue layouts. Then the ViT processes and analyzes global features, which yields higher accuracy in recognizing ambiguous or complex tumor patterns. The self-attention mechanism further sharpens the model’s focus on critical regions of the image, improving its ability to detect subtle differences. Finally, the end-to-end design improves the overall efficiency and consistency of the model and enables global optimization, further improving detection and classification performance. Experimental results show that the model achieves higher recognition accuracy than existing techniques on medical image datasets such as TCIA, BraTS, LUNA, and Camelyon17, with accuracy and F1 score improving by 4.65% to 6.24%. These algorithmic improvements not only enhance the efficiency and accuracy of tumor detection but also provide new pathways for applying deep learning to medical image analysis.
Keywords