IEEE Access (Jan 2024)
Weighted Sparse Convolution and Transformer Feature Aggregation Networks for 3D Dental Segmentation
Abstract
The conventional alginate impression technique, widely used in dentistry to capture tooth morphology, has drawbacks, notably patient discomfort and the risk of allergic reactions in some patient groups. Consequently, 3D intraoral scanners (IOS), which acquire dental shapes without contact, have gained widespread adoption. However, most methods for tooth segmentation in 3D dental images obtained through IOS rely heavily on labor-intensive annotation of high-quality datasets. In this study, we introduce the Weighted Sparse Convolution and Transformer Feature Aggregation Network (WCTN), a model designed to segment teeth in 3D dental datasets. Its voxel-grid partitioning mechanism efficiently clusters point clouds so that features can be extracted with minimal resource usage. WCTN extracts local features from the grouped points with weighted sparse convolution operations and then captures global features sequentially through Transformer modules. An adaptive feature fusion strategy combines the local and global features to yield robust representations, optimized in particular for 3D dental datasets with uniform density. We evaluated three versions of WCTN, distinguished by the number of features. Among them, the largest-scale model performed best, improving overall accuracy over graph-based models and Transformer-based models by 1.89% and 7.18%, respectively. This result highlights the benefit of aggregating diverse features. In conclusion, the proposed model has the potential to expedite and automate tooth segmentation tasks and thereby enhance current clinical practice.
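To make the voxel-grid partitioning idea in the abstract concrete, the following is a minimal Python sketch of how a point cloud can be grouped into cubic voxels. It is an illustration under stated assumptions, not the WCTN implementation: the function names `voxel_partition` and `voxel_centroids`, the `voxel_size` parameter, and the choice of the cloud's minimum corner as the grid origin are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import floor


def voxel_partition(points, voxel_size):
    """Group 3D points by the cubic voxel they fall into.

    Hypothetical helper for illustration only. Returns a dict mapping
    integer voxel coordinates (i, j, k) to the list of indices of the
    points that lie inside that voxel.
    """
    # Assumption: the minimum corner of the cloud is the grid origin.
    origin = [min(p[axis] for p in points) for axis in range(3)]
    voxels = defaultdict(list)
    for idx, p in enumerate(points):
        # Integer voxel coordinate along each axis.
        key = tuple(floor((p[a] - origin[a]) / voxel_size) for a in range(3))
        voxels[key].append(idx)
    return dict(voxels)


def voxel_centroids(points, voxels):
    """Mean position of the points in each occupied voxel."""
    return {
        key: tuple(
            sum(points[i][a] for i in members) / len(members) for a in range(3)
        )
        for key, members in voxels.items()
    }


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (0.1, 0.1, 0.1), (1.0, 1.0, 1.0), (1.2, 1.0, 1.0)]
    groups = voxel_partition(cloud, voxel_size=0.5)
    print(groups)
    print(voxel_centroids(cloud, groups))
```

Only occupied voxels are stored, which is what keeps the grouping cheap for sparse scans. In a pipeline of the kind the abstract describes, each group of point indices would then be the unit over which local features are computed.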
Keywords