Optimal Training Positive Sample Size Determination for Deep Learning with a Validation on CBCT Image Caries Recognition
Yanlin Wang,
Gang Li,
Xinyue Zhang,
Yue Wang,
Zhenhao Zhang,
Jupeng Li,
Junqi Ma,
Linghang Wang
Affiliations
Yanlin Wang
National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Device & Beijing Key Laboratory of Digital Stomatology & NHC Key Laboratory of Digital Stomatology, Department of Oral and Maxillofacial Radiology, Peking University School and Hospital of Stomatology, Beijing 100080, China
Gang Li
National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Device & Beijing Key Laboratory of Digital Stomatology & NHC Key Laboratory of Digital Stomatology, Department of Oral and Maxillofacial Radiology, Peking University School and Hospital of Stomatology, Beijing 100080, China
Xinyue Zhang
National Center for Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Device & Beijing Key Laboratory of Digital Stomatology & NHC Key Laboratory of Digital Stomatology, Department of Oral and Maxillofacial Radiology, Peking University School and Hospital of Stomatology, Beijing 100080, China
Yue Wang
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
Zhenhao Zhang
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
Jupeng Li
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
Junqi Ma
YOFO Medical Technology Co., Ltd., Hefei 230093, China
Linghang Wang
YOFO Medical Technology Co., Ltd., Hefei 230093, China
Objectives: When training deep learning models, sample size must be balanced against available resources and time constraints. This study proposes single-arm objective performance criteria (OPC) as a method for determining the optimal positive sample size for training deep learning models in caries recognition. Methods: An expected sensitivity (PT) of 0.6 and a clinically acceptable sensitivity (P0) of 0.5 were applied to the single-arm OPC calculation formula, yielding an optimal training set of 263 carious teeth. U-Net, YOLOv5n, and CariesDetectNet were trained and validated on clinically self-collected cone-beam computed tomography (CBCT) images containing varying numbers of carious teeth. An additional dataset was used to compare the caries detection accuracy of the models with that of two dental radiologists. Results: The models reached their optimal performance when the number of carious teeth in the training set was approximately 250. U-Net performed best, achieving an accuracy, sensitivity, specificity, F1-Score, and Dice similarity coefficient of 0.9929, 0.9307, 0.9989, 0.9590, and 0.9435, respectively. All three models recognized caries more accurately than the dental radiologists. Conclusions: This study demonstrated that the positive sample size of CBCT images containing caries is predictable and can be calculated using single-arm OPC.
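The abstract reports the inputs (PT = 0.6, P0 = 0.5) and the result (263 carious teeth) but does not state the formula itself or the error rates used. The sketch below assumes one common normal-approximation form of the single-arm sample size formula, n = (Z_{1-α/2} + Z_{1-β})² · P0(1 − P0) / (PT − P0)², together with a two-sided α of 0.05 and a power of 0.90. Both the form of the formula and these two values are assumptions chosen for illustration; they happen to reproduce the reported figure, which does not confirm that they are the authors' settings. The function name `opc_positive_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def opc_positive_sample_size(p_t, p_0, alpha=0.05, power=0.90):
    """Positive sample size for a single-arm OPC design (illustrative sketch).

    p_t   : expected sensitivity (PT in the abstract)
    p_0   : clinically acceptable sensitivity (P0 in the abstract)
    alpha : two-sided type I error rate (ASSUMED, not stated in the abstract)
    power : 1 - beta (ASSUMED, not stated in the abstract)
    """
    if not (0.0 < p_0 < p_t < 1.0):
        raise ValueError("require 0 < p_0 < p_t < 1")
    if not (0.0 < alpha < 1.0 and 0.0 < power < 1.0):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.960 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)               # about 1.282 for power = 0.90

    # Assumed simplified form: the variance is evaluated at p_0 only.
    n = (z_alpha + z_beta) ** 2 * p_0 * (1.0 - p_0) / (p_t - p_0) ** 2

    # Round up so the target power is not undershot.
    return ceil(n)


if __name__ == "__main__":
    print(opc_positive_sample_size(0.6, 0.5))
```

With these assumed settings the unrounded value is about 262.7, so the script prints 263, matching the training set size given in Methods. Because the denominator is (PT − P0)², the result is very sensitive to the gap between the two sensitivities: halving the gap roughly quadruples the required number of positive samples.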