Progress in Orthodontics (Sep 2024)

Classification of cervical vertebral maturation stages with machine learning models: leveraging datasets with high inter- and intra-observer agreement

  • Potjanee Kanchanapiboon,
  • Pitipat Tunksook,
  • Prinya Tunksook,
  • Panrasee Ritthipravat,
  • Supatchai Boonpratham,
  • Yodhathai Satravaha,
  • Chaiyapol Chaweewannakorn,
  • Supakit Peanchitlertkajorn

DOI: https://doi.org/10.1186/s40510-024-00535-1
Journal volume & issue: Vol. 25, no. 1, pp. 1–14

Abstract

Objectives: This study aimed to assess the accuracy of machine learning (ML) models, combined with a feature selection technique, in classifying cervical vertebral maturation stages (CVMS). Consensus-based datasets were used for model training, and the resulting models were evaluated for their ability to generalize to unseen datasets.

Methods: Three clinicians independently rated CVMS on 1380 lateral cephalograms. These ratings yielded five datasets: two consensus-based datasets (Complete Agreement and Majority Voting) and three datasets each based on a single rater's evaluations. Additionally, landmark annotations of the second to fourth cervical vertebrae and patient information underwent a feature selection process. These datasets were used to train various ML models, and the top-performing model for each dataset was identified. These models were then tested for their generalization capability.

Results: The features found to be significant in the consensus-based datasets were consistent with a CVMS guideline. The Support Vector Machine model trained on the Complete Agreement dataset achieved the highest accuracy (77.4%), followed by the Multi-Layer Perceptron model trained on the Majority Voting dataset (69.6%). Models trained on individual raters' datasets showed lower accuracies (60.4–67.9%). The models trained on consensus-based datasets also exhibited a lower coefficient of variation (CV), indicating better generalization capability than models trained on single raters' datasets.

Conclusion: ML models trained on consensus-based datasets for CVMS classification achieved the highest accuracy, and their significant features were consistent with the original CVMS guidelines. These models also showed robust generalization, underscoring the importance of dataset quality.
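The two consensus rules and the CV metric named in the abstract can be illustrated with a short Python sketch. It rests on three assumptions. Complete Agreement keeps only cephalograms to which all three raters assigned the same stage. Majority Voting keeps cases where at least two of the three raters agreed and discards three-way disagreements. CV is the sample standard deviation of accuracy divided by its mean. The abstract does not spell out these details, and the function names, integer stage coding, and toy ratings are hypothetical. This is not the authors' code.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical input: case id -> (rater_1, rater_2, rater_3) stages,
# with CVMS stages CS1..CS6 coded as the integers 1..6 (assumption).
Ratings = dict[str, tuple[int, ...]]


def complete_agreement(ratings: Ratings) -> dict[str, int]:
    """Keep only cases on which every rater gave the same stage."""
    labels = {}
    for case_id, stages in ratings.items():
        if len(set(stages)) == 1:
            labels[case_id] = stages[0]
    return labels


def majority_voting(ratings: Ratings) -> dict[str, int]:
    """Keep cases where a strict majority of raters agree.

    Assumption: cases with no majority (e.g. three different stages)
    are discarded rather than resolved by discussion.
    """
    labels = {}
    for case_id, stages in ratings.items():
        stage, count = Counter(stages).most_common(1)[0]
        if count > len(stages) / 2:
            labels[case_id] = stage
    return labels


def single_rater(ratings: Ratings, rater_index: int) -> dict[str, int]:
    """Label every case with one rater's stage (no consensus filtering)."""
    return {case_id: stages[rater_index] for case_id, stages in ratings.items()}


def coefficient_of_variation(accuracies: list[float]) -> float:
    """CV = sample standard deviation / mean; lower means more stable accuracy."""
    return stdev(accuracies) / mean(accuracies)


if __name__ == "__main__":
    # Toy ratings for illustration only; not data from the study.
    toy = {
        "ceph_001": (3, 3, 3),  # all raters agree
        "ceph_002": (2, 3, 3),  # two of three agree
        "ceph_003": (4, 5, 6),  # no majority
        "ceph_004": (1, 1, 2),  # two of three agree
    }
    print("Complete Agreement:", complete_agreement(toy))
    print("Majority Voting:", majority_voting(toy))
    print("Rater 1 only:", single_rater(toy, 0))
    # Hypothetical accuracies of one model across several test sets.
    print("CV:", coefficient_of_variation([0.70, 0.80, 0.75]))
```

On the toy ratings, Complete Agreement retains only `ceph_001`, while Majority Voting also retains `ceph_002` and `ceph_004`. Under these definitions, Complete Agreement is always a subset of Majority Voting. The stricter rule therefore trades dataset size for label reliability.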
