Scientific Reports (Aug 2023)

Predicting BRAFV600E mutations in papillary thyroid carcinoma using six machine learning algorithms based on ultrasound elastography

  • Enock Adjei Agyekum,
  • Yu-guo Wang,
  • Fei-Ju Xu,
  • Debora Akortia,
  • Yong-zhen Ren,
  • Kevoyne Hakeem Chambers,
  • Xian Wang,
  • Jenny Olalia Taupa,
  • Xiao-qin Qian

DOI
https://doi.org/10.1038/s41598-023-39747-6
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 14

Abstract

The most common BRAF mutation is a thymine (T) to adenine (A) missense mutation at nucleotide 1796 (T1796A, V600E). The BRAFV600E gene encodes a protein-dependent kinase (PDK), which is a key component of the mitogen-activated protein kinase pathway and is essential for controlling cell proliferation, differentiation, and death. The BRAFV600E mutation causes PDK to be activated improperly and continuously, resulting in abnormal proliferation and differentiation in papillary thyroid carcinoma (PTC). This study seeks to develop and validate six machine learning algorithms, based on ultrasound (US) elastography radiomic features, to predict the BRAFV600E mutation in PTC patients prior to surgery. The study used routine US strain elastography image data from 138 PTC patients, who were divided into two groups: those without the BRAFV600E mutation (n = 75) and those with the mutation (n = 63). The patients were randomly assigned to one of two data sets: training (70%) or validation (30%). A total of 479 radiomic features were extracted from the strain elastography US images. Pearson's correlation coefficient (PCC) and recursive feature elimination (RFE) with stratified tenfold cross-validation were used to reduce the number of features. Based on the selected radiomic features, six machine learning algorithms were compared for predicting the possibility of BRAFV600E: support vector machine with a linear kernel (SVM_L), support vector machine with a radial basis function kernel (SVM_RBF), logistic regression (LR), Naïve Bayes (NB), K-nearest neighbors (KNN), and linear discriminant analysis (LDA). Performance was evaluated using accuracy (ACC), area under the curve (AUC), sensitivity (SEN), specificity (SPEC), positive predictive value (PPV), negative predictive value (NPV), decision curve analysis (DCA), and calibration curves. ① The machine learning algorithms' diagnostic performance depended on 27 radiomic features.
② AUCs for NB, KNN, LDA, LR, SVM_L, and SVM_RBF were 0.80 (95% confidence interval [CI] 0.65–0.91), 0.87 (95% CI 0.73–0.95), 0.91 (95% CI 0.79–0.98), 0.92 (95% CI 0.80–0.98), 0.93 (95% CI 0.80–0.98), and 0.98 (95% CI 0.88–1.00), respectively. ③ There were significant differences in echogenicity, vertical and horizontal diameter ratios, and elasticity between PTC patients with BRAFV600E and PTC patients without BRAFV600E. Machine learning algorithms based on US elastography radiomic features are capable of predicting the likelihood of BRAFV600E in PTC patients, which can assist physicians in identifying the risk of BRAFV600E in PTC patients. Among the six machine learning algorithms, the support vector machine with radial basis function kernel (SVM_RBF) achieved the best ACC (0.93), AUC (0.98), SEN (0.95), SPEC (0.90), PPV (0.91), and NPV (0.95).
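
As an illustration of how the threshold-based metrics reported above (ACC, SEN, SPEC, PPV, NPV) relate to one another, the following minimal Python sketch computes them from binary predictions. The function name `diagnostic_metrics` and the label vectors are hypothetical and are not taken from the study; the counts are invented for illustration only and do not reproduce the paper's confusion matrix.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute ACC, SEN, SPEC, PPV and NPV for binary labels.

    Labels are assumed to be 1 (BRAFV600E mutation present) or
    0 (mutation absent); this encoding is an assumption of the sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "SEN": ratio(tp, tp + fn),   # sensitivity (recall for the mutated class)
        "SPEC": ratio(tn, tn + fp),  # specificity
        "PPV": ratio(tp, tp + fp),   # positive predictive value
        "NPV": ratio(tn, tn + fn),   # negative predictive value
    }


# Hypothetical validation labels (NOT the study's data):
# 20 mutated cases (18 detected), 20 wild-type cases (17 correctly ruled out).
y_true = [1] * 20 + [0] * 20
y_pred = [1] * 18 + [0] * 2 + [0] * 17 + [1] * 3

metrics = diagnostic_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike these five metrics, the AUC is computed from the classifier's continuous scores rather than from thresholded predictions, so it is not covered by this sketch.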