Annals of Surgery Open (Sep 2023)

Discrimination, Reliability, Sensitivity, and Specificity of Robotic Surgical Proficiency Assessment With Global Evaluative Assessment of Robotic Skills and Binary Scoring Metrics: Results From a Randomized Controlled Trial

  • Ruben De Groote, MD,
  • Stefano Puliatti, MD,
  • Marco Amato, MD,
  • Elio Mazzone, MD,
  • Alessandro Larcher, MD,
  • Rui Farinha, MD,
  • Artur Paludo, MD,
  • Liesbeth Desender, MD, PhD,
  • Nicolas Hubert, MD,
  • Ben Van Cleynenbreugel, MD, PhD,
  • Brendan P. Bunting, PhD,
  • Alexandre Mottrie, MD, PhD,
  • Anthony G. Gallagher, PhD, DSc, MAE,
  • On behalf of the Junior ERUS/YAU working group on robot-assisted surgery of the European Association of Urology and the ERUS Education Working Group. Collaborators:
  • Giuseppe Rosiello, MD,
  • Pieter Uvin, MD, PhD,
  • Jasper Decoene, MD,
  • Tom Tuyten, MD,
  • Mathieu D’Hondt, MD,
  • Charles Chatzopoulos, MD,
  • Bart De Troyer, MD,
  • Filippo Turri, MD,
  • Paolo Dell’Oglio, MD,
  • Nikolaos Liakos, MD,
  • Carlo Andrea Bravi, MD,
  • Edward Lambert, MD,
  • Iulia Andras, MD,
  • Fabrizio Di Maida, MD,
  • Wouter Everaerts, MD, PhD

DOI
https://doi.org/10.1097/AS9.0000000000000307
Journal volume & issue
Vol. 4, no. 3
p. e307

Abstract

Objective: To compare binary metrics and Global Evaluative Assessment of Robotic Skills (GEARS) evaluations of training outcomes for reliability, sensitivity, and specificity.

Background: GEARS, a Likert-scale skills assessment, is a widely accepted tool for evaluating robotic surgical training outcomes. Proficiency-based progression (PBP) training is an alternative methodology that uses binary performance metrics for evaluation.

Methods: In a prospective, randomized, and blinded study, we compared conventional with PBP training for a robotic suturing and knot-tying anastomosis task. Thirty-six surgical residents from 16 Belgian residency programs were randomized. In the skills laboratory, the PBP group trained until they demonstrated a quantitatively defined proficiency benchmark. The conventional group was yoked to the same training time but without the proficiency requirement. The final trial was video recorded and assessed with binary metrics and GEARS by robotic surgeons blinded to individual, group, and residency program. Sensitivity and specificity of the two assessment methods were evaluated with receiver operating characteristic (ROC) curves and the area under the curve (AUC).

Results: The PBP group made 42% fewer objectively assessed performance errors than the conventional group (P < 0.001) and scored 15% better on the GEARS assessment (P = 0.033). The mean interrater reliability was 0.87 for binary metrics and 0.38 for GEARS. The AUC was 97% for the binary total error metric and 85% for GEARS. At a sensitivity threshold of 0.8, false-positive rates were 3% for the binary assessment and 25% for GEARS.

Conclusions: Binary metrics for scoring a robotic vesicourethral anastomosis (VUA) task demonstrated better psychometric properties than the GEARS assessment.
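
For readers unfamiliar with the ROC analysis mentioned in the Methods, the following is a minimal sketch of how an AUC and a false-positive rate at a fixed sensitivity can be computed from assessment scores. It is not the authors' analysis code. The error counts and group labels are invented for illustration and are not study data, and the function names (`roc_points`, `auc`, `fpr_at_sensitivity`) are hypothetical. The sketch assumes that a higher error count points toward the conventionally trained group (label 1).

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false-positive rate, true-positive rate)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Build ROC points by sweeping a threshold from the highest score down.

    A case is called positive when its score is >= the threshold.
    Assumes labels contain at least one 1 and at least one 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points: List[Point] = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def fpr_at_sensitivity(points: List[Point], min_tpr: float) -> float:
    """Lowest false-positive rate among thresholds reaching the given sensitivity."""
    return min(fpr for fpr, tpr in points if tpr >= min_tpr)


# Invented example: error counts for ten trainees (NOT data from the study).
# Label 1 = conventionally trained, label 0 = PBP trained (assumed coding).
errors = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

curve = roc_points(errors, labels)
print("AUC:", round(auc(curve), 2))
print("FPR at sensitivity >= 0.8:", fpr_at_sensitivity(curve, 0.8))
```

In this toy data set only one PBP trainee has more errors than any conventional trainee, so the AUC is close to 1 and a sensitivity of 0.8 is reached without any false positives. A scoring method with more overlap between the groups would show a lower AUC and a higher false-positive rate at the same sensitivity, which is the kind of contrast the Results report between the binary metrics and GEARS.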