Deep learning for Evaluation and Prediction of TecHnical Skills in robotic-assisted vaginal cuff closure study
Authors
Tesfai, Freweini
Xu, Jialang
Anastasiou, Dimitrios
He, Runlong
Boal, Matthew
Aranan, Yekaterina
Lingam, Gita
Shah, Diya
Stoyanov, Danail
Chandrasekaran, Dhivya
Issue Date
2026
Type
Article
Abstract
BACKGROUND: To support surgical education, there has been an increasing focus on integrating surgical data, including surgical motion, surgical activity, and process understanding, to develop predictive models that assess surgical skill.
OBJECTIVE: This study aimed to develop deep learning models based on fine-grained analysis to predict technical errors and generic surgical skills during robotic-assisted vaginal cuff closure performed as part of a hysterectomy.
STUDY DESIGN: This was a multicenter prospective observational cohort study of robotic-assisted total hysterectomies performed between 2023 and 2025. Vaginal cuff closure video segments, recorded on the Touch Surgery video platform via the DS1 computer, were extracted and double-annotated by 2 trained surgeons: errors using Objective Clinical Human Reliability Analysis and global skill using Modifiable Global Evaluative Assessment of Robotic Skills. A total of 3 deep learning pipelines were developed: 2 addressing key surgical tasks (surgical video error detection with temporal models and few-shot surgical skill assessment) and 1 based on multimodal learning.
RESULTS: A total of 40 videos from 2 centers, comprising 667 minutes (1,201,654 frames), were analyzed. A total of 11 surgeons performed vaginal cuff closure (3 beginners, 5 intermediates, and 3 experts). Interrater reliability was good for both Modifiable Global Evaluative Assessment of Robotic Skills scores (intraclass correlation coefficient, 0.807; P=.001) and Objective Clinical Human Reliability Analysis error counts (intraclass correlation coefficient, 0.712; P=.010). The median Modifiable Global Evaluative Assessment of Robotic Skills score was 21.0 (interquartile range, 19.1-24.6), and the median error count was 25.0 (interquartile range, 16.3-31.5). Level of experience was significantly associated with Modifiable Global Evaluative Assessment of Robotic Skills scores (Kruskal-Wallis test, P<.002). Operative time correlated significantly with Modifiable Global Evaluative Assessment of Robotic Skills scores (r(s)=-0.534; P<.001) and with Objective Clinical Human Reliability Analysis error counts (r(s)=0.421; P=.007). In few-shot experiments, the model achieved 81.70% accuracy and an 81.30% F1 score in the 5-shot setting. The multimodal skill assessment model achieved excellent agreement with manual assessment ratings (r(s)=0.85±0.02; mean absolute error=1.85±0.16).
CONCLUSION: This proof of concept shows that deep learning can objectively score generic surgical skill and provide initial flagging of frame-level errors in vaginal cuff closure videos, in alignment with validated objective assessment tools. Although larger, multicenter datasets remain essential, these results lay the groundwork for artificial intelligence-driven quality monitoring and evidence-based credentialing in minimally invasive gynecologic surgery.
Journal
American Journal of Obstetrics and Gynecology
