Deep learning for Evaluation and Prediction of TecHnical Skills in robotic-assisted vaginal cuff closure study

BACKGROUND: To support surgical education, there has been an increasing focus on integrating surgical data, including surgical motion, activity, and process understanding, to develop predictive models that assess surgical skills.

OBJECTIVE: This study aimed to develop deep learning models based on fine-grained analysis to predict technical errors and generic surgical skills during robotic-assisted vaginal cuff closure performed as part of a hysterectomy.

STUDY DESIGN: This was a multicenter prospective observational cohort study of robotic-assisted total hysterectomies performed between 2023 and 2025. Vaginal cuff closure video segments, recorded on the Touch Surgery video platform via the DS1 computer, were extracted and double-annotated by 2 trained surgeons: errors were annotated using Objective Clinical Human Reliability Analysis and global skill using Modifiable Global Evaluative Assessment of Robotic Skills. Of note, 3 deep learning pipelines were developed: 2 addressing crucial surgical tasks (surgical video error detection via temporal modeling and surgical skill assessment via few-shot learning) and 1 based on multimodal learning.

RESULTS: A total of 40 videos, comprising 667 minutes (1,201,654 frames), from 2 centers were analyzed. Of note, 11 surgeons performed vaginal cuff closure (3 beginners, 5 intermediates, and 3 experts). Interrater reliability was good for both Modifiable Global Evaluative Assessment of Robotic Skills scores (intraclass correlation coefficient, 0.807; P=.001) and Objective Clinical Human Reliability Analysis error counts (intraclass correlation coefficient, 0.712; P=.010). The median Modifiable Global Evaluative Assessment of Robotic Skills score was 21.0 (interquartile range, 19.1-24.6), and the median error count was 25.0 (interquartile range, 16.3-31.5). Level of experience was significantly associated with Modifiable Global Evaluative Assessment of Robotic Skills scores (Kruskal-Wallis test, P<.002). Operative time correlated significantly with both Modifiable Global Evaluative Assessment of Robotic Skills scores and Objective Clinical Human Reliability Analysis error counts (r(s)=-0.534 [P<.001] and r(s)=0.421 [P=.007], respectively). In the few-shot experiments, the model achieved 81.70% accuracy and an 81.30% F1 score in the 5-shot setting. The multimodal skill assessment model achieved excellent agreement with manual assessment ratings (r(s)=0.85±0.02; mean absolute error=1.85±0.16).

CONCLUSION: This proof of concept shows that deep learning can objectively score generic surgical skill and provide initial flagging of frame-level errors in vaginal cuff closure videos, in alignment with validated objective assessment tools. Although larger, multicenter datasets remain essential, these results lay the groundwork for artificial intelligence-driven quality monitoring and evidence-based credentialing in minimally invasive gynecologic surgery.

Journal

American Journal of Obstetrics and Gynecology

URI

https://hdl.handle.net/20.500.14753/2840

PubMed ID

41864316

Collections

Obstetrics and Gynaecology
