IEEE Access (Jan 2024)

Real-Time Surgical Instrument Segmentation Analysis Using YOLOv8 With ByteTrack for Laparoscopic Surgery

  • Nyi Nyi Myo,
  • Apiwat Boonkong,
  • Kovit Khampitak,
  • Daranee Hormdee

DOI
https://doi.org/10.1109/ACCESS.2024.3412780
Journal volume & issue
Vol. 12
pp. 83091–83103

Abstract

As computer vision technology has evolved rapidly, object detection and instance segmentation have been applied in a wide range of areas. In computer-aided laparoscopic surgery, the segmentation of surgical instruments is an active research area. This paper presents the implementation and comparative analysis of a real-time surgical instrument segmentation system that incorporates ByteTrack, a powerful object-tracking algorithm, into YOLOv8, a state-of-the-art deep learning model for object detection and segmentation, together with a gesture analysis of the instruments in the practical results. Instrument gestures were categorized into separating, crossing, and overlapping cases, according to the gestures most commonly observed during surgery. Datasets from the ROBUST-MIS 2019 challenge were annotated and used for training, validation, and blind testing in this study. Considering the trade-offs among model complexity, speed, and accuracy, the medium variant (YOLOv8m) was chosen for its moderate model complexity, real-time design, and relatively high accuracy. To validate the effectiveness of this research, real-time segmentation of surgical instruments was performed on live-streamed laparoscopic gynecologic surgery in five donated soft-tissue cadaver cases. According to the experimental results, although YOLOv8 alone achieves high F1-score and mAP (mean Average Precision), segmentation accuracy can be further improved by incorporating ByteTrack into the YOLOv8 pipeline. Owing to the two-stage association scheme designed for object tracking in ByteTrack, referring to the tracklet from the previous frame can recover segmentations that were missed because of low confidence values. The findings show that the modified model incorporating ByteTrack with YOLOv8 improved the F1-score from 0.89 to 0.92, outperforming all previous studies on the ROBUST-MIS 2019 challenge, and from 0.82 to 0.88 on the blind dataset captured from the live-streamed videos, at a real-time segmentation speed of approximately 45 FPS (frames per second), which is sufficient for a real-time application, compared with 60 FPS for the YOLOv8 algorithm alone. The instrument-gesture analysis shows that ByteTrack improved segmentation performance in all gesture categories: separating, crossing, and overlapping. However, most of the remaining segmentation failures occur in crossing and overlapping gestures.
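
The pipeline the abstract describes can be reproduced in outline with the off-the-shelf Ultralytics API, which ships ByteTrack as a built-in tracker option. The sketch below is not the authors' code: the weights file, the video path, and the confidence threshold are illustrative assumptions, and the authors' modified association logic is represented only by the stock ByteTrack configuration.

```python
# Minimal sketch: YOLOv8m instance segmentation with ByteTrack
# association via the Ultralytics API. Assumed inputs: stock
# "yolov8m-seg.pt" weights, a hypothetical "laparoscopy.mp4" video,
# and an illustrative confidence threshold of 0.25.
from ultralytics import YOLO

model = YOLO("yolov8m-seg.pt")  # medium segmentation variant, as in the paper

# track() runs per-frame segmentation and associates results across
# frames; tracker="bytetrack.yaml" selects ByteTrack, whose second
# association stage matches leftover low-score detections against
# existing tracklets, recovering masks that plain per-frame
# thresholding would discard.
for result in model.track(
    source="laparoscopy.mp4",   # hypothetical input video
    tracker="bytetrack.yaml",
    stream=True,                # yield results frame by frame
    conf=0.25,                  # assumed confidence threshold
):
    if result.masks is not None:
        # result.boxes.id holds per-instance track IDs (None if untracked);
        # result.masks.data is an (N, H, W) tensor of instance masks.
        print(result.boxes.id, result.masks.data.shape)
```

The design point the abstract relies on is visible here: because ByteTrack keeps tracklets alive between frames, a detection whose score falls below the threshold in one frame can still be matched to its tracklet from the previous frame, which is how the modified model recovers missed segmentations at the cost of some throughput (roughly 45 FPS versus 60 FPS for YOLOv8 alone).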

Keywords