Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches

Nour Aldeen Jalal; Tamer Abdulbaki Alshirbaji; Paul David Docherty; Herag Arabian; Bernhard Laufer; Sabine Krueger-Ziolek; Thomas Neumuth; Knut Moeller

doi:10.3390/s23041958

Sensors (Feb 2023)

Laparoscopic Video Analysis Using Temporal, Attention, and Multi-Feature Fusion Based-Approaches

Nour Aldeen Jalal,
Tamer Abdulbaki Alshirbaji,
Paul David Docherty,
Herag Arabian,
Bernhard Laufer,
Sabine Krueger-Ziolek,
Thomas Neumuth,
Knut Moeller

Affiliations

Nour Aldeen Jalal: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
Tamer Abdulbaki Alshirbaji: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
Paul David Docherty: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
Herag Arabian: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
Bernhard Laufer: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
Sabine Krueger-Ziolek: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany
Thomas Neumuth: Innovation Center Computer Assisted Surgery (ICCAS), University of Leipzig, 04103 Leipzig, Germany
Knut Moeller: Institute of Technical Medicine (ITeM), Furtwangen University, 78054 Villingen-Schwenningen, Germany

DOI: https://doi.org/10.3390/s23041958
Journal volume & issue: Vol. 23, no. 4
p. 1958

Abstract

Read online

Adapting intelligent context-aware systems (CAS) to future operating rooms (OR) aims to improve situational awareness and provide surgical decision support systems to medical teams. CAS analyzes data streams from available devices during surgery and communicates real-time knowledge to clinicians. Indeed, recent advances in computer vision and machine learning, particularly deep learning, paved the way for extensive research to develop CAS. In this work, a deep learning approach for analyzing laparoscopic videos for surgical phase recognition, tool classification, and weakly-supervised tool localization in laparoscopic videos was proposed. The ResNet-50 convolutional neural network (CNN) architecture was adapted by adding attention modules and fusing features from multiple stages to generate better-focused, generalized, and well-representative features. Then, a multi-map convolutional layer followed by tool-wise and spatial pooling operations was utilized to perform tool localization and generate tool presence confidences. Finally, the long short-term memory (LSTM) network was employed to model temporal information and perform tool classification and phase recognition. The proposed approach was evaluated on the Cholec80 dataset. The experimental results (i.e., 88.5% and 89.0% mean precision and recall for phase recognition, respectively, 95.6% mean average precision for tool presence detection, and a 70.1% F1-score for tool localization) demonstrated the ability of the model to learn discriminative features for all tasks. The performances revealed the importance of integrating attention modules and multi-stage feature fusion for more robust and precise detection of surgical phases and tools.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors

About the journal

Abstract

Keywords