Abstract

Surgical phase recognition has gained significant attention due to its potential to address numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributable to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases, all captured with an egocentric camera attached to the surgeon’s head. In addition to video, EgoSurgery-Phase provides eye gaze data. To the best of our knowledge, it is the first publicly available real open surgery video dataset for surgical phase recognition. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Since the regions on which surgeons’ gaze focuses are often critical for surgical phase recognition (e.g., the surgical field), in our GGMAE the gaze information acts as an empirical semantic-richness prior that guides the masking process, promoting better attention to semantically rich spatial regions. GGMAE yields significant improvements over the previous state-of-the-art recognition method (by 6.4% in Jaccard) and the masked-autoencoder-based method (by 3.1% in Jaccard) on EgoSurgery-Phase. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.
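
To make the masking mechanism concrete, the following is a minimal PyTorch sketch of gaze-guided token masking as described above: the gaze heatmap of a frame is accumulated per patch token, and tokens are masked with probability proportional to their accumulated gaze value. The function name, patch size, mask ratio, and the choice to mask (rather than keep) the high-gaze tokens are our assumptions for illustration, not the authors’ exact implementation.

    import torch

    def gaze_guided_mask(gaze_heatmap, patch=16, mask_ratio=0.75):
        # gaze_heatmap: (H, W) gaze density for one frame; H and W are
        # assumed divisible by the patch size.
        # Accumulate gaze density inside each non-overlapping patch token.
        scores = (gaze_heatmap
                  .unfold(0, patch, patch)   # (H//patch, W, patch)
                  .unfold(1, patch, patch)   # (H//patch, W//patch, patch, patch)
                  .sum(dim=(-2, -1))
                  .flatten())
        # Convert scores to sampling probabilities; the epsilon guards
        # against frames whose heatmap is all zeros.
        probs = (scores + 1e-6) / (scores + 1e-6).sum()
        n_mask = int(round(mask_ratio * probs.numel()))
        # Gaze-dense tokens are drawn with higher probability, so the MAE
        # reconstruction objective concentrates on gaze-attended regions.
        masked_idx = torch.multinomial(probs, n_mask, replacement=False)
        mask = torch.zeros(probs.numel(), dtype=torch.bool)
        mask[masked_idx] = True  # True = token is masked
        return mask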

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0627_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

https://github.com/Fujiry0/EgoSurgery

BibTex

@InProceedings{Fuj_EgoSurgeryPhase_MICCAI2024,
        author = { Fujii, Ryo and Hatano, Masashi and Saito, Hideo and Kajita, Hiroki},
        title = { { EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    1) The authors construct the first publicly available large-scale real egocentric open surgery dataset, EgoSurgery, for phase recognition, and 2) propose a gaze-guided masked autoencoder (GGMAE) that incorporates gaze information as an empirical semantic-richness prior for masking.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Automated analysis of surgical videos is indispensable for computer-assisted intervention. Surgical phase recognition is a fundamental component in the advancement of next-generation intelligent surgical systems, and the lack of large-scale datasets has considerably hindered progress toward precise surgical phase recognition. The authors provide a publicly available dataset (EgoSurgery) that facilitates the advancement of learning-based open surgery phase recognition.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of innovation in GGMAE. The authors did not compare against other non-uniform masking methods, but only emphasized the differences between gaze-guided masking and random masking. Masking strategies that mask essential regions have been proposed in several prior works, e.g., “SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders” (NeurIPS 2022) and “MST: Masked Self-Supervised Transformer for Visual Representation” (NeurIPS 2021). Therefore, in my opinion, the method is of limited innovation.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In this manuscript, the authors present and promise to release a real egocentric open surgery video dataset (named EgoSurgery) for surgical phase recognition. The authors emphasize that this dataset is the first publicly available real open surgery video dataset for surgical phase recognition, potentially mitigating the scarcity of open surgery datasets. In addition, the authors introduce a gaze-guided masked autoencoder (GGMAE) that leverages the regions where surgeons’ gaze focuses as prior information. The experimental results show that GGMAE performs better than state-of-the-art methods on this dataset. Automated analysis of surgical videos is indispensable for computer-assisted intervention, and surgical phase recognition is a fundamental component in the advancement of next-generation intelligent surgical systems; the lack of large-scale datasets has considerably hindered progress toward precise surgical phase recognition. The manuscript title, “EgoSurgery: A Dataset of …”, emphasizes the importance of the provided dataset, but a significant portion of the paper is written about GGMAE, which may deviate from the title. Additionally, the novelty of GGMAE is relatively shallow. My specific comments are listed below:

    1. Some typos should be checked and corrected, for example, “Disssection” and “Clusure” in Fig. 1.

    2. The title of a paper should highlight its most crucial aspect. However, most of this paper is devoted to the GGMAE model, which seems inconsistent with the title.

    3. In the Gaze-Guided Masking section, ‘gaze heatmap’ is mentioned several times, e.g., ‘we propose non-uniform token sampling based on the accumulated gaze heatmap value of each token’, but the paper does not seem to explain how the gaze heatmap is obtained. This part needs to be explained clearly.

    4. In Section 2.2, the description of dataset collection should be clearer. For example, regarding ‘Expert surgeons perform the annotations based on their clinical experience and domain knowledge’: how many expert surgeons were involved in annotating the data, and how were disagreements over annotations resolved?

    5. Is there a data screening process? Section 2.2 states, ‘The 22 pre-processed videos of open surgery are manually annotated into 9 phases: ……’, but also ‘We use 14 videos for the training set, 2 videos for the validation set, and 5 videos for the test set.’ This amounts to 21 (14+2+5) videos in the data split, rather than 22.

    6. Is the dataset too small for deep learning applications, with only 14 videos for training?

    7. Lack of innovation in GGMAE. The authors did not compare against other non-uniform masking methods, but only emphasized the differences between gaze-guided masking and random masking. Masking strategies that mask essential regions have been proposed in several prior works, e.g., “SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders” (NeurIPS 2022) and “MST: Masked Self-Supervised Transformer for Visual Representation” (NeurIPS 2021). Therefore, in my opinion, the method is of limited innovation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the content of this paper is more focused on presenting the GGMAE model than the EgoSurgery dataset mentioned in the title. Besides, the masking method of GGMAE lacks innovation in masking informative regions of the input.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This study introduces EgoSurgery, the first-ever egocentric video dataset designed specifically for surgical phase recognition. Alongside the dataset, the authors propose a novel neural network architecture called GGMAE (gaze-guided masked autoencoder). GGMAE leverages gaze data to focus on crucial areas within the video frames, potentially leading to improved recognition of surgical phases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The creation of EgoSurgery, a publicly available dataset for open surgery videos, offers a valuable resource for the surgical research community. The study compares the performance of GGMAE against existing state-of-the-art models for surgical phase recognition.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper needs more clarity regarding the complexity of the data used in the study. For example, the skill levels of the surgeons and information about the different open surgeries conducted should be included.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please provide the skill levels of the surgeons and information about the different open surgeries included in the dataset.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This study introduces EgoSurgery, the first-ever egocentric video dataset designed specifically for surgical phase recognition. Alongside the dataset, the authors propose a novel neural network architecture called GGMAE (gaze-guided masked autoencoder). GGMAE leverages gaze data to focus on crucial areas within the video frames, potentially leading to improved recognition of surgical phases.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper develops the EgoSurgery dataset and a deep learning model, GGMAE, for surgical phase recognition. The dataset is publicly available, and the model design shows novelty.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • An open dataset for surgical phase recognition
    • The use of gaze heatmaps to generate gaze-guided masks for training GGMAE

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • During the evaluation of the model, cross-validation is needed to better demonstrate performance.
    • The dataset contains data from multiple surgical phases. In the evaluation, the per-phase performance of the proposed GGMAE is needed.
    • The dataset has a class distribution problem. For example, the number of disinfection frames is roughly 50 times smaller than the number of dissection frames. Were any data augmentation steps applied?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • It would be helpful to provide the runtime of the proposed model.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work provides a public dataset and a novel method for surgical phase recognition. The proposed GGMAE method uses gaze-guided masking to better guide training. This paper has good quality despite some minor weaknesses.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all the reviewers, R1, R3, and R4, for their insightful and constructive comments.

Dataset details (R1 W1, C1; R3 C2): The type of surgery for each procedure and the surgeon’s years of experience will be provided as meta-information of the dataset. The analysis of this meta-information and the details of dataset collection will be included in a future extended version of the paper.

Comparison with other non-uniform masking methods (R3 W1, C7): We compare our model with SurgMAE, which adopts non-uniform masking to sample tokens from high-information spatio-temporal regions and achieves state-of-the-art phase recognition performance on the Cataract-101 dataset. As the papers Reviewer 3 refers to are image-based MAE methods, they cannot be directly applied to video-based phase recognition. To avoid misunderstanding, we will add these papers to the citations as examples of non-uniform masking approaches.

Gaze heatmap (R3 C3): The gaze heatmap is generated from the gaze data recorded by Tobii. This gaze data will also be released as part of EgoSurgery.

Dataset size (R3 C6): As mentioned in the conclusion and future work section, we intend to enrich this dataset by augmenting the video content and incorporating footage captured from various perspectives to advance the automated analysis of open surgery videos.

More experimental settings and results (R4 W1): Cross-validation experiments and the per-phase performance of our model will be reported in a future extended version of the paper.

Runtime (R4 C1): The GFLOPs of our model are 16.2.

Data augmentation (R4 W1): We employ spatial data augmentation techniques such as rotation, horizontal flip, and vertical flip. In addition, to handle the class imbalance, we adopt a re-sampling strategy: specifically, we use the ImbalancedDatasetSampler, which rebalances the class distributions when sampling from the imbalanced dataset.
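
As context for the re-sampling strategy mentioned above, here is a minimal sketch of how an ImbalancedDatasetSampler is typically plugged into a PyTorch DataLoader. It assumes the third-party torchsampler package; PhaseClipDataset and its tensor shapes are hypothetical stand-ins for the actual EgoSurgery-Phase loader.

    import torch
    from torch.utils.data import DataLoader, Dataset
    from torchsampler import ImbalancedDatasetSampler  # pip install torchsampler

    class PhaseClipDataset(Dataset):
        # Hypothetical dataset of (video clip, phase label) pairs.
        def __init__(self, clips, labels):
            self.clips, self.labels = clips, labels

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            return self.clips[i], self.labels[i]

        def get_labels(self):  # hook the sampler uses to read class labels
            return self.labels

    # Toy imbalance: 90 clips of phase 0 vs. 10 clips of phase 1. The sampler
    # draws indices with inverse-class-frequency weights, so rare phases
    # (e.g., disinfection) appear roughly as often as frequent ones per epoch.
    dataset = PhaseClipDataset(torch.randn(100, 3, 8, 32, 32),
                               [0] * 90 + [1] * 10)
    loader = DataLoader(dataset,
                        sampler=ImbalancedDatasetSampler(dataset),
                        batch_size=8)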




Meta-Review

Meta-review not available, early accepted paper.


