Abstract

Accurate pose estimation of surgical instruments is crucial for analyzing robotic surgery videos using computer vision techniques. However, the scarcity of suitable public datasets poses a challenge in this regard. To address this issue, we have developed a new private dataset extracted from real gastric cancer surgery videos. The primary objective of our research is to develop a more sophisticated pose estimation algorithm for surgical instruments using this private dataset. Additionally, we introduce a novel loss function aimed at enhancing the accuracy of pose estimation, with a specific emphasis on minimizing root mean squared error. Leveraging the YOLOv8 model, our approach significantly outperforms existing methods and state-of-the-art techniques, thanks to the enhanced occlusion-aware loss function. These findings hold promise for improving the precision and safety of robotic-assisted surgeries.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0338_paper.pdf

SharedIt Link: https://rdcu.be/dV5yA

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72089-5_60

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Par_Towards_MICCAI2024,
        author = { Park, Jihun and Hong, Jiuk and Yoon, Jihun and Park, Bokyung and Choi, Min-Kook and Jung, Heechul},
        title = { { Towards Precise Pose Estimation in Robotic Surgery: Introducing Occlusion-Aware Loss } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        pages = {639--648}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper introduces a new private dataset extracted from real gastric cancer surgery videos. A new loss function is proposed to improve the accuracy of pose estimation. Extensive experiments are conducted to show the effectiveness of the proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The dataset developed in this work is meaningful for the research area. The proposed loss function can improve the performance of YOLOv8-n, m, s, and l.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Do the authors plan to make their dataset public?
2. The proposed method is weak, mainly based on YOLOv8.
3. The proposed loss function brings no benefit for YOLOv8-x; performance even degrades.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

Are the authors prepared to make their dataset and code publicly available? The authors should have explored in depth why the proposed loss decreases performance on YOLOv8-x instead of improving it.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Reject — could be rejected, dependent on rebuttal (3)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of the proposed method is weak. The proposed loss function brings no benefit for YOLOv8-x; performance even degrades. The authors do not explore the reason for this phenomenon.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The authors present a paper addressing accurate pose estimation in robotic surgery. They primarily aim to address existing challenges caused by the lack of suitable public datasets. To mitigate this, the authors introduce a new private dataset compiled from real gastric cancer surgery videos. They propose a novel pose estimation algorithm leveraging this dataset, improving estimation accuracy by incorporating an occlusion-aware loss function that optimizes the root mean squared error. Using YOLOv8, the proposed method outperforms existing techniques, showing better performance particularly in scenarios with occlusion.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The modifications the authors introduce to the YOLOv8 model significantly improve RMSE, particularly under occlusion. These improvements indicate robustness in pose estimation that could benefit the precision and safety of robotic-assisted surgeries and the training of robot imitation learning models. The figures do suggest that tracking is better under occlusion, which is quite impressive. The paper is well motivated; the authors collect their own YouTube data with 125k images that are segmented into different frames. While the dataset is fairly small and was collected from only 10 patients undergoing one type of surgery, it seemed sufficient for their experiments, though it might not have a large impact on other people’s work. It is unclear whether the authors will release the dataset publicly. Overall, the paper is well motivated and well written, and has no fatal drawbacks, provided the authors are willing to share their model and data.
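To make the RMSE comparison under occlusion concrete, the following is a minimal sketch of how keypoint RMSE could be reported separately per visibility group. It assumes RMSE is the root of the mean squared Euclidean pixel error, which is one common definition; the paper's exact metric is not stated on this page. The function name `rmse_by_group` and the group labels are hypothetical.

```python
import math
from collections import defaultdict


def rmse_by_group(pred, target, groups):
    """Return {group: RMSE} of the Euclidean pixel error per keypoint group.

    pred, target: lists of (x, y) keypoint coordinates in pixels.
    groups: one label per keypoint (hypothetical labels, e.g. "visible").
    """
    squared = defaultdict(list)
    for (px, py), (tx, ty), g in zip(pred, target, groups):
        squared[g].append((px - tx) ** 2 + (py - ty) ** 2)
    return {g: math.sqrt(sum(v) / len(v)) for g, v in squared.items()}


# Toy data, not taken from the paper.
pred = [(10.0, 10.0), (23.0, 24.0), (50.0, 50.0)]
target = [(10.0, 10.0), (20.0, 20.0), (50.0, 56.0)]
groups = ["visible", "occluded", "occluded"]
scores = rmse_by_group(pred, target, groups)
```

Reporting the occluded group on its own shows whether a gain comes from the hard keypoints rather than from the already easy visible ones.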
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Major critiques:
• I hope that the authors also intend to open source the model that was trained in this work, because this would also be quite important for understanding what the impact of this work could be. If everything is kept private, that would significantly reduce the contributions.
• The dataset is derived from a specific type of surgical procedure (gastric cancer surgery), which may limit the generalizability of the algorithm to other types of surgeries or variations in surgical environments.
• The authors predominantly focus on the YOLOv8 model without extensive comparison to other state-of-the-art deep learning frameworks, which might have offered different insights. Their models seem to perform well overall, so this is not a major issue, but a clearer comparison would be nice to see.
• The specific settings under which the models were trained and tested are not exhaustively detailed, which may impact the reproducibility of the results by other researchers looking to validate or extend the findings.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

I hope that the authors also intend to open source the model that was trained in this work, because this would also be quite important for understanding what the impact of this work could be. If everything is kept private that would significantly reduce the contributions.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

See above
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Strong paper, but impact depends a lot on availability of dataset.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Weak Accept — could be accepted, dependent on rebuttal (4)
[Post rebuttal] Please justify your decision

Addressed open-source issue

Review #3

Please describe the contribution of the paper

This paper introduces a new occlusion-aware loss function designed to enhance the training of neural networks for keypoint detection on surgical tools. The proposed loss function predicts the visibility of each keypoint by categorizing them into visible, occluded, and non-visible groups. Leveraging the YOLOv8 model for keypoint detection, the authors demonstrate the effectiveness of their approach on a privately collected dataset. The results indicate a notable improvement in performance compared to existing state-of-the-art methods.
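As a rough illustration of the three-way visibility categorization described above, the sketch below computes a per-keypoint softmax cross-entropy over the classes non-visible, occluded, and visible. This is an assumption about how such a term could look, not the authors' loss; the class ordering and the function names are hypothetical.

```python
import math

# Hypothetical class ordering: index 0, 1, 2.
VISIBILITY_CLASSES = ("non_visible", "occluded", "visible")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visibility_cross_entropy(logits_per_keypoint, labels):
    """Mean cross-entropy between predicted visibility logits and labels.

    logits_per_keypoint: one list of 3 logits per keypoint.
    labels: one integer class index (0, 1, or 2) per keypoint.
    """
    losses = [
        -math.log(softmax(logits)[label])
        for logits, label in zip(logits_per_keypoint, labels)
    ]
    return sum(losses) / len(losses)


# Uniform logits give a loss of log(3); a confident correct prediction gives less.
uniform_loss = visibility_cross_entropy([[0.0, 0.0, 0.0]], [2])
confident_loss = visibility_cross_entropy([[0.0, 0.0, 5.0]], [2])
```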
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper introduces a novel occlusion-aware loss function that aids in training neural networks for keypoint detection on surgical tools. The authors provide a comprehensive ablation study on this loss function, which demonstrates promising results in improving keypoint detection accuracy. Additionally, the paper is well-written and structured, making it easy to follow.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

While this paper effectively addresses keypoint detection on surgical tools using a novel occlusion-aware loss function, its focus is confined to the detection stage alone. Keypoint detection is a critical component of tool pose estimation, but it is only one part of a more complex process. The paper does not extend its investigation to the entire pipeline of pose estimation, leaving it unclear whether the proposed loss function can enhance the overall performance of tool pose estimation systems. Additionally, the evaluation relies on a dataset with only 2D ground truth. To convincingly demonstrate the effectiveness of the proposed approach, a 3D evaluation is desired.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
1. The use of “pose estimation” in the paper is misleading, as the research primarily focuses on 2D keypoint detection rather than full 3D localization, which is typically implied by pose estimation. This distinction is crucial and should be clarified to avoid confusion among readers about the scope and objectives of the research.
2. Figure 2 should be mentioned in the text.
3. C2F should be introduced before being used as an abbreviation.
4. What is i,j in the input of L_pose? What is \delta representing?
5. It is observed that YOLOv8-large achieved the best performance with the occlusion-aware loss, contrary to the typically expected performance scaling with network size. The performance of YOLOv8-x is worse with the occlusion loss. It would be good to provide insight into these counterintuitive results through a discussion of network capacity, overfitting, etc…
6. The authors should also discuss the limitations or potential failure cases of the proposed method.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposed a novel occlusion-aware loss and demonstrated the effectiveness of the loss function in terms of keypoint detection on surgical tools. The contribution of this paper is clear and the experiment is convincing. Therefore, I think the paper is worth including after the authors clarify my concerns.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Accept — should be accepted, independent of rebuttal (5)
[Post rebuttal] Please justify your decision

I am increasing my rating as the author addressed my previous concerns.

Author Feedback

R1

Open source: In this paper, we have included a detailed methodology for constructing the dataset used in our study. By doing so, we provide a methodology that allows other researchers to build and utilize similar data under similar conditions. This approach can enhance the reproducibility and scalability of research. Furthermore, we plan to make the algorithms and models developed in this research public. This will allow other researchers to verify our results and apply them to their own datasets.

A specific type of surgical procedure: We collected data from an institution specializing in gastric cancer surgery for this study, and we plan to extend this to other types of surgeries. While it is challenging to gather datasets for other surgeries due to the need for cooperation from hospitals, we are actively working on it.

YOLOv8: YOLOv8 is one of the latest technologies in object detection and pose estimation, offering high processing speed and accuracy. This model is particularly suitable for real-time surgical video processing, proving extremely useful in the field of robotic surgery where rapid feedback in complex surgical environments is essential. Additional model comparisons can be selectively conducted based on the scope and objectives of the research, which can be further explored in future studies.

Reproducibility: Thank you for your feedback. To address concerns about reproducibility, we plan to make the source code publicly available. This will allow other researchers to validate and extend our findings, ensuring transparency and facilitating further advancements in this field.

R3

Pose estimation: The term “pose estimation” was used as a general term encompassing the broad task of estimating the position and orientation of objects within an image, which we intended to cover. Specifically, 2D keypoint detection is also considered to fall under this category. This usage of the term is also common in the field of medical imaging analysis. For example, in the research by Du, Xiaofei et al., titled “Articulated multi-instrument 2-D pose estimation using fully convolutional networks”, the term ‘pose estimation’ is used in a 2D context to accurately estimate the keypoints of surgical instruments. We intended to follow this academic convention.

Figure 2: The omission of Figure 2 in the text was an oversight. This figure contains examples that illustrate the partial configurations of the instruments. We will make the revisions in the camera-ready version.

C2F: According to Ultralytics, the C2F module in YOLOv8, an enhancement from the previous C3 module used in YOLOv5, is a significant architectural upgrade aimed at improving feature extraction and processing efficiency. We will include the details in the camera-ready version.

L_pose loss: In the L_pose function, i and j represent the indices of keypoints and related data points. The \delta is used to include values in the calculation only when a specific condition (e.g., visibility) is true. We will include it in the camera-ready version.
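The role of \delta as an indicator can be sketched as follows. This is a minimal illustration and not the authors' L_pose: it assumes i indexes instrument instances and j indexes keypoints, uses a plain squared error, and assumes a hypothetical visibility flag where 0 means non-visible (excluded) and any positive value means the keypoint is counted.

```python
def masked_pose_loss(pred, target, visibility):
    """Mean squared keypoint error over entries whose indicator delta is 1.

    pred, target: pred[i][j] = (x, y) for instance i and keypoint j.
    visibility: visibility[i][j] is an integer flag; 0 excludes the keypoint
        (assumed convention, not taken from the paper).
    """
    total = 0.0
    count = 0
    for i in range(len(pred)):
        for j in range(len(pred[i])):
            # delta acts as an indicator: 1 if the condition holds, else 0.
            delta = 1 if visibility[i][j] > 0 else 0
            dx = pred[i][j][0] - target[i][j][0]
            dy = pred[i][j][1] - target[i][j][1]
            total += delta * (dx * dx + dy * dy)
            count += delta
    return total / count if count else 0.0


# Toy example: the second keypoint is masked out, so its large error is ignored.
loss = masked_pose_loss(
    pred=[[(1.0, 1.0), (5.0, 5.0)]],
    target=[[(0.0, 0.0), (0.0, 0.0)]],
    visibility=[[1, 0]],
)
```

Normalizing by the number of counted keypoints keeps the loss scale comparable between frames with many and with few labeled keypoints.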

YOLOv8-x: The YOLOv8-x model in this study, with 69.4M parameters, is large and requires substantial data to perform well. Our dataset of 125,467 images, including 83,252 for training, may not fully prevent overfitting. However, the YOLOv8-l model with our proposed loss outperformed the YOLOv8-x, confirming our loss’s effectiveness. Future work will focus on expanding the dataset and employing more data augmentation to improve generalization.

Limitation: One of the primary limitations of our method is the use of a dataset specialized for a specific type of surgery, particularly gastric cancer surgery. This may limit the algorithm’s ability to generalize across different types of surgeries or various surgical environments. We will include the limitations in the camera-ready version.

R4

Open source and public dataset: Please see R1. 1.

YOLOv8: Please see R1. 3.

YOLOv8-x: Please see R3. 5.

Meta-Review

Meta-review #1

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The rebuttal from the authors has addressed the reviewers’ questions. Both of the reviewers who recommended acceptance have looked at the rebuttal and are satisfied. Although the remaining reviewer, who was negative, did not update his/her review, I recommend acceptance of this paper given that this submission is a meritorious contribution. Please also address R4’s questions in the final version.
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

The rebuttal from the authors has addressed the reviewers’ questions. Both of the reviewers who recommended acceptance have looked at the rebuttal and are satisfied. Although the remaining reviewer, who was negative, did not update his/her review, I recommend acceptance of this paper given that this submission is a meritorious contribution. Please also address R4’s questions in the final version.

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

N/A


Towards Precise Pose Estimation in Robotic Surgery: Introducing Occlusion-Aware Loss
