Abstract
Laparoscopic liver surgery presents surgeons with a complex and dynamic intraoperative environment, in which distinguishing critical or even hidden structures inside the liver remains a significant challenge.
Liver anatomical landmarks, e.g., ridge and ligament, serve as important markers for 2D-3D alignment, which can significantly enhance the spatial perception of surgeons for precise surgery. To facilitate the detection of laparoscopic liver landmarks, we collect a novel dataset called L3D, which comprises 1,152 frames with elaborated landmark annotations from surgical videos of 39 patients across two medical sites. For benchmarking purposes, 12 mainstream detection methods are selected and comprehensively evaluated on L3D. Further, we propose a depth-driven geometric prompt learning network, namely D2GPLand. Specifically, we design a Depth-aware Prompt Embedding (DPE) module that is guided by self-supervised prompts and generates semantically relevant geometric information with the benefit of global depth cues extracted from SAM-based features. Additionally, a Semantic-specific Geometric Augmentation (SGA) scheme is introduced to efficiently merge RGB-D spatial and geometric information through reverse anatomic perception. The experimental results indicate that D2GPLand obtains state-of-the-art performance on L3D, with 63.52% DICE and 48.68% IoU scores. Together with 2D-3D fusion technology, our method can directly provide the surgeon with intuitive guidance information in laparoscopic scenarios. Our code and dataset are available at https://github.com/PJLallen/D2GPLand.
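The abstract reports DICE and IoU scores. For a predicted mask P and a ground-truth mask G, Dice = 2|P∩G| / (|P| + |G|) and IoU = |P∩G| / |P∪G|. The sketch below computes both for binary masks and averages them over the landmark classes. The label encoding (0 = background, 1 = silhouette, 2 = falciform ligament, 3 = ridge), the smoothing term `eps`, and the plain per-class mean are assumptions; the paper's exact averaging protocol (per frame, per class, or over the whole test set) may differ.

```python
import numpy as np


def dice_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> tuple[float, float]:
    """Dice and IoU for one pair of binary masks (e.g. one landmark class in one frame)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps keeps both scores defined (and equal to 1) when both masks are empty
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return float(dice), float(iou)


def mean_dice_iou(pred_labels: np.ndarray, gt_labels: np.ndarray, num_classes: int = 3) -> tuple[float, float]:
    """Average Dice/IoU over foreground classes 1..num_classes of integer label maps.

    Hypothetical encoding: 0 = background, 1 = silhouette, 2 = falciform ligament, 3 = ridge.
    """
    scores = [dice_iou(pred_labels == c, gt_labels == c) for c in range(1, num_classes + 1)]
    dices, ious = zip(*scores)
    return float(np.mean(dices)), float(np.mean(ious))


pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_iou(pred, gt))  # approximately (0.667, 0.5)
```

Here the two masks share 2 pixels, each mask has 3 pixels, and their union has 4, giving Dice = 4/6 and IoU = 2/4.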
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0310_paper.pdf
SharedIt Link: pending
SpringerLink (DOI): pending
Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0310_supp.pdf
Link to the Code Repository
https://github.com/PJLallen/D2GPLand
Link to the Dataset(s)
https://drive.google.com/drive/folders/1jP4m7_0oP6-srTknS5NAp0Dr8gzkydrI?usp=sharing
BibTex
@InProceedings{Pei_DepthDriven_MICCAI2024,
author = { Pei, Jialun and Cui, Ruize and Li, Yaoqian and Si, Weixin and Qin, Jing and Heng, Pheng-Ann},
title = { { Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15006},
month = {October},
pages = {pending}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper describes very interesting work. It proposes a model based on the large foundation model SAM for liver landmark detection on keyframes of laparoscopic videos, together with a Depth-aware Prompt Embedding (DPE) module that exploits the features extracted from SAM. Even more gratifying, the authors also collected and annotated a dataset for detecting laparoscopic liver landmarks. On this dataset, the proposed method achieves better performance than other mainstream methods.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper creates a new dataset for liver landmark detection, and on this dataset the proposed method achieves the best performance.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- One of the paper's innovations is the DPE module. The paper states that three class-specific geometric prompts and the DPE module jointly guide the extraction of geometric information. How are these three geometric prompts obtained? Are they the core part of the DPE module?
- The authors mention selecting 1146 keyframes from 1500 frames. Was this selection done by a doctor or by the authors' team, and what principles were followed?
- Section 3.3 states that the balancing parameters in the loss function are set to 1 to achieve optimal performance. Were other values tried, or could the balancing parameters be made learnable?
- Experiments are conducted only on the new dataset proposed in the paper, where superior performance is achieved. Were comparative experiments conducted on a publicly available dataset, and if so, what were the results? Experiments on the proposed dataset alone do not fully demonstrate the effectiveness of the proposed method.
- Formatting issues: 1) In Fig. 2, the icons (a), (b), (c), and (d) are unclear and difficult to understand. 2) In the loss function section, the formula is written inline in the main text; consider displaying it as a separate equation.
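Review #1 asks for the loss to be shown as a standalone equation. As a hedged illustration only, assuming the objective has just two terms, a segmentation loss and a contrastive prompt loss (the two supervision signals named in the authors' rebuttal), each weighted by a balancing parameter set to 1 as the reviewer reports from Section 3.3, a displayed version might read as follows. The symbol names are hypothetical, and the paper's actual loss may contain additional terms.

```latex
\begin{equation}
  \mathcal{L}_{\text{total}} = \lambda_{\text{seg}}\,\mathcal{L}_{\text{seg}}
    + \lambda_{\text{con}}\,\mathcal{L}_{\text{con}},
  \qquad \lambda_{\text{seg}} = \lambda_{\text{con}} = 1
\end{equation}
```

In this form, the reviewer's question about trying other values, or making the weights learnable, amounts to varying or optimizing the two weights.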
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
The authors should explain the proposed DPE module in more detail, describe how the dataset was selected and divided, and justify the choice of hyperparameters in the loss function. If suitable public datasets exist, the authors should also evaluate the proposed method on them.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Reject — could be rejected, dependent on rebuttal (3)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a novel dataset and validates the method on it, which is a real contribution. However, many of the points mentioned above lack detailed explanation, so the authors need to supply more information. Moreover, evaluating only on the paper's own dataset is not enough to demonstrate the effectiveness of the proposed method.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
- The authors design a prompt-based framework that combines RGB and corresponding depth information for laparoscopic liver landmark segmentation.
- Contrastive learning is applied to enhance the prompt representations of the silhouette, falciform ligament, and ridge.
- A new liver landmark segmentation dataset is created.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Introducing depth information to the landmark segmentation task enhances the geometric representation of each class.
- The proposed Semantic-specific Geometric Augmentation is novel.
- The constructed dataset is helpful to the community and to further experiments.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The generation of landmark prompts is not clear.
- Why not directly use the average reference embedding of each class in the training set? Could the authors run an ablation study comparing contrastive learning-based prompts with purely averaged reference embeddings?
- In Fig. 2(a), the circle-with-dot symbol should denote element-wise multiplication.
- In the comparison study, the authors should report the number of parameters used for inference rather than the number of tuned parameters.
- For a CAI task, inference time and FLOPs estimates are required to assess real-time applicability.
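The baseline suggested here, a fixed per-class average of the training-set reference embeddings, can be sketched as follows. The toy 2-D embeddings and the cosine-similarity assignment are illustrative assumptions; how the reference embeddings are actually produced in D2GPLand is not described in this section.

```python
import numpy as np


def class_prototypes(embeddings: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Unit-normalized mean reference embedding per class, shape (num_classes, dim)."""
    protos = []
    for c in range(num_classes):
        members = embeddings[labels == c]
        if len(members) == 0:
            raise ValueError(f"class {c} has no reference embeddings")
        protos.append(members.mean(axis=0))
    protos = np.stack(protos)
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)


def nearest_class(query: np.ndarray, protos: np.ndarray) -> int:
    """Index of the prototype with the highest cosine similarity to `query`."""
    sims = protos @ (query / np.linalg.norm(query))
    return int(np.argmax(sims))


# Toy 2-D embeddings: class 0 points roughly along x, class 1 roughly along y.
emb = np.array([[1.0, 0.1], [0.9, -0.1], [0.1, 1.0], [-0.1, 0.8]])
lab = np.array([0, 0, 1, 1])
protos = class_prototypes(emb, lab, num_classes=2)
print(nearest_class(np.array([0.7, 0.2]), protos))  # 0
```

A single fixed prototype per class discards frame-to-frame variation such as viewing angle and the number of visible landmarks, which is the trade-off the authors' rebuttal cites against this baseline.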
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- The generation of landmark prompts is not clear. How are these prompts initialized at the beginning of training?
- Contrastive learning is used to make the trained prompt embeddings highly similar to the reference embeddings of the same class. Why not directly use the average reference embedding of each class in the training set? Could the authors run an ablation study comparing contrastive learning-based prompts with purely averaged reference embeddings?
- In Fig. 2(a), the circle-with-dot symbol should denote element-wise multiplication.
- In the comparison study, the authors should report the number of parameters used for inference rather than the number of tuned parameters, because they use a frozen pre-trained monodepth model and a frozen SAM encoder for depth map generation and depth map encoding, respectively.
- For a CAI task, inference time and FLOPs estimates are required to assess real-time applicability.
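The distinction drawn here between tuned and inference-time parameters can be made concrete with a small counting helper. It is written against a duck-typed interface (objects exposing `numel()` and `requires_grad`), which the parameters yielded by PyTorch's `torch.nn.Module.parameters()` satisfy; the `StubParam` class and the sizes below are hypothetical stand-ins for a frozen encoder and a trainable head.

```python
from typing import Iterable, Protocol


class ParamLike(Protocol):
    requires_grad: bool

    def numel(self) -> int: ...


def count_parameters(params: Iterable[ParamLike]) -> tuple[int, int]:
    """Return (total, trainable) element counts; frozen weights count toward total only."""
    total = trainable = 0
    for p in params:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return total, trainable


class StubParam:
    """Hypothetical stand-in for a tensor parameter."""

    def __init__(self, n: int, requires_grad: bool):
        self._n = n
        self.requires_grad = requires_grad

    def numel(self) -> int:
        return self._n


# Hypothetical sizes: a large frozen encoder plus a small trainable decoder/prompt set.
params = [StubParam(1_000_000, False), StubParam(20_000, True)]
print(count_parameters(params))  # (1020000, 20000)
```

The total reflects what must run at inference, while the trainable count reflects fine-tuning cost; reporting both addresses the reviewer's concern.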
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- More explanation of the DPE module is needed.
- An ablation study is needed to demonstrate the advantage of contrastive learning for prompt representation.
- The authors combine a monodepth model, a SAM encoder, and a segmentation auto-encoder for this landmark segmentation task, so inference time and FLOPs estimates are needed to show that the model is suitable for real-time CAI applications.
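A minimal way to obtain the inference-time estimate requested above is to time repeated forward passes after a warm-up. The sketch is framework-agnostic; `run_inference` is a hypothetical placeholder for one forward pass of the full pipeline (depth model, SAM encoder, and decoder). On a GPU, the timed callable should call `torch.cuda.synchronize()` before returning, because CUDA kernels launch asynchronously and the timer would otherwise stop early. FLOPs require a separate profiler and are not covered here.

```python
import statistics
import time
from typing import Callable


def measure_latency(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> dict[str, float]:
    """Median wall-clock latency (ms) of `fn` and the corresponding frames per second."""
    for _ in range(warmup):  # exclude one-off costs such as lazy initialization or caching
        fn()
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(times_ms)
    fps = 1000.0 / median_ms if median_ms > 0 else float("inf")
    return {"median_ms": median_ms, "fps": fps}


def run_inference() -> int:
    """Hypothetical placeholder workload standing in for one model forward pass."""
    return sum(i * i for i in range(10_000))


print(measure_latency(run_inference))
```

The median is used instead of the mean so that occasional scheduling hiccups do not dominate the estimate; whether the resulting frame rate counts as real time depends on the laparoscope's video frame rate.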
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Accept — should be accepted, independent of rebuttal (5)
- [Post rebuttal] Please justify your decision
- The rebuttal resolves all my questions about the methodology.
- In the rebuttal, the authors clarified that the current model is not yet capable of real-time CAI applications and that clinicians only need keyframes, and they stated that they will make the model capable of real-time CAI applications in the future. Based on the rebuttal, I have changed my score to Accept.
Review #3
- Please describe the contribution of the paper
Addressing the challenge of liver landmark detection in laparoscopic surgery, the authors propose a new dataset, L3D, and a novel architecture: a depth-driven geometric prompt learning network called D^2GPLand with a Depth-Aware Prompt Embedding module. The authors evaluate 12 existing approaches on the L3D dataset, where D^2GPLand shows the best performance. Ablation studies of the network's design choices are included.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well written and provides sufficient details about the training and dataset generated for identifying landmarks on the liver (Silhouette, falciform ligament and ridge).
- The dataset of 1146 frames will be made publicly available and will benefit the community.
- The codebase for the proposed architecture will also be made available.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The authors state that SAM-based models perform better than non-SAM models. Interestingly, in the ablation studies of the proposed network, the SAM(RGB)+CNN(Depth) and CNN(RGB)+SAM(Depth) backbones show similar performance. Given that the chosen non-SAM approaches have fewer parameters than the SAM-based models, this claim may not necessarily hold.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Page 5, line 3: Modal -> model
It is unclear whether the models compared with the proposed approach were trained with the same hyperparameters as D^2GPLand or were trained to convergence. Clarifying this would be helpful.
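One common, concrete reading of "trained to convergence" is patience-based early stopping on a validation metric. The sketch below implements that rule; `patience`, `min_delta`, and the toy loss curve are illustrative assumptions, and the authors' rebuttal states only that the compared models were trained to convergence with their official implementations, not which criterion was used.

```python
from typing import Optional, Sequence


def early_stop_epoch(
    val_losses: Sequence[float], patience: int = 5, min_delta: float = 1e-4
) -> tuple[Optional[int], int]:
    """Return (stop_epoch, best_epoch).

    Training stops once `patience` epochs pass without the validation loss improving
    on the best value by more than `min_delta`; stop_epoch is None if that never happens.
    """
    best = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return None, best_epoch


curve = [1.0, 0.8, 0.7, 0.69995, 0.7, 0.71, 0.72]
print(early_stop_epoch(curve, patience=3, min_delta=1e-3))  # (5, 2)
```

Reporting the stopping rule and its parameters for every compared model would make the "trained to convergence" claim checkable.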
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Strong Accept — must be accepted due to excellence (6)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is very well written and includes sufficient details for reproducibility. Additionally, the proposed and evaluated components are good contributions to MICCAI.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Author Feedback
We appreciate the valuable and constructive comments from all reviewers, and we are encouraged by the positive comments, including “novel (R6) and interesting (R3)”, that the constructed dataset will “benefit the community (R4&R6)”, and that our method achieves “optimal performance (R3&R4)”. Below, we address specific concerns.
- Model Design: DPE module explanation (R3&R6). In our work, the three class-specific geometric prompts are learnable, i.e., they are randomly initialized and supervised by the segmentation and contrastive losses. The core of DPE is that we use deep features and learnable prompts to obtain class-activated geometric features and to enhance prompt discriminativeness. We will provide more explanation of the DPE initialization in the final version. Balancing parameters (R3). We previously ablated other values and applied the optimal ones; due to space constraints, the manuscript shows only the more important ablation experiments. Indeed, these parameters could be made learnable, and we will conduct this ablation study in the journal version. Reference embeddings (R6). Our L3D dataset exhibits significant inter-frame variation, such as different viewing angles and numbers of landmarks. In principle, an averaged reference embedding is less specific and accurate than contrastive learning-based prompts, whose reference embeddings are tailored to individual samples. We will consider conducting this ablation study in the journal version.
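The rebuttal describes three learnable class-specific prompts that are randomly initialized and supervised by segmentation and contrastive losses. As a hedged illustration of the contrastive part only, the sketch below scores an InfoNCE-style objective in which each sample's reference embedding should be most similar to the prompt of its own class. The cosine similarity, the temperature `tau`, the embedding size, and the exact loss form are assumptions rather than D2GPLand's published formulation, and the gradient update (normally handled by an autograd framework) is omitted.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def prompt_contrastive_loss(prompts: np.ndarray, refs: np.ndarray, labels: np.ndarray, tau: float = 0.07) -> float:
    """Cross-entropy over prompt similarities: row i of `refs` should match prompt labels[i].

    prompts: (C, D) learnable class prompts; refs: (N, D) per-sample reference embeddings.
    """
    logits = l2_normalize(refs) @ l2_normalize(prompts).T / tau  # (N, C) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


rng = np.random.default_rng(0)
num_classes, dim = 3, 16  # silhouette, falciform ligament, ridge; dim is arbitrary here
prompts = rng.normal(size=(num_classes, dim))  # random initialization, as stated in the rebuttal

# Toy reference embeddings: noisy copies of each class's prompt.
labels = np.array([0, 1, 2, 0, 1, 2])
refs = prompts[labels] + 0.1 * rng.normal(size=(len(labels), dim))
print(prompt_contrastive_loss(prompts, refs, labels))            # low: labels match the nearest prompts
print(prompt_contrastive_loss(prompts, refs, (labels + 1) % 3))  # high: labels deliberately mismatched
```

Because each reference embedding is compared individually, the loss can follow per-frame variation, which is the property the rebuttal contrasts with a single averaged embedding per class.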
- Dataset: Keyframe selection (R3). We address this in Section 2, Line 6: four surgeons were invited to select and label keyframes, and the selection criterion is that a keyframe allows the surgeon to observe a global view of the liver. Dataset selection & division (R3). We have surveyed related datasets. Because we follow the task setting that defines laparoscopic liver landmark detection as a semantic-specific segmentation task (see the MICCAI 2022 P2ILF Challenge [1]), no public dataset exists for this setting so far. We therefore contribute a new liver landmark benchmark to foster community development. The data division is stated on Page 4, Line 7: ‘We divide all samples in L3D into three sets, where 921 images are used as the training set, 126 images as the validation set, and 109 images as the test set’. [1] Ali S, et al. An objective comparison of methods for augmented reality in laparoscopic liver resection by preoperative-to-intraoperative image fusion. arXiv preprint arXiv:2401.15753, 2024.
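The rebuttal gives the subset sizes but not the unit of splitting. Because L3D frames come from 39 patients, one common way to keep near-duplicate frames from leaking between subsets is to assign whole patients to a single subset. The sketch below shows that approach under that assumption; it is not necessarily how L3D was divided, and the fractions, seed, and toy patient IDs are hypothetical.

```python
import random
from collections import defaultdict
from typing import Sequence


def patient_level_split(
    patient_ids: Sequence[str], val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 0
) -> tuple[list[int], list[int], list[int]]:
    """Split frame indices into train/val/test so that each patient lands in exactly one subset."""
    frames_by_patient = defaultdict(list)
    for frame_idx, pid in enumerate(patient_ids):
        frames_by_patient[pid].append(frame_idx)

    patients = sorted(frames_by_patient)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_frac))
    n_val = max(1, round(len(patients) * val_frac))

    def frames_of(group: list[str]) -> list[int]:
        return sorted(i for p in group for i in frames_by_patient[p])

    test = frames_of(patients[:n_test])
    val = frames_of(patients[n_test:n_test + n_val])
    train = frames_of(patients[n_test + n_val:])
    return train, val, test


# Toy example: 10 hypothetical patients with 10 frames each.
ids = [f"patient_{i // 10}" for i in range(100)]
train, val, test = patient_level_split(ids)
print(len(train), len(val), len(test))  # 80 10 10
```

Splitting by patient makes the frame counts per subset depend on how many frames each patient contributes, so exact target sizes such as 921/126/109 are generally not reachable with this scheme.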
- Experiments: Model parameters (R6). Thank you for the advice; we will correct this. Notably, the criterion used to count the tuned parameters in Table 2 is consistent and fair across all methods. Real-time application (R6). According to our discussions with cooperating surgeons, they only need guidance on keyframes for decision-making, a requirement our method sufficiently fulfills. Since real-time operation is important for CAI applications, we will strive to achieve real-time landmark detection in follow-up work. Model training (R4). All compared models were trained to convergence with their official implementations; we will clarify this in the revision. Backbone performance (R4). Thank you for the advice. We will refine our claim and discuss this in the revision.
- Other questions: Format problems. We thank all reviewers for pointing out the format, spelling, and symbol problems (R3&R4&R6) and will correct them in the final version. As promised, we will release the code and dataset upon acceptance (R3).
Meta-Review
Meta-review #1
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper introduces an interesting model for liver landmark detection in laparoscopic videos, utilizing a Depth-Aware Prompt Embedding module. The authors created and annotated a new dataset for this task that will be made available to the community.
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).