Abstract

Segment Anything Model (SAM) has been widely used in common medical image segmentation for its strong zero-shot generalization when given points or boxes as prompts. However, we find that SAM and its variants do not cope well with complex fine-grained segmentation tasks such as kidney anatomical structure segmentation, due to the discrepancy between the model’s interpretation of the task and the actual intent conveyed by the prompts. This paper introduces a new approach called Knowledge SAM (KSAM). Given a pair consisting of an example image and its corresponding fine-grained segmentation mask as the knowledge prompt, the model can use the contextual information to better understand the meaning of an unseen fine-grained segmentation task. To accommodate knowledge prompts, we design two modules for knowledge prompt feature fusion. KSAM outperforms SAM models based on different prompts on both our proposed kidney anatomical structure dataset and REFUGE. Notably, our approach demonstrates competitive performance while offering better extensibility to new tasks compared with prompt-free methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2804_paper.pdf

SharedIt Link: https://rdcu.be/eHwLb

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04927-8_27

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhaHen_Knowledge_MICCAI2025,
        author = { Zhang, Hengyuan AND Qiao, Peng AND Li, Wenyu AND Jia, Yan AND Dou, Yong},
        title = { { Knowledge Bridges the Intent Gap: Contextual Fusion in Medical Fine-Grained Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {280--290}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper introduces a new SAM variant called KSAM, which adapts the original SAM to unseen fine-grained segmentation tasks by providing a pair consisting of an example image and its corresponding fine-grained segmentation mask. KSAM includes two modules designed for knowledge prompt feature fusion. The experimental results on the kidney and fundus datasets demonstrate the effectiveness of the proposed KSAM and of each component.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Interesting motivation and well-drawn illustrations.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Unclear methodology. The supervision of KSAM is unclear, and it is not reflected in the text or figure. Apart from the knowledge prompt, please explain whether KSAM requires any additional fine-grained segmentation labels. Moreover, almost all the abbreviations lack corresponding full names, which makes the method section hard to read.
2. The details of data splits for both KAS and REFUGE datasets are unclear.
3. Unfair comparison. In Table 1, the authors categorize KSAM, PerSAM, and OnePrompt as instance prompt methods. However, PerSAM and OnePrompt do not require additional fine-tuning, while KSAM does. Among the traditional methods in Table 1, PANet is designed for few-shot segmentation, and the reproduced results of PANet seem to be wrong, especially on the REFUGE dataset.
4. The forgetting curve shown in Fig. 6 illustrates the model’s performance degradation on the old task when fine-tuned on a new task. This figure does not provide sufficient evidence to support the claim about the scalability of KSAM.
5. The reported results in Table 1 for traditional methods on the REFUGE dataset are not representative. According to the REFUGE challenge paper [1], Dice scores above 0.90 for the optic disc and above 0.80 for the optic cup are easily attainable. The authors should compare against more reliable methods.
[1] Orlando J I, Fu H, Breda J B, et al. Refuge challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs[J]. Medical image analysis, 2020, 59: 101570.
Typo: The outputs in Eq. (1) are denoted as DE1 and DE2, but they are referred to as DE0 and DE1 in Subsection 2.2.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Unclear methodology and unfair comparison. For more details, please refer to the major weaknesses.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

I have to reject this paper since there are still unresolved issues, especially the unclear methodology and the unfair comparison.

Review #2

Please describe the contribution of the paper

This paper presents a novel method called Knowledge SAM (KSAM). It addresses the task confusion associated with point prompts and box prompts in Vanilla SAM by transforming existing annotated knowledge into prompts. This approach avoids the instability of prediction results caused by inappropriate prompts and reduces the large amount of data required for overall fine-tuning.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1. The starting point of this paper is excellent. It primarily addresses the uncertainty associated with SAM prompts and the large amount of data required for fine-tuning. To tackle these challenges, the study employs a knowledge prompting method called “knowledge prompt” to resolve issues related to obtaining high-quality prompts and prompt ambiguity.

2. The approach to knowledge fusion is intriguing. The authors propose two modules to achieve knowledge integration, enabling the model to better understand the definition of segmentation tasks and perform segmentation without uncertainty based on the provided knowledge prompts. This method offers superior scalability and competitive performance compared to prompt-free approaches.

3. The abstract and introduction in the paper are comprehensive, and the figures are visually appealing, clearly articulating the current issues in the field.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Uncertainty of knowledge prompt dataset size: The novel approach of using image and mask pairs as knowledge prompts is intriguing; however, the specific construction details of these prompts remain unclear. In Section 2.1, the assertion that “such an approach contains enough medical prior knowledge to guide the model” lacks clarity regarding what constitutes “enough.” There are no descriptions or ablation studies provided to elucidate this aspect.
2. Lack of detailed explanation on architecture and data flow: The architecture and data flow diagram does not offer sufficient explanations. While the Vanilla SAM’s Image Encoder is designed to support a single image with a set of prompts, Figure 4’s upper left corner shows both the target image and knowledge prompt image being fed into the Image Encoder. It is unclear whether this represents a staged training process or if both images are input within the same iteration, sharing the Image Encoder to produce two embeddings. If it is the latter, is there a specific matching relationship between the target image and the knowledge prompt during this iteration, or are they randomly loaded from the data loader? Furthermore, the meanings of the gray solid and dashed lines in the diagram should be explicitly annotated.
3. Unknown influence of knowledge prompts on inference results: Based on the framework diagram, it appears that knowledge prompts are utilized to optimize the embeddings produced by the SAM Image Encoder using existing annotation priors. However, the impact of varying annotation priors on the final results remains ambiguous. During inference, the effect of applying different image-mask pairs as knowledge prompts when a target image is presented is uncertain. This point deserves emphasis, as the principal motivation of the paper revolves around the significant influence of prompts on SAM.
4. Discussion on time and efficiency: In comparison to Vanilla SAM, KSAM’s prompts are substantially more complex. The paper should elaborate on the time complexity associated with KSAM to provide a clearer understanding of its efficiency.
5. Diversity of ablation studies: KSAM builds on Vanilla SAM, primarily utilizing the Image Encoder while omitting the prompt Encoder. Many related works involve retraining a decoder using the embeddings obtained from the Image Encoder. The key distinctions of KSAM include the integration of masks within prompts and the addition of LoRA layers. It would be pertinent to explore whether setting the mask to all black or introducing noise into the mask affects model performance. Additionally, what would be the impact of removing LoRA layers on the model?
6. Minor comments: The term “unbiased” in “An unbiased knowledge prompt method” in Section 1 is not clearly defined. In Equation 3, “Sigmoid” is a function name (similar to “sin” or “log”), so it should be presented in an upright font rather than italicized in the equation.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

There are many related works on segmentation based on optimizing prompts that the authors have mentioned in the paper. Additionally, there are several other interesting works, such as:
Zhou C, Ning K, Shen Q, et al. Sam-sp: Self-prompting makes sam great again[J]. arXiv preprint arXiv:2408.12364, 2024.
Wahd A S, Felfeliyan B, Zhou Y, et al. Sam2Rad: A segmentation model for medical images with learnable prompts[J]. Computers in Biology and Medicine, 2025, 187: 109725.
Mansoori M, Shahabodini S, Abouei J, et al. Self-Prompting Polyp Segmentation in Colonoscopy Using Hybrid YOLO-SAM2 Model[C]//ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025: 1-5.
Zheng X, Zhang Y, Zhang H, et al. Curriculum Prompting Foundation Models for Medical Image Segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024: 487-497.
In comparison, using existing image annotations as prompts is somewhat akin to the idea of fine-tuning with priors. However, this approach may involve more complex data loading and experimental settings during the training or inference process. Therefore, I hope that the authors can provide more detailed technical descriptions and ablation studies to demonstrate the effectiveness of their method.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

The authors’ rebuttal partially addresses concerns regarding knowledge prompt selection and inference time. However, the influence of prompt design variations and the impact of including the LoRA module remain unclear due to the lack of ablation experiments. Therefore, I maintain my original score.

Review #3

Please describe the contribution of the paper

This paper proposes KSAM, which leverages images and their corresponding masks as knowledge prompts to guide SAM for achieving fine-grained segmentation. The method is evaluated on both public and private datasets, achieving suboptimal yet competitive performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The background analysis is reasonable and gives a sound account of the current challenges in the field.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Section 2.1 lacks clarity. It is recommended that the authors explicitly define what the “knowledge prompt” refers to and how it is obtained or constructed. Additionally, the term “KM” seems to stand for “Knowledge Merge Module”—please clarify this and provide a full explanation when it first appears.
2. As shown in Fig. 4, KM and RM modules are added to the original model. Please provide details on the total number of trainable parameters after these additions. Moreover, Table 1 shows that some prompt-free or point/box prompt methods still demonstrate competitive performance. This raises concerns about the motivation and necessity of the proposed approach. Additionally, some classical segmentation baselines like UNet are missing, and the listed traditional methods perform relatively poorly. It is advisable to include more representative baselines and provide an analysis of the performance gap.
3. The evaluation only includes the DSC metric. It is suggested to add spatial metrics such as HD95 to offer a more comprehensive assessment of the model’s performance.
4. The readability of the figures and tables needs improvement. Some key terms are not clearly explained, and Fig. 4 does not clearly illustrate the role and process of SAM in the overall framework.
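
For reference, the DSC metric mentioned in item 3 can be computed as in the following minimal sketch. It assumes binary masks given as flat sequences of 0/1 values; the function name `dice_coefficient` and the convention for two empty masks are illustrative assumptions and are not taken from the paper. HD95, which measures boundary distance rather than overlap, is not shown here.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient (DSC) between two binary masks.

    Both masks are flat sequences of 0/1 values with the same length.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Example: one shared foreground pixel, 2 predicted and 1 true foreground pixel.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])  # 2*1 / (2+1)
```

Because DSC only counts overlapping pixels, two predictions with the same score can differ substantially in boundary quality, which is why a distance-based metric is often reported alongside it.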
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Many essential details are missing, bringing a lot of difficulties in following this manuscript.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

This manuscript has an interesting method design and good experimental analysis. Accordingly, I suggest accepting this manuscript; the authors should improve the writing before the final submission.

Author Feedback

We appreciate the valuable comments from all reviewers and will consider them in the revised manuscript.

[R2#Q1,4 R3#Q1]: We apologize for any lack of clarity in the writing. The abbreviations KP, KM, and PM refer to subsections in Section 2 (e.g., KM stands for the Knowledge Merge module). We will explain key terms where they first appear.

[R1#Q1 R2#Q1 R3#Q1]: Details of knowledge prompt. [A] As mentioned at the bottom of Page 2, a knowledge prompt consists of an image-mask pair. In each iteration, we sample a target image and its GT mask (e.g., Cortex Mask) and randomly choose a different image and its Cortex Mask as the knowledge prompt. The size of the knowledge prompt dataset is equal to that of the training set. The image provides rich contextual semantics about the target’s surroundings as prior knowledge. Guided by the mask, KSAM accurately segments the target region by maintaining contextual awareness of adjacent anatomical structures. Apart from the knowledge prompt, the fine-grained labels are used only to supervise the output mask.

[R1#Q4,5 R2#Q2]: Parameters and Efficiency. [A] The added modules introduce one-third more parameters than SAM. Although it slightly increases inference time, KSAM improves efficiency by eliminating manual box prompt selection for each image. By using LoRA, we outperform MedSAM while training fewer parameters.

[R2#Q2 R3#Q5,3]: Baselines and comparisons. [A] In the experimental settings, comparisons focus on the direct results with different prompts based on SAM. Our comparisons include some excellent and widely used UNet variants. Without multi-stage training, complex post-processing, or additional training data, traditional methods may underperform due to limited model capacity or their prompt-free design. OnePrompt requires full retraining, with its weights inaccessible, and PerSAM needs light fine-tuning to compute similarity. PerSAM calculates instance-image similarity to select the most likely point as the prompt. This aligns with our goal of studying prompt effects, making PerSAM a special instance prompt. PANet operates on query/support sets and serves as a special instance-prompt method. During reproduction, we found an error in image channel processing. After correction, the metric improved only slightly, without affecting the conclusions.

[R1#Q2]: Confusions of architecture and data flow. [A] In Fig. 4, two CT images share the Image Encoder to generate two embeddings. Solid gray lines show data flow, while dashed lines indicate that the variables serve as input to both the KM and PM modules; they are drawn this way for simplicity.

[R1#Q3,5]: Influence of knowledge prompts. [A] KSAM segments different target regions when provided with different prompt masks. KSAM fails to segment properly when provided with an all-black mask or with noise strong enough to overwhelm the mask. These results are intuitive.

[R2#Q2]: The necessity of knowledge prompt. [A] Prompt-free methods perform poorly on REFUGE. When adapting such models to new tasks, the output dimension must be reset to match the new task, which changes the structure, prevents reuse of the former weights, and requires retraining. KSAM can be fine-tuned without any structural change and therefore retains its previous knowledge. Point/box prompts suffer from a serious problem of ambiguous understanding, as shown in Fig. 3 and Fig. 5.

[R2#Q3]: Evaluation metric. [A] The original results contained HD95, which was excluded to meet formatting requirements. We therefore report only the commonly used DSC metric.

[R2#Q4]: The role of SAM. [A] We retain the architecture of SAM and fine-tune the Image Encoder.

[R3#Q2]: The details of data splits. [A] KAS uses a 4:1 split, while REFUGE follows the splits of the original dataset.

[R3#Q4]: The scalability of KSAM. [A] For scalability, see [R2#Q2]. According to Fig. 6, KSAM exceeds many baselines at the 25th epoch when adapting to new tasks. Although this is far from migrating completely to a new task without forgetting, it represents an improvement.

Future work will generalize this approach in a broader framework to better leverage knowledge prompts.
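
The per-iteration pairing described in the rebuttal (a target image with its GT mask, plus a different image and its mask for the same structure as the knowledge prompt) might look roughly like the sketch below. No code repository is linked for this paper, so this is only an illustration under stated assumptions: the function name `sample_training_pair`, the dictionary keys, and the representation of the dataset as a list of (image, mask) tuples are all hypothetical and are not the authors' implementation.

```python
import random


def sample_training_pair(dataset, rng=random):
    """Pick a target sample and a different sample as its knowledge prompt.

    `dataset` is a list of (image, mask) tuples for one structure
    (for example, cortex masks). This layout is an assumption.
    """
    if len(dataset) < 2:
        raise ValueError("need at least two samples to form a pair")
    target_idx = rng.randrange(len(dataset))
    # Draw from the remaining indices so the prompt is never the target itself.
    prompt_idx = rng.randrange(len(dataset) - 1)
    if prompt_idx >= target_idx:
        prompt_idx += 1
    target_image, target_mask = dataset[target_idx]
    prompt_image, prompt_mask = dataset[prompt_idx]
    return {
        "target_image": target_image,
        "target_mask": target_mask,    # supervises the predicted mask
        "prompt_image": prompt_image,  # knowledge prompt: example image
        "prompt_mask": prompt_mask,    # knowledge prompt: its mask
    }


# Usage with placeholder strings standing in for image and mask arrays.
toy_dataset = [("image_%d" % i, "mask_%d" % i) for i in range(5)]
pair = sample_training_pair(toy_dataset, random.Random(0))
```

Drawing the prompt from the same training pool as the target matches the statement that the knowledge prompt dataset has the same size as the training set; how prompts are chosen at inference time is not specified in the rebuttal and is not modeled here.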

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

Two of the reviewers still recommend Reject, although one of them seems to be “Weak Reject”. The reason they have provided, while valid, is not detailed. In my judgement, the authors have, at least partially, responded to the concerns that these two reviewers still maintain such as “unfair comparison” by Reviewer 3. I leave the decision to the Chairs, but given the ranking of the paper in my pile, I recommend acceptance because of its relatively high methodological novelty and good results.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

While the reviewers expressed some interest in the work, they also agreed that there is a lack of novelty and validation. We encourage the authors to further improve their work in the future.

Knowledge Bridges the Intent Gap: Contextual Fusion in Medical Fine-Grained Segmentation

Author(s):