Abstract

Deep learning has made significant progress in natural image segmentation but faces challenges in medical imaging due to the limited availability of annotated data. Few-shot learning offers a solution by enabling segmentation with only a few labeled samples, yet generalization remains a challenge when data is scarce. In this work, we investigate the potential of the Segment Anything Model (SAM), a foundation model trained on over one billion annotated masks, for few-shot medical image segmentation. However, SAM faces two key challenges: (1) the domain gap between natural and medical images, which leads to suboptimal performance, and (2) prompt dependency, as SAM requires user-defined prompts, limiting automation. To address these issues, we propose a novel framework, named AM-SAM, that adapts SAM for few-shot medical image segmentation. Our approach introduces a medical image-specific augmentation strategy and a dual-encoder architecture to bridge the domain gap. Additionally, we develop an automated dual-prompt mechanism that eliminates prompt dependency by generating point and mask prompts from support images. Extensive experiments show that AM-SAM outperforms existing approaches by up to 3.8% on ABD-MRI and 4.0% on ABD-30 in terms of Dice score.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3227_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{PhaCuo_Unleashing_MICCAI2025,
        author = { Pham, Cuong M. and Nguyen, Phi Le and Nguyen, Thanh Trung and Phan, Minh Hieu and Nguyen, Binh P.},
        title = { { Unleashing SAM for Few-Shot Medical Image Segmentation with Dual-Encoder and Automated Prompting } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {682 -- 692}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a novel framework named AM-SAM for few-shot medical image segmentation. The main contributions are:

    1. A dual-encoder architecture that uses both original and augmented images to bridge the domain gap between natural and medical images.
    2. An automated dual-prompt mechanism that generates point and mask prompts from support images, eliminating prompt dependency.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A new way to explore the potential of SAM for the FSMIS task.
    2. Extensive experiments show that AM-SAM significantly outperforms existing methods.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Poor Structure: Section 2.2 (Visual Encoder) lacks clear paragraphing, making the content lengthy and difficult to comprehend. Despite the detailed description of the visual encoder’s design, the lack of logical paragraphing makes it hard for readers to grasp the key points, affecting overall readability.

    2. Unconvincing Motivation for Image Augmentation: The motivation for using Mean Shift Clustering for image augmentation is not convincing. Although the algorithm can enhance the boundaries of anatomical structures, the authors fail to adequately explain why this clustering method was chosen and its unique advantages in few-shot medical image segmentation.

    3. Incomplete Method Description: The method section provides an insufficient description of the few-shot segmentation process during the inference stage, especially the details of how to generate segmentation results using support samples. Figure 2 also does not illustrate this process, making it difficult for readers to fully understand the model’s actual operation.

    4. Questionable Loss Function Selection: The article uses Dice Loss and IoU Loss as the training objectives but does not adequately explain why these two losses were chosen over the more commonly used combination of Cross-Entropy Loss (CE Loss) and Dice Loss.

    5. Insufficient Comparison with Related Work: The article does not sufficiently compare other works that use the Segment Anything Model-2 (SAM2) for few-shot medical image segmentation (FSMIS) tasks.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the weaknesses listed above.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Most of my concerns are not addressed.



Review #2

  • Please describe the contribution of the paper

    The paper addresses the challenges of medical image segmentation through a novel framework called AM-SAM, which leverages SAM for few-shot learning. AM-SAM aims to enhance segmentation performance by introducing a dual-encoder architecture and an automated prompting mechanism that mitigates the prompt dependency of SAM. Experimental results show that AM-SAM surpasses existing state-of-the-art methods on several medical datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The AM-SAM approach successfully integrates SAM into few-shot learning for medical imaging, addressing both domain gaps and prompt dependencies effectively.

    2. The inclusion of a dual-encoder that processes both original and augmented images enhances feature extraction.

    3. By implementing an automated dual-prompt mechanism to generate point and mask prompts, the authors reduce reliance on user-defined prompts.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The decision to apply two separate encoders for the original and augmented images rather than sharing a single encoder could lead to a larger model size. An ablation study to clarify the benefits of this approach versus using a shared encoder would be necessary.

    2. Although the original prompt encoder from SAM is capable of generating both point and mask prompts, the authors use two custom-designed prompt encoders. Clarifications on why this choice was made could strengthen the rationale behind the design.

    3. It is unclear whether the mask decoder used in AM-SAM is the same as that of SAM. If a new decoder was designed specifically for this framework, details on its architecture and functionality would be beneficial for understanding its enhancements and implications.

    4. While AM-SAM outperforms selected baselines, a comparative analysis of computational costs such as model size, training durations, and inference times is lacking. This information is crucial for evaluating the practicality of deployment in real-world scenarios.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As mentioned in the weaknesses section.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors’ feedback addressed my confusion. I plan to adjust my final decision to “Accept”.



Review #3

  • Please describe the contribution of the paper

    This paper introduces AM-SAM, a novel framework that adapts the Segment Anything Model (SAM) for few-shot medical image segmentation using a dual-encoder architecture with Mean-Shift Clustering augmentation and an automated dual-prompt mechanism (generating both point and mask prompts from support images), effectively addressing the domain gap between natural and medical images while eliminating prompt dependency. Experimental results demonstrate superior performance over existing methods, with improvements up to 3.8% on ABD-MRI and 4.0% on ABD-30 datasets in terms of dice score metrics.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel dual-encoder with medical-specific augmentation: Combines a dual-encoder architecture with Mean-Shift Clustering to enhance anatomical boundaries, bridging the domain gap between natural and medical images with a 6.5-13.6% performance improvement.
    2. Automated dual-prompt generation: Eliminates SAM’s prompt dependency through automatic point and mask prompt generation from support images, enabling fully automated medical image segmentation.
    3. Efficient adaptation with strong results: A lightweight adapter design achieves superior performance (3.8-4.0% improvement) over state-of-the-art methods across multiple anatomical structures while maintaining computational efficiency.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. When using K-means clustering for data augmentation, is the value of K fixed throughout the training process, or does it change dynamically based on the learning progress? Additionally, on what basis is the clustering performed? Please elaborate further with reference to paper [1]. [1] Learning-Augmented k-Means Clustering [ICLR 2022]

    2. Can we conclude that training is effective when using images like those in Figure 2? Could it not lead to the loss of boundary information instead? Considering that there are many types of clustering algorithms, is K-means truly the optimal choice for this context?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the main weaknesses of the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Our detailed responses are provided below.

Shared question: Q. (Reviewer #2, #4) Why did we choose K-means for clustering? A. As stated in Sec. 2.2, “This method systematically clusters pixels into high-density regions by grouping those with similar characteristics. Consequently, at image boundaries where intensity variations are pronounced, data points become more concentrated, leading to improved delineation of anatomical structures,” (see Fig. 2). We also demonstrate the effectiveness of this method in the ablation study (Sec. 3.3).
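To make the augmentation described above concrete, here is a minimal sketch of intensity-based K-means augmentation, assuming the clustering runs on the per-pixel intensities of a 2D slice and each pixel is replaced by its cluster centroid. The choice of K, feature space, and any post-processing are not specified in the rebuttal and are assumptions in this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_augment(image: np.ndarray, k: int = 5, seed: int = 0) -> np.ndarray:
    """Quantise a grayscale slice into k intensity clusters.

    Pixels with similar intensities collapse onto the same cluster centroid,
    which flattens homogeneous regions and sharpens transitions between
    anatomical structures.
    """
    h, w = image.shape
    pixels = image.reshape(-1, 1).astype(np.float32)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    quantised = km.cluster_centers_[km.labels_]   # (h*w, 1): centroid per pixel
    return quantised.reshape(h, w)

# Example on a synthetic 256x256 slice.
slice_2d = np.random.rand(256, 256).astype(np.float32)
augmented = kmeans_augment(slice_2d, k=5)
```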


Reviewer #2

Q2.1. Is the value of K fixed throughout the training process? On what basis is the clustering performed? A2.1. The value of K is automatically determined to best suit each image. However, as K depends solely on the image, it remains fixed throughout the training process. The clustering criterion is: “groups pixels with similar intensities, effectively enhancing the segmentation of distinct anatomical regions in the medical image” (see caption of Fig. 2).
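The rebuttal states that K is determined automatically per image but does not say how. The sketch below assumes a silhouette-score search over a small set of candidate K values on a pixel subsample; `select_k`, its candidate range, and the scoring rule are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def select_k(image: np.ndarray, k_candidates=(3, 4, 5, 6, 7, 8),
             sample: int = 2000, seed: int = 0) -> int:
    """Pick the K whose intensity clustering scores best on a pixel subsample."""
    rng = np.random.default_rng(seed)
    pixels = image.reshape(-1, 1).astype(np.float32)
    idx = rng.choice(len(pixels), size=min(sample, len(pixels)), replace=False)
    subset = pixels[idx]                              # subsample for speed
    best_k, best_score = k_candidates[0], -1.0
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(subset)
        score = silhouette_score(subset, labels)      # higher = better-separated clusters
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```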

Q2.2. Is training effective when using images like those in Fig. 2? Could this not lead to the loss of boundary information instead? A2.2. Using augmented images like those in Fig. 2 helps the model achieve higher accuracy. The concern about loss of boundary information does not apply as we use both the original and augmented images during training. As stated in the Introduction (page 2): “We develop a dual-encoder mechanism with two separate encoders: one extracting features from the original image and the other from the augmented image.” Furthermore, we conducted an ablation study comparing the use of only original images versus using both original and augmented images (Sec. 3.3), showing that incorporating augmented images leads to better performance.
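A minimal sketch of the dual-encoder idea under stated assumptions: two frozen copies of an image encoder embed the original and the augmented slice, and a trainable 1x1 convolution fuses the two feature maps. The fusion operator is an assumption, and the stub encoders below are stand-ins only; per the rebuttal, the actual framework reuses SAM's pretrained visual encoder for both branches.

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    """Embed original and augmented slices with two frozen encoders and fuse the features."""

    def __init__(self, encoder_orig: nn.Module, encoder_aug: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.encoder_orig = encoder_orig
        self.encoder_aug = encoder_aug
        for enc in (self.encoder_orig, self.encoder_aug):
            for p in enc.parameters():
                p.requires_grad = False          # encoders stay frozen, as the rebuttal notes
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)  # trainable fusion (assumed)

    def forward(self, x_orig: torch.Tensor, x_aug: torch.Tensor) -> torch.Tensor:
        f_orig = self.encoder_orig(x_orig)       # features of the original slice
        f_aug = self.encoder_aug(x_aug)          # features of the K-means-augmented slice
        return self.fuse(torch.cat([f_orig, f_aug], dim=1))

# Stand-in encoders for illustration only; the paper reuses SAM's pretrained ViT encoder.
def stub_encoder() -> nn.Module:
    return nn.Conv2d(1, 256, kernel_size=16, stride=16)

model = DualEncoder(stub_encoder(), stub_encoder())
feats = model(torch.randn(1, 1, 256, 256), torch.randn(1, 1, 256, 256))  # (1, 256, 16, 16)
```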


Reviewer #3

Q3.1. Larger model size due to the use of dual-encoders A3.1. Both of our encoders reuse the pretrained visual encoder from SAM. During training, we freeze the entire pretrained visual encoder and only train the lightweight prompts. Therefore, training does not incur a significant computational cost. As for inference, the visual encoder has 91 million parameters, which is a relatively small number and does not significantly impact inference time in practice.

Q3.2. Although the original prompt encoder from SAM is capable of generating both point and mask prompts, the authors use two custom-designed prompt encoders. Why was this choice made? A3.2. Firstly, in SAM, prompts must be manually specified, whereas in our AM-SAM framework, prompts are generated automatically (see the title of Sec. 2.3). Secondly, we use both types of prompts because they serve different purposes. The mask prompt helps generate a rough segmentation, while the point prompt facilitates fine-grained segmentation. Specifically, if only mask prompts are used, adjacent organs may be merged during detection. The point prompt helps separate such closely located organs.
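A hedged sketch of how point prompts could keep adjacent organs separated: one positive point per connected component of the thresholded coarse mask, returned alongside the mask prompt. The connected-component and centre-of-mass rule is an assumption for illustration, not the paper's exact procedure.

```python
import numpy as np
from scipy import ndimage

def generate_prompts(coarse_prob: np.ndarray, threshold: float = 0.5):
    """Return a binary mask prompt plus one (row, col) point prompt per component."""
    mask_prompt = coarse_prob > threshold
    labeled, n_components = ndimage.label(mask_prompt)
    centers = ndimage.center_of_mass(mask_prompt, labeled, list(range(1, n_components + 1)))
    point_prompts = [tuple(int(round(c)) for c in center) for center in centers]
    return mask_prompt, point_prompts

# Example: two nearby blobs yield one point prompt each, keeping the organs separable.
prob = np.zeros((64, 64))
prob[10:20, 10:20] = 0.9
prob[10:20, 24:34] = 0.9
mask, points = generate_prompts(prob)   # len(points) == 2
```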

Q3.3. Is the mask decoder used in AM-SAM the same as that of SAM? A3.3. We use the decoder from SSM-SAM, which, as shown in the SSM-SAM paper, demonstrates superior performance compared to SAM’s decoder.

Q3.4. A comparative analysis of computational costs. A3.4. As illustrated in Figure 1, we use prompt fine-tuning for the entire visual encoder, making the training process fast and efficient. Specifically, the entire training takes only about 2 hours and 30 minutes (3 minutes per epoch for 50 epochs). The model has 90M more parameters than baseline SAM.


Reviewer #4

Q4.2. The reason for using Dice Loss and IoU Loss instead of the combination of Cross-Entropy Loss and Dice Loss. A4.2. Since our task is segmentation, the use of IoU Loss follows the common norm in the field.
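For reference, a minimal sketch of a combined Dice + IoU objective for binary masks with sigmoid logits; equal weighting of the two terms is an assumption.

```python
import torch

def dice_iou_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """logits, target: (B, 1, H, W); target is a binary mask in {0, 1}."""
    prob = torch.sigmoid(logits)
    dims = (1, 2, 3)
    inter = (prob * target).sum(dims)
    dice = 1 - (2 * inter + eps) / (prob.sum(dims) + target.sum(dims) + eps)
    union = prob.sum(dims) + target.sum(dims) - inter
    iou = 1 - (inter + eps) / (union + eps)
    return (dice + iou).mean()                       # equal weighting assumed

loss = dice_iou_loss(torch.randn(2, 1, 64, 64),
                     torch.randint(0, 2, (2, 1, 64, 64)).float())
```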

Q4.3. The article does not sufficiently compare other works that use SAM2 for few-shot medical image segmentation. A4.3. Our method is built upon SAM, not SAM2. Therefore, we focused our comparisons on models based on SAM rather than SAM2.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes AM-SAM, a novel framework that adapts SAM for few-shot medical image segmentation by introducing a dual-encoder architecture and an automated dual-prompt mechanism. Reviewers acknowledged its technical contributions, particularly the combination of medical-specific augmentations and prompt-free segmentation, which addresses domain gaps and enhances practical applicability. The method demonstrated consistent improvements over baselines across multiple datasets, supported by a lightweight and efficient design. While some concerns were raised around clarity in methodology, justification for clustering choices, and limited comparative analysis, the authors sufficiently addressed key technical questions in the rebuttal. Given its novelty, practical relevance, and strong empirical performance, the paper warrants acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes AM-SAM, a SAM-based framework for few-shot medical image segmentation, addressing domain gaps and prompt dependency. Despite addressing some concerns in the rebuttal, critical flaws remain unresolved, such as insufficient novelty, ambiguous technical descriptions, inadequate experimental validation, unaddressed computational costs, and structural weaknesses. These issues collectively undermined the work’s methodological rigor, reproducibility, and clinical relevance, failing to meet the conference’s standards for technical novelty and impact.


