
Abstract

The Segment Anything Model (SAM) has achieved outstanding performance in both natural and medical image segmentation with extensive research validation. When applied to ultrasound images, which involve low contrast, indistinct boundaries and complex shapes, large models still suffer from significant performance degradation and limited generalization ability. We explore these challenges from a new perspective with the help of the segmentation foundation model SAM. In this paper, we propose Nora, a noise-robust fine-tuning framework for SAM to address domain generalized ultrasound image segmentation. Specifically, we introduce a feature-adaptive perturbation module, which applies well-designed noise to the fine-tuned features. We stimulate the model to segment the correct regions even under severe interference, thereby improving its robustness. Moreover, to further optimize SAM with prompts, we present an instance-aware prompt generation module. We introduce a set of tokens linked to distinct instances and then design a token-based augmentation strategy to prevent overcoupling and encourage tokens to capture more diverse information. Our Nora achieves state-of-the-art performance across extensive cross-domain experiments with three ultrasound image segmentation tasks, fully demonstrating its effectiveness and generalizability. The code is available at https://github.com/wkklavis/Nora.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1075_paper.pdf

SharedIt Link: https://rdcu.be/eHwUF

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04971-1_45

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/wkklavis/Nora

Link to the Dataset(s)

BUSI: https://scholar.cu.edu.eg/?q=afahmy/pages/dataset

DatasetB: https://helward.mmu.ac.uk/STAFF/M.Yap/dataset.php

STU: https://github.com/xbhlk/STU-Hospital

TN3K: https://github.com/haifangong/TRFE-Net-for-thyroid-nodule-segmentation

DDTI: https://www.kaggle.com/datasets/dasmehdixtr/ddti-thyroid-ultrasound-images

CAMUS: https://www.creatis.insa-lyon.fr/Challenge/camus/index.html

HMC-QU: https://www.kaggle.com/datasets/aysendegerli/hmcqu-dataset

BibTex

@InProceedings{WeiZhi_NoiseRobust_MICCAI2025,
        author = { Wei, Zhikai AND Wu, Chao AND Du, Hanyu AND Yu, Rui AND Du, Bo AND Xu, Yongchao},
        title = { { Noise-Robust Tuning of SAM for Domain Generalized Ultrasound Image Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {476--486}
}

Reviews

Review #1

Please describe the contribution of the paper

The manuscript presents an improved way of fine-tuning SAM for generalized ultrasound segmentation in a noise-robust manner. Two main modules are proposed for improved segmentation: feature-adaptive perturbation and noise-robust prompt generation. Comparison with state-of-the-art methods shows improved Dice scores on multiple datasets.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Extensive experimentation and comparison on multiple datasets and ablation study.
2. Use of adaptive perturbation for achieving noise-robust fine-tuning of SAM.
3. Illustration of good domain generalization and reduced overfitting.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The state-of-the-art comparison should also include contrastive learning methods (especially patch-based unsupervised contrastive loss) and intensity scaling approaches, which are very close to the proposed method.
2. While the Dice score shows improvement, the Hausdorff distance still needs improvement with respect to SOTA methods. Is it because datasets consisting of large lesions still yield a good Dice score while the boundary segmentation is poor? This issue needs to be discussed in detail.
3. Several implementation details are missing: what loss function is used? What is the computational complexity of the proposed architectural changes? What are the hardware and implementation details?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The organisation of the paper is fairly clear. However, it could be improved by adding more details and providing a stronger motivation for choosing the specific approach.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Lack of clarity in the motivation, missing implementation information, and no explanation for the weaker HD results with respect to SOTA methods (refer to the weakness points above).
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The paper explores a feature-adaptive perturbation (FAP) module, which applies noise to the fine-tuned features for domain-generalized ultrasound image segmentation, aiming to enhance the robustness of the model. A sigmoid function is applied to the features to generate adaptive noise weights, which adjust the magnitude of the perturbations to prevent overconfidence. The authors also present an instance-aware prompt generation module for fine-tuning the model, with a set of learnable tokens that interact with instances and serve as prompts to train the network. Furthermore, a token-enhancement algorithm for ultrasonic features is used. The authors report better segmentation performance on three different domains compared with other fine-tuning frameworks, augmentation methods, and ultrasound foundation models.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The major strength of the paper is its analysis of the specific problem. The experiments are designed to target domain-generalized ultrasound segmentation, and they include not only other fine-tuning methods and traditional methods but also other SAM-based methods.

The authors chose to use the datasets that SAMUS has trained on for the training dataset and the ones SAMUS excluded as the testing datasets, demonstrating awareness.

The paper compares the FAP module with other noise injection methods and achieves competitive performance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The paper’s claim that low-value features can be less disturbed is not well-justified in this context. And will the feature-adaptive perturbation be layer dependent?

The authors do not compare the method with medical-image fine-tuning methods other than DAPSAM, such as Trans-SAM [1], MSA [2], SAM-DA [3], and Med-SA [4].

It would be nice to include the computational cost compared with other baseline methods.

The Token-Enhancement Algorithm resembles prompt ensemble [5]. How is it tailored for ultrasound features?

High Dice scores accompanied by significant Hausdorff errors could indicate that the predictions fail to detect small regions [6].

[1] Wu, Yanlin, Zhihong Wang, Xiongfeng Yang, Hong Kang, Along He, and Tao Li. “Trans-SAM: Transfer Segment Anything Model to medical image segmentation with Parameter-Efficient Fine-Tuning.” Knowledge-Based Systems 310 (2025): 112909.

[2] Kolahi, Sina Ghorbani, Seyed Kamal Chaharsooghi, Toktam Khatibi, Afshin Bozorgpour, Reza Azad, Moein Heidari, Ilker Hacihaliloglu, and Dorit Merhof. “MSA$^2$Net: Multi-scale Adaptive Attention-guided Network for Medical Image Segmentation.” arXiv preprint arXiv:2407.21640 (2024).

[3] Tejero, Javier Gamazo, Moritz Schmid, Pablo Márquez Neila, Martin S. Zinkernagel, Sebastian Wolf, and Raphael Sznitman. “SAM-DA: Decoder Adapter for Efficient Medical Domain Adaptation.” arXiv preprint arXiv:2501.06836 (2025).

[4] Wu, Junde, Ziyue Wang, Mingxuan Hong, Wei Ji, Huazhu Fu, Yanwu Xu, Min Xu, and Yueming Jin. “Medical sam adapter: Adapting segment anything model for medical image segmentation.” Medical image analysis (2025): 103547.

[5] Zhou, Kaiyang, Jingkang Yang, Chen Change Loy, and Ziwei Liu. “Learning to prompt for vision-language models.” International Journal of Computer Vision 130, no. 9 (2022): 2337-2348.

[6] Celaya, Adrian, Beatrice Riviere, and David Fuentes. “A generalized surface loss for reducing the hausdorff distance in medical imaging segmentation.” arXiv preprint arXiv:2302.03868 (2023).
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The function section may include typos that you would want to revise.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Building on DAPSAM, this paper adds feature-adaptive perturbations to improve robustness for ultrasound. Noise and perturbations are added both in prompt generation and in refining the image embedding. The proposed methods are effective in adapting SAM to be more ultrasound-specific. However, the authors did not provide enough comparison with other medical-image fine-tuning methods, or with general medical image methods such as MedSAM [1].

[1] Ma, Jun, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. “Segment anything in medical images.” Nature Communications 15, no. 1 (2024): 654.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

1. A novel feature perturbation mechanism based on noise is introduced for fine-tuning SAM.
2. An auto-prompt module is designed to generate fine-grained prompts.
3. The model achieves state-of-the-art (SOTA) performance on cross-domain ultrasonic datasets covering three distinct organs.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Method novelty: Treats ultrasound domain shifts as “adversarial attacks” and applies noise perturbation to SAM’s feature space, a creative adaptation for medical images.
2. Auto-prompt design: The token-noise perturbation (TNP) mitigates token overfitting via similarity-guided noise, improving instance-aware prompts without heavy computation.
3. Comprehensive experimental design: The settings of the comparative and ablation experiments effectively demonstrate the efficacy of the method and the role of each module.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Regarding the generation of random noise, the paper seems to omit the handling of noise during the testing phase. Is the noise also randomly generated during testing?
2. In the ablation experiment section, how is the experiment set up when the prompt part is removed? The paper is unclear about this.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Despite certain limitations in theoretical analysis and detailed description, this paper meets the standards of MICCAI in terms of methodological innovation, comprehensive experimental design, and clinical significance.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

All three reviewers give positive scores (4, 4, 4) with confidence (3, 4, 4). We are glad that the reviewers recognize the novelty and effectiveness of our method. We address the main concerns as follows:

Comparison with other methods [R1, R2] Thanks for the suggestion. Per rebuttal policy, we cannot provide new experimental results. We will expand the comparison between Nora and contrastive learning and intensity scaling approaches, emphasizing the novelty and efficiency of our method. Our baseline adopts AdaptFormer, where a single adapter is inserted at the skip-connection position in each block for fine-tuning. Med-SA extends this design by using two adapters per block, but the core idea remains the same. We will also compare with other general medical SAM-based methods.

Hausdorff Distance results [R1, R2] As noted by R2, “High Dice scores accompanied by significant Hausdorff errors could indicate that the predictions fail to detect small regions”, which is part of the reason for our weaker HD results. In addition, the noise injected by Nora may disrupt ambiguous regions, which can impair the model’s ability to segment boundaries accurately in certain cases, resulting in suboptimal distance scores. We will focus more on challenging regions in future work.
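The effect quoted from R2 can be illustrated with a small synthetic case. The sketch below is not taken from the paper or its code: the masks, the helper names (`dice`, `directed_hausdorff`, `hausdorff`), and the choice to measure the Hausdorff distance over all foreground pixels rather than over extracted boundaries are assumptions made purely for illustration. It builds a ground truth with one large and one small region, and a prediction that misses the small one.

```python
import math

def dice(pred, gt):
    """Dice coefficient between two sets of (row, col) foreground pixels."""
    inter = len(pred & gt)
    return 2.0 * inter / (len(pred) + len(gt))

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(
        min(math.hypot(ar - br, ac - bc) for br, bc in b)
        for ar, ac in a
    )

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty pixel sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical ground truth: a 20x20 region plus a distant 2x2 region.
large = {(r, c) for r in range(20) for c in range(20)}
small = {(r, c) for r in range(40, 42) for c in range(40, 42)}
gt = large | small

# Hypothetical prediction: the large region is perfect, the small one is missed.
pred = set(large)

print(round(dice(pred, gt), 3))       # 0.995
print(round(hausdorff(pred, gt), 1))  # 31.1
```

Only 4 of 404 ground-truth pixels are missed, so Dice stays near 1, while the Hausdorff distance is determined entirely by the farthest missed pixel.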

Implementation details and computational cost [R1, R2] We adopt the same loss function as DAPSAM, which combines cross entropy loss and Dice loss with the balancing parameter λ set to 0.8. Our method has comparable FLOPs (66.68 G vs. 66.35 G) and a similar tuning/total parameter count (7.28 M/93.95 M vs. 7.19 M/93.87 M) to the baseline. We use a single NVIDIA RTX 3090 GPU with batch size set to 8.
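A minimal sketch of such a combined loss is given below, for a binary mask flattened to a list of foreground probabilities. The rebuttal states only that cross entropy and Dice loss are combined with λ = 0.8; the specific form `(1 - lam) * CE + lam * Dice`, which term λ weights, and the function names are assumptions for illustration and may differ from the DAPSAM and Nora implementations.

```python
import math

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross entropy over per-pixel foreground probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)

def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss: 1 - 2|P.T| / (|P| + |T|)."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def combined_loss(probs, targets, lam=0.8):
    """Assumed weighting: lam on the Dice term, (1 - lam) on cross entropy."""
    return (1.0 - lam) * bce_loss(probs, targets) + lam * dice_loss(probs, targets)

targets = [1, 0, 1, 0]
good = combined_loss([0.9, 0.1, 0.9, 0.1], targets)
bad = combined_loss([0.1, 0.9, 0.1, 0.9], targets)
print(good < bad)  # True
```

Under this assumed weighting, λ = 0.8 places most of the weight on the region-overlap term, with cross entropy acting as a per-pixel regularizer.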

Motivation [R1] Conventional generalization approaches mainly focus on style variation and regularization. However, their performance often falls short when applied to low-quality ultrasound images. Few methods are specifically tailored for DGUIS. We address this problem from a novel perspective by drawing an analogy between ultrasound generalization and adversarial defense. Specifically, we introduce well-designed noise to encourage the model to enhance its robustness, thereby improving its generalization capability. We will make this clear in the revision.

Feature perturbation [R2] We apologize for the confusion about the noise perturbation weight. We generate the weight from the relative magnitude of the current features using a sigmoid function: high-value features receive stronger perturbations, while low-value features are less disturbed. This adaptive weight enables us to apply noise perturbations within a controlled range, which helps preserve feature integrity while encouraging the model to identify more robust features. In our implementation, the FAP is applied to each layer. Making the perturbation layer-dependent is indeed a promising direction, which we will explore in the future.
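The weighting idea can be sketched as follows. This is a simplified illustration, not the authors' implementation: additive Gaussian noise, the scale `alpha`, applying the sigmoid directly to raw feature values, and the function name `feature_adaptive_perturbation` are all assumptions. The `training` flag reflects the statement elsewhere in this rebuttal that the noise modules are used only during training.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def feature_adaptive_perturbation(features, alpha=0.1, training=True, rng=None):
    """Add noise scaled by a sigmoid weight of each feature value.

    Assumed form: f' = f + alpha * sigmoid(f) * eps, with eps ~ N(0, 1).
    High-value features get a weight near 1, low-value features near 0.
    """
    if not training:
        return list(features)  # no perturbation at inference
    if rng is None:
        rng = random.Random()
    perturbed = []
    for f in features:
        weight = sigmoid(f)           # in (0, 1), bounds the noise magnitude
        noise = rng.gauss(0.0, 1.0)   # assumed Gaussian noise source
        perturbed.append(f + alpha * weight * noise)
    return perturbed

feats = [-4.0, 0.0, 4.0]
print(feature_adaptive_perturbation(feats, rng=random.Random(0)))
print(feature_adaptive_perturbation(feats, training=False))  # [-4.0, 0.0, 4.0]
```

Because the sigmoid weight lies in (0, 1), each perturbation is bounded by `alpha` times the sampled noise, which is one way to read the "controlled range" described above.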

Token-Enhancement prompt [R2] Our prompt generation module shares some similarities with CoOp, as suggested by R2, since both introduce a set of learnable tokens. In contrast, we use these tokens to interact with instances, refine image embeddings, and generate sufficient prompts for the decoder. This part is a general-purpose design, whereas our TNP module is tailored to ultrasound features to mitigate token-instance overcoupling.

Noise during testing phase [R3] The noise modules, FAP and TNP, are applied only during training and are not used during inference. By introducing well-designed noise, we fine-tune SAM and help the model identify robust information, thereby improving generalization.

Ablation study setup [R3] In our ablation study, the TNP is built upon the Prompt module. When our instance-aware prompt generation part is removed, the model operates in the original prompt-free mode of SAM, where a default embedding feature is initialized and used as the prompt.

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A


Noise-Robust Tuning of SAM for Domain Generalized Ultrasound Image Segmentation

Author(s):