Abstract

Automatic ultrasound nerve localization is crucial for nerve block procedures and neuropathy detection. However, the performance of existing approaches is typically constrained by the limited scale of ultrasound image datasets. While adapting large-scale models such as the Segment Anything Model (SAM) has demonstrated remarkable performance on medical images, their effectiveness heavily relies on extensive datasets and substantial computational resources. This presents significant challenges for adapting SAM to ultrasound image segmentation. To address these challenges, we propose a novel parameter- and data-efficient adaptation method called the Hierarchical Adapter. Specifically, the Hierarchical Adapter can flexibly adjust the number of fine-tuning parameters to make the best use of data and computational resources. In addition, we observe a depth-dependent difficulty in adapting different Transformer blocks of SAM. We therefore insert Hierarchical Adapters of varying sizes into Transformer layers at different depths of the SAM encoder, optimizing the distribution of trainable parameters. This design significantly improves parameter efficiency during adaptation while simultaneously enhancing segmentation performance. Compared to state-of-the-art methods, our model reduces training parameter requirements by more than half while still achieving an approximately 1.5% improvement in Dice score on two ultrasound nerve datasets.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3104_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{PenZih_HASAM_MICCAI2025,
        author = { Peng, Zihao and Kang, Susu and Huang, Xuping and Xiang, Xucheng and He, Gengyu and Liu, Tianzhu and Mei, Wei and Tan, Shan},
        title = { { HA-SAM: Hierarchically Adapting SAM for Nerve Segmentation in Ultrasound Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {327 -- 336}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose HA-SAM, an adapter method to fine-tune SAM for nerve segmentation in ultrasound images. The proposed adapter method belongs to the PEFT family for fine-tuning large models with task-specific data using few parameters. The authors offer an interesting observation that the bottleneck dimension of the learnable modules does not have to be the same in every layer. Empirical studies on three different ultrasound datasets back up the claim.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A recent study on fine-tuning LLMs with adapter methods suggests that while larger bottleneck sizes can enhance capacity, excessively large sizes may introduce overfitting or diminish efficiency, particularly with limited resources [1]. HA-SAM could be an alternative for balancing adapter capacity and performance.

    [1] Hu, Zhiqiang, Lei Wang, Yihuai Lan, Wanyu Xu, Ee-Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, and Roy Ka-Wei Lee. “Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models.” arXiv preprint arXiv:2304.01933 (2023).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The authors mentioned the PEFT methods LoRA and visual prompt tuning, but do not compare them with the proposed method.

    Missing SOTA comparison: As the authors suggest, most medical SAMs are not trained with large amounts of US data. However, the recently proposed UltraSam [2] is also trained on multiple ultrasound datasets, including the Kaggle BP and BUSI datasets, and performs better than MedSAM.

    [2] Meyer, Adrien, Aditya Murali, Didier Mutter, and Nicolas Padoy. “UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets.” arXiv preprint arXiv:2411.16222 (2024).

    Several questions regarding the results: High Dice scores accompanied by significant Hausdorff errors could indicate that the predictions fail to detect small regions [3].

    [3] Celaya, Adrian, Beatrice Riviere, and David Fuentes. “A generalized surface loss for reducing the hausdorff distance in medical imaging segmentation.” arXiv preprint arXiv:2302.03868 (2023).

    There is a large discrepancy between the results reported by the authors and those reported by the baseline methods on the BUSI dataset. For example, SAMUS [4] reported a Dice score of 84.54 on the BUSI dataset using ViT-B, compared to the 77.81 reported by the authors using ViT-H. Please address this issue.

    [4] Lin, Xian, Yangyang Xiang, Li Yu, and Zengqiang Yan. “Beyond adapting SAM: Towards end-to-end ultrasound image segmentation via auto prompting.” In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 24-34. Cham: Springer Nature Switzerland, 2024.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My main concern is regarding the benchmark results, where there is a significant drop in performance relative to the results the baseline methods reported in their own publications on the same dataset.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    I would like to thank the authors for their comments. However, the discrepancy between the results cannot be ignored. Such an issue could be attributable to the training strategy, but without looking more closely into the training/validation scheme conducted by the authors, I cannot be 100% sure.



Review #2

  • Please describe the contribution of the paper

    This paper presents an efficient approach for adapting the foundation model SAM or its medical variants to specific domains, particularly those with limited training data available during the foundation model’s original training. An example of such a domain is ultrasound imaging. The proposed method introduces a hierarchical adapter architecture, in which smaller adapters are assigned to shallower layers and larger adapters to deeper layers. The approach is evaluated on the specific task of nerve segmentation in ultrasound images.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Studying the adaptation of foundation models across different scenarios is crucial. The present work focuses on adapting such models to domains where only limited training data were available during the foundation model’s original training.

    To assess the statistical significance of performance differences, a two-sample t-test is employed to compute the p-value. Although statistical testing is essential for robust performance comparison, it is often overlooked in research literature. The inclusion of such analysis in this work is a commendable practice.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The proposed approach is simple yet effective. Its main contribution appears to be the depth-dependent design of the adapter module, referred to as the Hierarchical Adapter in the paper. If this is indeed the central contribution, it should be made more explicit.

    It also appears that no comparison is provided against a depth-independent adapter design. Such a comparison is important to clearly demonstrate the advantages of the proposed depth-dependent approach.

    Regarding the claim "Depth-Aware Adaptation: We identified the depth-dependent difficulty in adapting different Transformer blocks of SAM, revealing that deeper layers require more parameters for effective fine-tuning", there is little to no supporting evidence or analysis in the text to substantiate it. As such, the paper essentially presents one main contribution: the Hierarchical Adapter SAM.

    In terms of dataset handling, each dataset is randomly split into training, validation, and test sets using a 7:1.5:1.5 ratio. However, no k-fold cross-validation was performed, which would have provided a more reliable estimate of performance.

    While the use of statistical testing (via a two-sample t-test) for performance comparison is a positive aspect of the work, such testing was not applied in the ablation study. This is particularly important, as several of the reported Dice scores in the ablation results are very close, and statistical significance testing would have clarified whether these differences are meaningful.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The proposed approach is quite general and not inherently tied to the task of ultrasound nerve localization. Accordingly, the writing, and especially the title, could be framed in a more general manner. The work is more “with application to nerve segmentation in ultrasound images” rather than “for nerve segmentation in ultrasound images.”

    “Our results show that the network achieves an accuracy of 99.97%, meeting doctors’ requirements for click-based prompts in surgical settings”. It should be explicitly stated that, in the event of segmentation failures, no manual correction or other post-processing was applied during the experimental evaluation.

    The results presented in Section 4.4 appear to be a mixture of a conventional ablation study and other investigated strategies. The section title and overall wording could be revised slightly to better reflect this hybrid nature.

    Lastly, a number of the references are to preprints on arXiv. Many of these works, if not all, have since been published in peer-reviewed journals or conference proceedings. Wherever possible, these formal publication versions should be cited instead.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Studying the adaptation of foundation models across diverse scenarios is an important research direction. The current paper proposes an approach aimed at this goal. The experimental results provide initial evidence supporting its effectiveness.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed the major issues that were raised. While some limitations remain, the work offers valuable contributions to the community.



Review #3

  • Please describe the contribution of the paper

    The paper proposes HA-SAM, an adaptation of the Segment Anything Model (SAM) for ultrasound nerve segmentation. HA-SAM modifies the encoder transformer block with two adapters, one in series with the multi-head attention, and the second within the residual connection of the feed-forward layer. The adapters are hierarchical (hence the name Hierarchical Adapter (HA)) with varying sizes, increasing with model depth, achieving a depth-aware adaptation. The authors validate HA-SAM on two ultrasound nerve datasets and one breast ultrasound dataset, and the results suggest that HA-SAM outperforms state-of-the-art segmentation methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Simplicity: The HA adapter is very simple and elegant in design, consisting of a simple down-projection layer, an up-projection layer, and a residual connection, with only a single hyperparameter, the bottleneck coefficient r.
    • Parameter efficient: The hierarchical design allows for adapting all the layers in the SAM’s encoder with very few trainable parameters. HA-SAM has 50% fewer parameters compared to SAM-SA.
    • Extendability: HA-SAM can be applied to other datasets as illustrated by its better performance on the breast ultrasound, BUSI, dataset.
    • Clarity and organization: The paper is very well written, relevant literature highlighted, and easy to read.
    • Novelty: The proposed HA adapter is novel in medical image segmentation and achieves state-of-the-art performance on two public datasets, KaggleBP (a nerve segmentation dataset) and BUSI.
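For concreteness, the adapter design described in the strengths above (a down-projection, an up-projection, and a residual connection, governed by a single bottleneck size) can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' implementation; the ReLU activation and the zero initialization of the up-projection are assumptions.

```python
import numpy as np

def hierarchical_adapter(x, W_down, W_up):
    """One HA-style adapter pass: down-project, apply a nonlinearity,
    up-project, then add the residual. A sketch of the design described
    in the review; the actual HA-SAM block may differ in detail."""
    h = np.maximum(x @ W_down, 0.0)  # down-projection + ReLU (assumed activation)
    return x + h @ W_up              # up-projection + residual connection

# Toy example: 4 tokens of dimension d = 8, bottleneck size r_i = 2.
rng = np.random.default_rng(0)
d, r_i = 8, 2
x = rng.standard_normal((4, d))
W_down = rng.standard_normal((d, r_i)) * 0.1  # (d x r_i) down-projection
W_up = np.zeros((r_i, d))                     # zero-init: adapter starts as identity
y = hierarchical_adapter(x, W_down, W_up)
assert np.allclose(y, x)  # with zero-initialized W_up the adapter is a no-op
```

The zero-initialized up-projection makes the adapter an identity map at the start of fine-tuning, which is a common choice for adapter modules so that training begins from the pre-trained model's behavior.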
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Worse performance on the boundaries: While HA-SAM outperforms other methods in terms of Dice and IoU, it achieves worse HD95 across all the datasets and the paper does not explain why this is the case.
    • Requires prompting: Just like any SAM method, HA-SAM requires prompting.
    • Limited experiments to validate data efficiency: The authors claim that HA-SAM is data-efficient, but there are no results to support this claim. My expectation was that HA-SAM would be trained with varying data sizes to demonstrate that it achieves similar performance even with smaller dataset sizes.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Major comments:

    • Unclear dimensionality of the bottleneck features: Sect. 3.2 mentions that the dimensionality of W_down is (d x r), and Eqn. 1 defines the bottleneck coefficient of the i-th layer to be r_i = d x r x (i/6). Does this mean that the dimensionality of the corresponding W_down is d x (d x r x (i/6))?
    • Definition of shallow and deep layers: The authors consider the first six layers to be shallow and the remaining six layers to be deep. However, it is unclear how they arrived at this design choice. Also, the sixth and seventh layers have adapters of the same size, i.e. r_6 = r_7, but it is not clear in the paper why this is the case.
    • Inconsistent mathematical formulation of the HA encoder: Equation 3 is inconsistent with Fig. 1, as it only suggests that all the HA blocks are connected in series. It does not account for the layer norm, the multi-head attention, the residual connection right after the bottleneck HA module, the parallel layer norm and feed-forward network, or the final residual connection at the end of each transformer block.
    • Rationale for fine-tuning the encoder instead of the decoder: The mask decoder was trained to generate masks based on the image features from the encoder. When the encoder is fine-tuned, the feature distribution changes. How then can the mask decoder generate the desired masks for the new feature distribution?
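To make the dimensionality question above concrete, here is one literal reading of the quoted formula r_i = d x r x (i/6). The values of d and r are hypothetical, and the clamp at i = 6 (which would explain the observation that r_6 = r_7) is an assumption, not taken from the paper.

```python
# Per-layer bottleneck sizes under one literal reading of the quoted
# formula r_i = d * r * (i / 6), for a 12-layer encoder. The embedding
# dimension d, the coefficient r, and the clamp at i = 6 are assumptions.
d, r = 768, 0.0625  # hypothetical: d * r = 48 at the deepest clamped layer

def bottleneck_size(i):
    # Clamp at i = 6 so deeper layers share the largest size (r_6 = r_7).
    return int(round(d * r * min(i, 6) / 6))

sizes = [bottleneck_size(i) for i in range(1, 13)]
assert sizes[:6] == [8, 16, 24, 32, 40, 48]  # shallow layers grow linearly
assert all(s == 48 for s in sizes[6:])       # deep layers keep the i = 6 size
```

Under this reading, W_down in layer i would have shape (d, r_i), i.e. d x (d x r x (i/6)), which is presumably what the paper intends; an explicit statement in the text would resolve the ambiguity.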

    Minor comments

    • Section 4.1: Define IoU and HD95. Also Dice should begin with upper case.
    • Strong claims about the HA adapter (Sect. 4.2): “…efficiently integrates ultrasound-specific information”. From my understanding, it is more like dataset specific information, because I didn’t find anything ultrasound specific about HA-SAM. My understanding is that this approach would work on any other dataset.
    • Crowded results table (Table 1): I appreciate the authors for performing the extensive statistical analysis. However, the many asterisks made it a bit harder for me to decipher. For instance, what is the statistical significance referring to? Is it between each method and HA-SAM, or between each method and all other methods? For instance, SAMed achieves a higher IoU compared to HA-SAM on KaggleBP, but HA-SAM is bolded instead of SAMed. The same applies to the IoU of SAMUS on the BUSI dataset. My suggestion is to include the asterisk only on the HA-SAM results if HA-SAM is statistically significantly better than all the other methods; otherwise, no asterisk.
    • Table 1: Highlight the units for IoU (%) and HD95.
    • Also what are the units for the total parameters?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is very well written, the proposed method is novel, elegant, and clearly explained. Ablation study is provided and evaluation is done on multiple datasets.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I stand with my previous comments. The paper introduces a novel parameter-efficient module, HA-SAM, for adapting SAM. The module can easily be extended to other datasets. The paper is well-written. The authors are aware of the limitations of their approach (and should include these in the revised manuscript).

    However, I implore the authors to address the inconsistencies in the mathematical formulation raised in the weakness section.




Author Feedback

We sincerely thank all reviewers for their thoughtful comments and valuable suggestions.

To Reviewer 1:

  1. On comparison with depth-independent adapter. A comparison with a depth-independent adapter is indeed important. In our preliminary internal experiments, the depth-independent variant consistently performed ~1% lower across key metrics.
  2. On the adaptation claim. An analysis using feature distribution was presented in Section 4.3 to provide initial support for this claim, consistent with the focus of our ultrasound nerve segmentation task. Though preliminary, the result reflects our design motivation. We agree that further validation is needed and will revise the contribution statement in the revised version accordingly if allowed in the future.
  3. On k-fold validation and significance testing. We acknowledge the value of a k-fold cross-validation. Due to the rebuttal policy and time constraints, we will leave this validation to the revised version if allowed in the future. We have conducted two-sample t-tests in the ablation study, which confirmed that the improvements from the full model are statistically significant.

To Reviewer 2:

  1. On PEFT methods. SAMed, which we included in our comparison, employs LoRA for adapting SAM. VPT methods, while related, have not yet been adapted for prompt-based segmentation like SAM. Existing work mainly varies prompt type or quantity rather than exploring tuning mechanisms. Due to space constraints, we did not elaborate on this point in the main paper.
  2. On the SOTA comparison. We are aware of UltraSAM, which explores SAM-based segmentation on ultrasound data. Since it employs object detection metrics such as mAP instead of segmentation metrics like Dice or IoU, a direct comparison is not feasible within our current setup. Due to time constraints during the rebuttal phase, we only conducted a BUSI evaluation, which indicated that UltraSAM underperforms compared to MedSAM and our proposed method. We acknowledge its relevance and will consider it in future benchmarking.
  3. On HD95. While HA-SAM achieves competitive Dice scores, it does not always produce the best HD95, which may reflect difficulty detecting small or fragmented structures. Still, our HD95 remains within a competitive range and demonstrates robust performance across datasets. It appears that HA-SAM is capable of balancing region accuracy with reasonable boundary precision. We will explore this further in future work.
  4. On the baseline results. We thank the reviewer for raising this important concern. As most prior studies adopt different data partitions and often do not disclose their exact configurations, direct comparison becomes challenging. In our work, we applied a consistent 7:1.5:1.5 train/validation/test split across all methods. Additionally, we used the ViT-B backbone (rather than ViT-H) to align with the SAMUS setting. To ensure fairness, we retrained all models from scratch under a unified data split, architecture, and training protocol, instead of relying on externally released checkpoints. This approach enables consistent and reproducible comparisons. We did observe some discrepancies between our reproduced baseline results and those reported in the original papers, which we attribute to variations in data splits and implementation details such as batch size and hyperparameter choices.

To Reviewer 3:

  1. On HD95. Please refer to our response to Reviewer 2, Point 3.
  2. On prompting. We acknowledge that prompting is needed, as with all SAM-based methods. Reducing prompt dependency is a direction we plan to explore.
  3. On data efficiency. We did not train HA-SAM with varying data sizes. However, our datasets range from 650 to 2500 samples, and 650 is already small for large pre-trained models. HA-SAM performs robustly across these conditions. We agree this should be evaluated further and will add data-size ablations in future work.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a method to adapt SAM for nerve segmentation in ultrasound images. However, the motivation of depth-aware adaptation is not sufficiently supported in the analysis. Furthermore, key concerns regarding the experimental results remain unresolved, including the degradation in Hausdorff Distance metrics and the discrepancy between the reported results and the results published in the original papers of comparison methods. Considering these limitations, the recommendation is rejection in its current format.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


