Abstract
Interactive segmentation tools, such as SAM2, have shown strong performance in reducing annotation effort for natural images. However, unlike natural images, ultrasound images and videos often lack well-defined structure boundaries, which significantly degrades the performance of region-based point prompts in SAM models. To address these limitations, we introduce the Segment Anything Model 2 for UltraSound Annotation (SAMUSA). SAMUSA is based on SAM2 and introduces a new prompting strategy with boundary and temporal points, along with a novel boundary loss function, enabling the model to segment structures with poorly defined boundaries, such as liver masses, more efficiently. We integrated SAMUSA as a 3D Slicer plugin, where it can be used to segment US videos and 3D US volumes. We present a prospective user study involving 6 participants (3 surgeons and 3 radiographers), which showed an average 34.1% reduction in annotation time for liver mass image segmentation.
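For readers who want a concrete picture of how a boundary-weighted term might be combined with an ordinary segmentation loss, below is a minimal, hypothetical PyTorch sketch. It is not the authors' implementation: the paper's exact boundary loss is not reproduced here, and the function name, tensor shapes, and the 0.5-level-set penalty are illustrative assumptions (the author feedback further down only states that the boundary term is weighted by λ = 2, tuned on validation).

```python
import torch
import torch.nn.functional as F

def boundary_aware_loss(pred_logits, gt_mask, boundary_points, lam=2.0):
    # Hypothetical sketch, not the paper's formulation.
    # pred_logits: (1, 1, H, W) raw mask logits; gt_mask: (1, 1, H, W) binary mask
    # boundary_points: (N, 2) integer (row, col) coordinates of clicked boundary points
    seg_loss = F.binary_cross_entropy_with_logits(pred_logits, gt_mask.float())

    probs = torch.sigmoid(pred_logits)
    rows, cols = boundary_points[:, 0], boundary_points[:, 1]
    p_at_boundary = probs[0, 0, rows, cols]
    # Encourage the predicted probability to sit at the 0.5 decision level
    # at each clicked point, i.e. place the predicted contour on those points.
    boundary_term = ((p_at_boundary - 0.5) ** 2).mean()

    return seg_loss + lam * boundary_term
```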
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3015_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
Liver and Tumors 2D images
FASS
TG3K; TN3K
Ultrasound nerve segmentation dataset
MMOTU
STMUSNDA
TRUSTED
SegThy
CAMUS
Thyroid Cine
IUSLL (Private)
BibTex
@InProceedings{PodBap_SAMUSA_MICCAI2025,
author = { Podvin, Baptiste and Collins, Toby and Saibro, Güinther and Innocenzi, Chiara and Yang, Yuchuan and Milana, Flavio and Keeza, Yvonne and Ufitinema, Grace and Ujemurwego, Florien and Torzilli, Guido and Marescaux, Jacques and George, Daniel and Hostettler, Alexandre},
title = { { SAMUSA: Segment Anything Model 2 for UltraSound Annotation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15970},
month = {September},
pages = {513 -- 522}
}
Reviews
Review #1
- Please describe the contribution of the paper
- This paper proposes a tool for ultrasound image annotation that exploits the SAM2 model.
- It introduces a boundary prompting strategy and temporal points.
- The boundary prompting strategy encourages the segmentation results to align with the given point prompts.
- The temporal points are designed to handle structural changes caused by probe sweeping.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed method can reduce the time needed to annotate ultrasound images.
- The proposed method is simple and the paper is well written.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The primary limitation of this paper lies in its lack of scientific novelty. The proposed method mainly introduces a modification to the prompting mechanism, which makes the contribution appear more engineering-driven than fundamentally innovative. In particular, there are several existing prompting mechanisms, such as scribble-based inputs [R1, R2], which are often more intuitive and time-efficient compared to boundary-pointing. Identifying precise boundaries can be more challenging and time-consuming. Therefore, a comparison with a variety of prompting methods is essential to justify the proposed approach.
- From an evaluation perspective, if the authors aim to highlight the strength of their method for boundary-level segmentation, it would be beneficial to include metrics such as the Hausdorff distance, which better captures boundary accuracy.
- Additionally, it would strengthen the paper to compare segmentation performance under a fixed annotation time constraint. Specifically, setting a predefined annotation time per image and asking annotators to work within that limit would allow for a fair comparison of efficiency and effectiveness across different prompting strategies, and could better demonstrate the practical utility of the proposed method.
[R1] PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts, MICCAI 2024
[R2] ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image, ECCV 2024
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper lacks scientific novelty. Specifically, there are no new observations or problem definitions.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
As the other reviewers agreed, there is a lack of scientific contribution. The work is more of an engineering exercise in how to use the SAM model. Also, regarding the boundary prompt: in the proposed framework, users first identify the boundary region and then select some points. In this case, users have to find the boundary region, which takes a lot of time; therefore, scribble or bounding box prompts are more time-efficient.
In summary, given the lack of scientific contribution and the limited novelty of boundary point prompting, I am leaning toward rejecting this paper.
Review #2
- Please describe the contribution of the paper
The paper presents an extension of the Segment Anything Model 2 (SAM 2) designed for annotating ultrasound images and videos. The authors introduce boundary prompts and incorporate them into a boundary-aware loss function to address the challenge of ambiguous boundaries in ultrasound data. They also propose temporal prompts to specify the video segments in which the target object is visible, reducing the inference time of frame-wise prediction propagation in the video. They integrate their method as a 3D Slicer plugin and demonstrate superior performance compared to SAM, SAM 2, and a fine-tuned ultrasound version of SAM 2 across 14+ ultrasound datasets, for both image and video segmentation tasks.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper has the following strengths:
1) Particularly strong evaluation:
The authors conduct extensive experiments to quantitatively compare their approach to prior work, both in a real setting with 6 medical annotators (3 surgeons and 3 radiographers) and in a setting with simulated clicks. They utilize 14 ultrasound datasets, a scale not seen elsewhere in the literature, even in dedicated benchmark papers, which usually cover 4-5 datasets at most [1, 2, 3]. The authors also investigate both the image and video domains and conduct statistical testing to show that their results are significant.
2) Plug-and-play approach:
The authors integrate their approach as a 3D Slicer plugin, making it easily accessible to the community. This is particularly important for the ultrasound domain, as it is often underrepresented compared to, e.g., radiology, where there are 100+ public annotated CT and MRI datasets [3]. An annotation tool for ultrasound that works well for any structure, even unseen ones such as the lesions in this paper, is an important step towards generating more high-quality annotated ultrasound datasets.
3) Straightforward and generalizable idea:
The authors’ idea is straightforward, as it simply takes SAM 2 and fine-tunes it with a custom boundary-aware loss on ultrasound data. This extension of SAM 2 could easily be transferred to other domains with noisy boundaries, e.g., PET lesion segmentation, even though the authors originally designed it for ultrasound data.
[1] Thomas, Cory, et al. “BUS‐Set: A benchmark for quantitative evaluation of breast ultrasound segmentation networks with public datasets.” Medical Physics 2023
[2] Kuş, Zeki, and Musa Aydin. “MedSegBench: A comprehensive benchmark for medical image segmentation in diverse data modalities.” Nature Scientific Data 2024
[3] Cheng, Junlong, et al. “Interactive medical image segmentation: A benchmark dataset and baseline.” arXiv preprint 2024
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Despite the brilliant evaluation and relevance to the ultrasound field, the paper has some major weaknesses in its methodology:
1) Questionable technical novelty of boundary prompts:
The authors state that “SAMUSA introduces two novel prompt mechanisms: boundary and temporal points.” However, the boundary prompts are technically not novel concepts as they are widely used for interactive segmentation in both natural [4, 5, 6, 7, 8] and medical models [9, 10, 11]. The boundary-aware loss function is also quite similar to the work of Roth et al. [12], however, the authors do acknowledge this in their related work. The vast literature [4-12] on boundary prompts severely undermines the authors’ claim on its novelty in their work.
2) Missing implementation details:
There are multiple missing implementation details in the paper that hinder both the reproducibility and the understanding of the reasoning behind the design decisions that the authors have made. I would like the authors to comment on the following:
- How was the ultrasound SAM 2 baseline fine-tuned? Was it also trained with simulated boundary prompts like SAMUSA, or with another simulation method?
- Why is lambda in the loss function set to 2? Is there a particular reason for this value, and is it sensitive to small changes?
- What kind of questionnaire was given to the users, e.g., SUS, NASA-TLX, or a custom format? Many values are listed in the paper, but specifying what type of questionnaire the participants received would make the numbers more interpretable.
3) Inconsistent simulation of clicks during evaluation:
During evaluation, the authors utilize center-click prompts for all models they compare to, but provide boundary prompts for their own model. This does not seem to be a fair comparison, as the click simulation strategy is inconsistent among the compared models. This would essentially be the same as comparing the results of the models when given to different annotators (e.g., an annotator who tends to click in the center and one who tends to click on the boundaries). In that case, it would be difficult (if not impossible) to know if the performance difference stems from the models or from the annotators themselves.
[4] Maninis, Kevis-Kokitsi, et al. “Deep extreme cut: From extreme points to object segmentation.” CVPR 2018
[5] Majumder, Soumajit, et al. “Two-in-One Refinement for Interactive Segmentation.” BMVC 2020.
[6] Le, Hoang, et al. “Interactive boundary prediction for object selection.” ECCV 2018
[7] Lin, Zheng, et al. “Multi-mode interactive image segmentation.” ACM Multimedia 2022
[8] Dupont, Camille, Yanis Ouakrim, and Quoc Cuong Pham. “Ucp-net: Unstructured contour points for instance segmentation.” SMC 2021
[9] Luo, Xiangde, et al. “MIDeepSeg: Minimally interactive segmentation of unseen objects from medical images using deep learning.” Medical image analysis 2021
[10] Khan, Shadab, et al. “Extreme points derived confidence map as a cue for class-agnostic interactive segmentation using deep neural network.” MICCAI 2019
[11] Raju, Ashwin, et al. “User-guided domain adaptation for rapid annotation from user interactions: a study on pathological liver segmentation.” MICCAI 2020
[12] Roth, Holger R., et al. “Going to extremes: weakly supervised medical image segmentation.” Machine Learning and Knowledge Extraction 2021
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I have a few minor comments that would improve the paper’s structure and presentation in addition to the comments listed in the “Major Weaknesses” section:
Minor comments:
- The claim on page 1, “SAM2 has not been considered for US data in the literature,” is not entirely true. There are multiple adaptations of SAM2 tested on ultrasound [13, 14, 15], so it is not something the community is unaware of. However, all such works are still pre-prints, so I do not use this as a point of criticism; I simply want to draw the authors’ attention to related works. A more precise formulation in the introduction would be “SAM2 has not been considered [explicitly] for US data in the literature,” as no works use SAM2 [only] on ultrasound.
- The text in Fig. 2 and Fig. 3 is too small and hardly readable. I would increase the font size.
- Missing space on page 5: warm-up period.We -> warm-up period. We
[13] Sengupta, Sourya, Satrajit Chakrabarty, and Ravi Soni. “Is SAM 2 Better than SAM in Medical Image Segmentation?.” arXiv preprint arXiv:2408.04212 (2024).
[14] Yan, Zhiling, et al. “Biomedical sam 2: Segment anything in biomedical images and videos.” arXiv preprint arXiv:2408.03286 (2024).
[15] Dong, Haoyu, et al. “Segment anything model 2: an application to 2d and 3d medical images.” arXiv preprint arXiv:2408.00756 (2024).
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the paper has limited technical novelty, as boundary prompts and losses are already well established in the MICCAI community, and some implementation details are missing, it does present an extensively validated solution to ultrasound image and video annotation and provides a pre-trained model integrated into the open-source 3D Slicer framework. Despite its limitations, this paper presents a simple and pragmatic solution for the ultrasound domain that the MICCAI community would benefit from. Hence, I would vote for acceptance of the paper on the condition that the authors clarify the points I raised on the missing implementation details, as this is crucial for the paper’s reproducibility.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I believe the authors have addressed all of my concerns as well as the concerns of the other reviewers. The novelty issue that was a major drawback of the work is compensated for by the extensive experiments and the plug-and-play approach that the authors present.
Review #3
- Please describe the contribution of the paper
The proposed approach introduces a human-in-the-loop tool for annotating ultrasound (US) images based on SAM2, a foundational segmentation model. Instead of using the foreground/background point prompts of vanilla SAM2, the paper proposes boundary point prompts, which help the model better distinguish the structure of interest in US images. In addition, temporal prompting is introduced to define when structures are visible in a video, enabling faster and more accurate inference. The model is fine-tuned and evaluated on a diverse set of datasets including various organs. A user study demonstrates that the proposed method significantly outperforms other SAM-based architectures in terms of both efficiency and segmentation quality.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper presents a novel and well-motivated approach for annotating ultrasound images. The method is generally well-explained and the claims are very well justified.
- Both user studies and simulation experiments demonstrate that the proposed approach significantly outperforms the baseline.
- The integration of two key innovations (boundary prompts and temporal prompts) into the SAM2 framework aligns well with the unique challenges of ultrasound data.
- The experiments are conducted extensively on various datasets covering different organs.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper only compares SAMUSA to SAM and its variants (SAM2, SAM2-US, SAM2-USTP). Other US segmentation methods (maybe non-interactive ones) are not included, even though they might be useful for benchmarking.
- While the paper explains how temporal prompts are used, there are no demo videos or clear visual examples that show exactly how users define visibility windows in practice. Figures or videos would help here.
- Table 1 shows some datasets used only for testing (like IUSLL), others for training, and some used for both, but the reasoning behind these choices isn’t explained in the text.
- In the related work section, the paper mostly references SAM, MedSAM, SAM2, and their extensions. It does not mention many other ultrasound segmentation approaches that could provide useful background or comparison.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- The model was not trained on images from the STMUS NDA dataset or on any muscle images from other datasets. How is the model able to segment muscles in the STMUS NDA dataset? Why is this dataset not used in training to improve the performance on test data?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces a novel approach for ultrasound image annotation by extending SAM2 with boundary and temporal prompts. The proposed method is well-motivated and demonstrates strong performance through both simulation and user studies, showing clear advantages over existing SAM-based baselines. The experimental setup is comprehensive, covering a diverse range of datasets and clinical structures. However, I assign a Weak Accept rating due to several important concerns that should be addressed in the rebuttal. Specifically, the paper does not mention any non-SAM-based US segmentation methods as background or comparison. The reasoning behind the dataset usage (training and testing) is not clearly explained, and the mechanism for temporal prompting would benefit from additional visual or video examples. Furthermore, there is no information regarding the release of source code or the 3D Slicer plugin, which raises concerns about reproducibility. If the authors provide satisfactory clarifications during the rebuttal, the paper can be recommended for acceptance.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We sincerely thank the reviewers and address the main concerns below.
R1 (Novelty): Our method combines several overlooked innovations into a practical, high-performing solution. We introduce temporal prompts to address structural variability in cross-sectional ultrasound, and our plug-and-play design [R2] enables iterative segmentation refinement and easy adaptation to new tasks and modalities. By leveraging both image and video training data (key given the scarcity of ultrasound video datasets), we overcome significant data limitations. Coupled with the first scientific user validation in the US video domain (as noted by reviewers), we believe our submission demonstrates clear and sufficient novelty.
R1 (Value of boundary prompts): US structure borders pose challenges for clinicians and models, especially with indistinct boundaries. Unlike MRI or CT, US borders require different treatment [8]. At low-contrast boundaries (caused by, e.g., acoustic shadowing or steep insonation angles), clinicians rely on prior anatomical knowledge and/or context from other frames (“completed boundaries” [8]). The reviewer’s claim that scribble prompts are more intuitive or efficient for US annotation lacks support. While including scribbles would have complemented the evaluation, our user study showed a user preference and a quantitative improvement for directly annotating boundary points over region points, especially in handling low-contrast regions, which scribbles do not support. We believe our contributions advance the field by promoting user-centered design, with boundary and temporal prompts as key tools for US videos. Additionally, a key use case is intraoperative annotation on US consoles, which mainly have trackball mice and operators wearing sterile gloves, for precise point selection and hygiene purposes. However, trackballs may make drawing fluid lines awkward and unnatural. We appreciate the reviewer’s suggestion; future work could evaluate scribble annotations across different UI hardware.
R1 (Evaluation): We respectfully disagree that using a fixed annotation time per image is a better validation method. Fixed-data-quantity evaluation is routinely used in high-quality studies [1, 2, 3], offering a fair and reproducible approach without requiring an arbitrary time threshold to be defined a priori. Our validation was strongly supported by R2 and R3.
Dataset choices (R3): We used only publicly available datasets for training, for reproducibility. Some datasets (e.g., IUS-L) were used in both image and video settings for consistent evaluation across the simulated and user studies, while others were used only for testing to support zero-shot evaluation.
Method comparison (R3): Our goal was a general, plug-and-play boundary and temporal prompting method compatible with a state-of-the-art model (SAM2). Many US-specific models [4, 5] are narrow in scope and lack generalizability.
Prompt clarification (R2): Our boundary points use distinct embeddings, resulting in different model behavior. Since SAM2 is not designed for boundary points, we used its original region prompts for the baseline. The boundary loss weight (λ = 2), tuned on the validation set, balanced segmentation accuracy and boundary adherence.
Temporal prompts clarification (R3): They are click-based; users place points anywhere on the relevant frames, and only the frame positions matter. These prompts guide which frames the network processes during mask propagation.
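To make the temporal-prompt clarification above more concrete, here is a minimal Python sketch of what “guiding which frames the network processes during mask propagation” could look like. The function name and the first-to-last-click windowing rule are assumptions for illustration, not the authors’ actual implementation.

```python
def frames_to_process(num_frames, temporal_prompt_frames):
    # Hypothetical sketch: only the frame indices of the temporal clicks matter.
    # The visibility window is taken as the span between the first and last
    # prompted frame, and mask propagation is restricted to that window.
    if not temporal_prompt_frames:
        return list(range(num_frames))  # no temporal prompt: propagate over all frames
    start, end = min(temporal_prompt_frames), max(temporal_prompt_frames)
    return list(range(start, end + 1))

# Example: in a 300-frame sweep with temporal clicks on frames 40 and 120,
# propagation would be limited to frames 40..120 instead of all 300 frames.
window = frames_to_process(300, [40, 120])
```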
Other: We plan to fix typos, add missing references (space permitting), and release the code and Slicer plugin.
References:
[1] Diaz-Pinto et al., MONAI Label, MedIA, 2024
[2] Pavoni et al., TagLab, JFR, 2022
[3] Wong et al., ScribblePrompt, ECCV, 2024
[4] Li et al., Placenta Segmentation, ASMUS, 2024
[5] Girum et al., Fast Interactive Segmentation, IJCAIR, 2020
[6] Ravishankar et al., SonoSAM, ASMUS, 2023
[7] Guo et al., ClickSAM, SPIE, 2024
[8] Gonzalez Duque et al., Ultrasound segmentation analysis via distinct and completed anatomical borders, IPCAI-IJCARS 2024
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
After considering the reviewers’ evaluations and the authors’ rebuttal, I regret to recommend rejection of this paper.
While Reviewer #2 and Reviewer #3 found the paper acceptable due to its strong experimental validation and plug-and-play usability, Reviewer #1 raised fundamental concerns regarding the scientific novelty and contribution of the proposed method. The core of the method is viewed primarily as an engineering application of SAM (Segment Anything Model), with limited methodological innovation. Specifically, the proposed boundary point prompting scheme is perceived as less efficient than existing scribble or bounding box inputs, requiring users to manually identify boundary regions, which diminishes its practical appeal.