Abstract
Although the SAM2 foundational segmentation model excels in natural images, its direct adaptation to 3D medical imaging (e.g., CT/MR) remains underexplored, particularly for zero-shot generalization. We identify two critical barriers when treating medical volumes as pseudo-video sequences: (1) the non-convexity of anatomical structures, which leads to slice-wise mask discontinuities; (2) the difficulty of effectively generalizing the dependencies between long-term and short-term memory. To address these problems, we propose a stochastic connected component propagation strategy for handling mask discontinuities during training, coupled with a dynamic memory window search mechanism during inference. Extensive experiments demonstrate the effectiveness of our method, achieving a 16% Dice score improvement over conventional fine-tuning on the unseen classes of the TotalSegmentator dataset. Furthermore, our approach generalizes well across modalities (CT/MR) and lesion types, and it performs comparably to or outperforms previous methods on the ULS23 and CHAOS benchmarks.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2854_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{WanYuj_SAM2ProMem_MICCAI2025,
author = { Wang, Yujie and Huang, JunTao and Liang, Dazhu and Liao, Fangzhou and Chen, Jie and Chen, Boan},
title = { { SAM2-ProMem: Enhancing Zero-Shot 3D Segmentation with Stochastic Propagation and Memory Search } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15965},
month = {September},
page = {597 -- 606}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes SAM2-ProMem, an adaptation of the SAM2 foundational model for zero-shot 3D medical image segmentation. It addresses slice-wise mask discontinuities in medical volumes with a novel stochastic connected component propagation strategy and introduces a dynamic memory window search mechanism to effectively handle long-term and short-term memory dependencies, significantly improving zero-shot segmentation performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
a. Introduces a stochastic propagation strategy to manage slice-wise discontinuities effectively during training, enhancing zero-shot generalization capabilities.
b. Implements a dynamic memory search mechanism that optimizes the integration of historical contexts, crucial for capturing detailed anatomical structures in medical segmentation.
c. Demonstrates robust clinical feasibility with a significant Dice improvement (16%) over traditional fine-tuning methods on unseen classes in the TotalSegmentator dataset and strong generalization across CT and MR modalities and lesion types.
d. Provides a generalizable zero-shot framework particularly beneficial for rare anatomical structures, alleviating the dependency on extensive annotated medical datasets.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Computational Overhead: The proposed memory window search mechanism, although effective, potentially introduces increased computational complexity during inference, possibly limiting real-time clinical applications.
Dependency on Central Slice Prompting: The method’s effectiveness relies heavily on selecting a central slice as the initial prompt, which may not always be feasible in practical medical imaging scenarios.
Limited Comparison with Larger SAM2 Models: Due to resource constraints, comparisons are restricted to smaller SAM2 model variants, leaving uncertainty regarding performance when scaled to larger, more typical SAM2 configurations.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Please see the strength and weakness sections.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
This paper proposes SAM2-ProMem, an approach to adapt the video segmentation model SAM2 for zero-shot 3D medical image segmentation. The authors reinterpret one anatomical axis as a pseudo-temporal dimension, allowing slice-wise processing. Two key innovations are introduced:
Stochastic Propagation: During training, discontinuities between adjacent slices are detected via connected component analysis, and only one spatially connected region is stochastically retained.
Memory Search: During inference, the model dynamically searches across memory window sizes to select the best-performing predictions for each slice.
Extensive experiments show strong improvements in zero-shot generalization.
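The stochastic propagation step summarized above can be illustrated with a minimal sketch. It assumes that a discontinuity shows up as a slice mask splitting into several 2D connected components, and that exactly one component is kept uniformly at random; the paper's exact detection criterion and sampling scheme may differ. The helper names `connected_components` and `retain_one_component` are hypothetical, and plain nested lists stand in for image arrays.

```python
import random
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground regions of a 2D binary mask.

    `mask` is a list of rows; each region is returned as a list of (row, col).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols:
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def retain_one_component(mask, rng):
    """Keep one randomly chosen component when the mask is split (assumed rule)."""
    components = connected_components(mask)
    if len(components) <= 1:
        return [row[:] for row in mask]  # nothing to drop
    keep = rng.choice(components)
    out = [[0] * len(mask[0]) for _ in mask]
    for y, x in keep:
        out[y][x] = 1
    return out


# Toy slice with two separate regions (4 pixels and 3 pixels).
slice_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
target = retain_one_component(slice_mask, random.Random(0))
```

In a real pipeline the labelling would more likely be done with an array library routine such as `scipy.ndimage.label`; the pure-Python flood fill is used here only to keep the sketch dependency-free.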
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper addresses a meaningful and underexplored challenge: zero-shot 3D medical segmentation using video-based foundation models.
- The proposed methods are simple, lightweight, and effective, as demonstrated by solid experimental results.
- The authors provide comprehensive ablations, including both in-domain and zero-shot evaluations, across different organs and modalities.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Discontinuity definition is sensitive to starting slice. The concept of discontinuity relies on the starting slice of the pseudo-temporal sequence, meaning the same anatomical structure might be treated as continuous or discontinuous depending on where the sequence begins. This introduces supervision inconsistency and potential bias during training.
- Stochastic retention of components lacks robustness analysis. Randomly retaining one connected component in the presence of discontinuities could discard important semantic regions, especially for small or multi-object structures. No analysis is provided on the stability or failure cases of this strategy.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
See the strengths listed above.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper presents a methodology for adapting Segment Anything Model v2 (SAM2) to segment volumetric images, specifically CT and MRI scans. The proposed method employs a variable memory size and generates multiple segmentation predictions, selecting the highest-confidence result. To address the non-convex nature of anatomical structures, the training strategy is modified using a stochastic connected component propagation approach. Evaluation demonstrates improved zero-shot segmentation performance.
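The selection step described in this summary (several predictions per slice, keep the highest-confidence one) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `search_memory_window`, `toy_predict`, and the confidence score are hypothetical stand-ins, and only the window size is searched here, whereas the rebuttal also mentions multiple candidate memory banks.

```python
from typing import Callable, Sequence, Tuple

# A prediction is modelled as (mask placeholder, confidence score).
Prediction = Tuple[str, float]


def search_memory_window(
    slice_data: object,
    memory: Sequence,
    window_sizes: Sequence[int],
    predict: Callable[[object, Sequence], Prediction],
) -> Tuple[int, Prediction]:
    """Run `predict` once per candidate window size and keep the most confident result."""
    if not window_sizes:
        raise ValueError("at least one candidate window size is required")
    best = None
    for w in window_sizes:
        # Use only the w most recent memory entries for this candidate.
        recent = memory[-w:] if w > 0 else memory[:0]
        pred = predict(slice_data, recent)
        if best is None or pred[1] > best[1][1]:
            best = (w, pred)
    return best


def toy_predict(slice_data, recent_memory) -> Prediction:
    """Stand-in model: confidence peaks when exactly two memory entries are used."""
    confidence = 1.0 - 0.25 * abs(len(recent_memory) - 2)
    return (f"mask from {len(recent_memory)} memories", confidence)


memory_bank = ["m0", "m1", "m2", "m3"]
best_window, best_pred = search_memory_window("slice_5", memory_bank, (1, 2, 4), toy_predict)
```

Each candidate window triggers a separate forward pass, which is the source of the extra inference cost raised in the weaknesses.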
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well-written and logically organized, ensuring ease of comprehension. The core difficulties in applying SAM2 to volumetric segmentation are clearly and comprehensively described. The proposed method of employing a variable number of predictions is soundly conceived, and its impact on segmentation quality is convincingly demonstrated by the evidence. The well-structured ablation studies provide a detailed and insightful analysis of each component’s impact. The demonstrated improvement in zero-shot performance represents a valuable step towards developing reliable systems for clinical applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Generating multiple prediction variations could introduce a considerable computational cost compared to a single prediction. The paper does not include an analysis that details the magnitude of this potential performance overhead. The inference process does not clearly specify if a complete initial mask prompt is required, or if alternative prompt types are also supported. Understanding this is crucial for interpreting the evaluation methodology.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper presents an effective adaptation strategy to apply the Segment Anything Model v2 (SAM2), a high-capacity vision model, to the domain of volumetric medical image segmentation. Empirical evidence corroborates the efficacy of the proposed methodology, with a particularly noteworthy enhancement in zero-shot segmentation accuracy. This improvement suggests a greater capacity to segment novel anatomical structures or modalities without task-specific fine-tuning, a crucial advantage for practical clinical deployment.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for their insightful comments. Below, we address each question and concern in turn.
Response to Reviewer 1
Q1: Sensitivity to the choice of starting slice
As shown in Table 1, within the 0.3–0.7 interval the performance gap between the best and worst cases is only 0.0074. In other words, selecting an initial prompt at approximately the 40% position already yields near-optimal results. In practice, since inference outputs are typically reviewed by a human operator, a starting slice that falls outside this subinterval can simply be adjusted, followed by a targeted second-pass inference in that region to further refine the predictions.
Q2: Limited Comparison with Larger SAM2 Models
We agree that a more extensive evaluation against larger SAM2 variants is important. Due to current time and resource constraints, we leave this comparison to future work.
Response to Reviewer 2
Q1: Supervision consistency when varying the starting slice
Changing the starting slice indeed alters which slices are labeled as discontinuities. However, the initial-slice mask and the corresponding memory-bank state co-vary accordingly. Thus, for any fixed initial-slice-mask prompt, the supervision applied to all subsequent slices remains consistent. In essence, we supervise the connected object rather than enforcing full anatomical supervision on every slice, so there is no inconsistency in the applied supervision.
Q2: Stochastic retention of components lacks robustness analysis.
- Training: We randomly sample different starting slices during training, so discontinuity annotations vary across steps. Even if certain regions are marked discontinuous more often, the diverse sampling among regions ensures every region is eventually trained. We therefore do not expect stochastic retention to impede training.
- Inference: We acknowledge that a suboptimal initial-slice prompt at test time may omit some anatomical substructures. However, a targeted second-pass inference over the overlooked region can recover them. We will note this limitation in the camera-ready manuscript.
Response to Reviewer 3
Q1: Computation cost analysis.
Following the paper’s notation, let N be the number of candidate window sizes and k the number of candidate memory banks. Let $N_{slice}$ denote the number of slices in a 3D volume and $C_{module}$ the cost of a module. The additional overhead of our method can be approximated as:
$N_{slice}[kN(C_{MemAttn}+C_{MaskDec}) + kC_{MemEnc} - (C_{MemAttn} + C_{MaskDec} + C_{MemEnc})]$
In our experiments (N = k = 2), this corresponds to an inference time ≈2.2× that of the baseline (N = k = 1) under a naive implementation.
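To make the overhead expression concrete, the following sketch evaluates it directly. The function name `extra_cost` and the unit module costs are illustrative assumptions, not measurements from the paper. For N = k = 2 the bracket simplifies to $3(C_{MemAttn}+C_{MaskDec}) + C_{MemEnc}$ per slice, and for N = k = 1 it is zero, as expected for the baseline.

```python
def extra_cost(n_slices, n_windows, k_banks, c_mem_attn, c_mask_dec, c_mem_enc):
    """Evaluate the rebuttal's overhead formula for the memory search.

    n_windows is N, k_banks is k; the three c_* arguments are per-slice module
    costs in arbitrary units (placeholders, not measured values).
    """
    per_slice_search = (
        k_banks * n_windows * (c_mem_attn + c_mask_dec) + k_banks * c_mem_enc
    )
    per_slice_baseline = c_mem_attn + c_mask_dec + c_mem_enc
    return n_slices * (per_slice_search - per_slice_baseline)


# Baseline configuration: no additional cost.
baseline_overhead = extra_cost(10, 1, 1, 1.0, 1.0, 1.0)

# N = k = 2 with unit costs: 10 * (3 * (1 + 1) + 1) = 70 extra units.
search_overhead = extra_cost(10, 2, 2, 1.0, 1.0, 1.0)
```

The reported ≈2.2× total inference time cannot be reproduced from this formula alone, since it depends on the actual relative costs of the modules and of any computation outside the three modules named here.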
Q2: Mask prompt of the initial slice
In our experiments, the initial-slice mask is taken directly from the ground truth (see Section 2.3). In practical settings, such a mask could be provided via any common prompt modality (e.g., points, bounding boxes).
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A