Abstract

Vision foundation models, despite strong segmentation capabilities enabled by pretraining on large-scale data, remain underexplored in specific medical visual concept segmentation tasks. Medical imaging presents unique challenges: pixel intensity differences between target regions and surrounding structures are often subtle, and significant variations in the shape, size, and location of anatomical structures limit the effectiveness of traditional pixel-similarity-based alignment strategies. This paper proposes a Deformation-Aware Learning Strategy via Self-sustaining Feedback Cycle (DSFC) for medical image segmentation. The framework introduces a dual-deformation perturbation mechanism, combining global Gaussian-distributed deformations and target-focused local deformations, to preserve anatomical patterns while capturing non-rigid variations. A Hard Example Adaptive (HEA) loss is proposed to enhance training stability and mask accuracy. DSFC establishes a closed-loop training process, alternately optimizing the segmentation model and the destroyer to improve anatomical understanding. Our extensive experiments on public datasets covering various dimensions and organs demonstrate that DSFC significantly enhances model performance in fully supervised training settings without requiring additional samples, and that its components are effective. Our code will be publicly available.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1616_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/jaylinio/DSFC

Link to the Dataset(s)

BTCV dataset: https://www.synapse.org/Synapse:syn3193805/wiki/217752
Synapse dataset: https://www.synapse.org/Synapse:syn3193805/wiki/89480
JSRT dataset: http://db.jsrt.or.jp/eng.php

BibTex

@InProceedings{LinJie_DSFC_MICCAI2025,
        author = { Lin, Jie and Jiang, Hengyi and Liu, Hong and Wang, Liansheng},
        title = { { DSFC: Deformation-Aware Learning Strategy via Self-sustaining Feedback Cycle for Medical Vision Foundation Model Domain Adaptation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {174 -- 184}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a novel framework for adapting large vision foundation models to the medical domain. It introduces a destroyer module that applies both global and local deformations to the input images to improve model robustness. Additionally, it incorporates a hard example adaptive loss to emphasize learning from difficult examples. The proposed method is evaluated on several public datasets and demonstrates superior performance compared to multiple SAM-based baselines.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper presents a well-structured framework for adapting foundation models in medical imaging. The destroyer module effectively models both global and local uncertainties. The HEA loss further reinforces the model’s ability to focus on hard regions.
    2. Extensive experiments are conducted on multiple public datasets, and the proposed method is compared against a wide range of SAM-based baselines.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The statement “σ_D is sample from the Gaussian distribution of the organ region 𝑦′” is ambiguous. It is unclear what distribution is being referred to or how it is derived from the organ region 𝑦′.
    2. The local destroyer’s alpha matte 𝑚 is supervised using binary cross-entropy loss against the ground truth mask. It is unclear how this setup enables the model to learn spatial deformations.
    3. While the paper compares against a comprehensive set of SAM-based methods, it would benefit from including classical medical segmentation models such as nnU-Net as additional baselines.
    4. The paper introduces an augmentation probability 𝑝 but does not clearly explain how it is applied. Is 𝑝 used per sample, per batch, or per organ class?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please address the concerns in the weakness section.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have clarified major concerns in the rebuttal.



Review #2

  • Please describe the contribution of the paper

    The authors introduce DSFC (Deformation-Aware Learning Strategy via Self-sustaining Feedback Cycle), a novel training framework for improving the adaptation of vision foundation models like MedSAM-2 to different challenging downstream medical image segmentation tasks with limited supervision. DSFC employs a closed-loop training process where a Refiner (segmentation model) and a Destroyer are alternately optimized. The Destroyer introduces synthetic deformations through (1) a global Gaussian-sampled deformation field to simulate inter-patient variation, and (2) a learned local deformation field predicted by a lightweight U-Net to model localized prediction challenges. To further enhance robustness, a Hard Example Adaptive (HEA) loss is used to emphasize low-confidence foreground pixels during training. Without requiring additional labeled data, DSFC improves segmentation performance across multiple datasets (BTCV, Synapse multi-organ CT, JSRT), demonstrating its ability to capture both local and global structural variations relevant to anatomical segmentation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The dual-path deformation design is interesting and clearly separates global anatomical variability from local prediction challenges, enabling more effective robustness training for segmentation models.
    2. Self-supervised local deformation learning requires no extra labels, making the framework data-efficient and adaptable.
    3. Incorporation of L_HEA provides a simple yet intuitive way to focus learning on uncertain regions, with its confidence-based weighting approach well-aligned with the goal of robust segmentation under ambiguity.
    4. Results show significant improvement in segmentation performance over SAM baselines.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. In my opinion, the training details provided in the paper are not sufficient. For example, I assume the Refiner (R) and Destroyer (D) are trained episodically, but it is unclear how many epochs each is trained before alternating. Stating whether a fixed n:m schedule or an adaptive switching strategy is used would help readers understand stability and convergence.
    2. While the Hard Example Adaptive (HEA) loss is claimed to be novel, it is essentially a hard mining technique. It uses a hard threshold (τ) to select difficult pixels, but it is unclear whether this approach offers any benefit over a probabilistic reweighting mechanism such as focal loss variants, or why such probabilistic loss variants have not been explored.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper presents a thoughtful integration of existing techniques such as deformation-based augmentation, cyclic training, and hard example mining into a cohesive framework tailored to the challenges of medical image segmentation. While the overall conceptual novelty is incremental, the method is well-motivated and the integration pipeline is novel. It also demonstrates strong empirical performance. Some methodological details are under-explained, and the manuscript contains typographical issues, but these can be addressed in a rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    My main concerns were on some of the training details and on the use of the L_HEA loss. The authors have addressed the concerns and have provided their observations on how L_HEA differs from and is better than focal loss for their use case.



Review #3

  • Please describe the contribution of the paper

    This work introduces a closed loop training process that alternates the optimization between global and local features to reach an improved segmentation model.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Alternating between the global and local destroy processes is a neat idea to optimize the segmentation for both global and local features.
    • The proposed HEA loss is a good strategy to address the pixels that are hard to segment properly.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Using the original segmentation to train the local destroy process is likely to bias this process to what the basic/original segmentation model produces.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Major:

    1. “Local destroy process”: Guiding the local destroyer to learn the deformation between the original segmentation and the ground truth is not very intuitive, as it simply biases the deformation towards the original segmentation.
    2. What is the difference between the 2D and 3D implementation of the proposed model?

    Minor:

    1. Table 1 caption: what are the blue, yellow and brown in this table?!
    2. P7: “Since the trends in the ablation experiments were similar across the three datasets, we selected the BTCV dataset for detailed illustration” but then Fig 4 shows all the datasets with no details. Are there any missing results?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is neat, and the results are comprehensive with good ablation experiments that validate the proposed idea.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The author addressed my comments in a satisfactory way.




Author Feedback

We thank the reviewers for 1) appreciating our work’s novelty, clarity, and efficacy, and 2) the constructive comments. Below we address the main concerns and will update the manuscript accordingly in our final version.

R1Q1 Potential bias from using original segmentation results in training the Local Destroyer module As discussed on page 4, simulating local variations with random deformation fields is challenging, so we propose using local differences between the original segmentation results and the GT. This approach requires no additional intervention. While data-driven DL methods, including ours, may carry some bias from the original segmentation model, our experiments show that the approach is effective. This is due to the alternate optimization process, where the segmentation network and destroyer are updated in turns, introducing randomness and diversity. This interaction prevents overfitting to early biases and enhances generalizability. Additionally, as the Refiner improves, the Destroyer focuses on finer local errors, further boosting performance.

R1Q2 Clarify the difference between 2D&3D implementations Both versions follow a unified pipeline with consistent training procedures and loss functions. The 3D implementation uses 3D U-Net for the Destroyer, and generates 3D deformation fields. Local perturbations are applied to whole 3D volumes rather than 2D slices to better preserve anatomical structures.

R1Q3~Q4 Missing legend for Tab. 1 colors and missing details in Fig. 4 We will reintroduce a clear legend in the final version.

R2Q1 Alternate optimization strategy We experimentally adopt a 1:1 alternate optimization strategy (see Algo. 1), where the Destroyer and Refiner are updated in even and odd iterations, respectively.
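
The 1:1 schedule described above can be sketched as follows. This is a minimal illustration, not the authors' Algo. 1: the function names, the `update_destroyer` and `update_refiner` callbacks, and the choice to start counting iterations at 0 are all assumptions made for the example.

```python
def alternating_schedule(num_iterations, start=0):
    """Return which module is updated at each iteration under a 1:1 schedule."""
    schedule = []
    for it in range(start, start + num_iterations):
        # Even iterations update the Destroyer and odd iterations the Refiner,
        # as stated in the rebuttal. Starting the count at 0 is an assumption.
        schedule.append("destroyer" if it % 2 == 0 else "refiner")
    return schedule


def train(num_iterations, update_destroyer, update_refiner):
    """Run the loop, calling hypothetical update callbacks in turn."""
    for it, module in enumerate(alternating_schedule(num_iterations)):
        if module == "destroyer":
            update_destroyer(it)  # one optimization step for the Destroyer
        else:
            update_refiner(it)    # one optimization step for the Refiner
```

An n:m schedule, as asked about in the review, would only change the condition inside `alternating_schedule`.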

R2Q2 Clarification of HEA Loss and comparison with Focal Loss The novelty of HEA Loss lies in selectively emphasizing uncertain foreground regions. It combines confidence maps from the Refiner and anatomical priors to dynamically weight low-confidence foreground pixels. HEA Loss is integral to our destroy-recover training loop: the Destroyer introduces perturbations to simulate challenging cases, and HEA Loss focuses optimization on these uncertain areas, enhancing robustness in “worst-case” scenarios. This approach differs from Focal Loss, which statically reweights all pixels based on prediction probability. Preliminary results show that HEA Loss outperforms Focal Loss (gamma=2) in Dice score across multiple organs in BTCV.
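
To make the contrast with Focal Loss concrete, the sketch below places the standard binary focal loss next to a threshold-based reweighting of foreground pixels. The second function is purely illustrative: this page does not give the HEA formula, so the name `hard_example_weighted_bce`, the threshold `tau`, and the `boost` factor are assumptions, and the anatomical priors mentioned in the rebuttal are not modeled.

```python
import math

EPS = 1e-7  # clamp probabilities to keep log() finite


def focal_loss(probs, targets, gamma=2.0):
    """Mean binary focal loss: -(1 - p_t)^gamma * log(p_t) over all pixels."""
    total = 0.0
    for p, y in zip(probs, targets):
        p_t = p if y == 1 else 1.0 - p
        p_t = min(max(p_t, EPS), 1.0 - EPS)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def hard_example_weighted_bce(probs, targets, tau=0.5, boost=2.0):
    """Illustrative only: BCE where low-confidence foreground pixels weigh more.

    A pixel counts as 'hard' when it is foreground (y == 1) and its predicted
    probability falls below tau. This is an assumed form, not the paper's loss.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        bce = -math.log(p) if y == 1 else -math.log(1.0 - p)
        weight = boost if (y == 1 and p < tau) else 1.0
        total += weight * bce
    return total / len(probs)
```

The difference the rebuttal points to is visible in the weights: focal loss scales every pixel by a smooth function of its confidence, while the thresholded variant changes only the foreground pixels that fall below `tau`.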

R4Q1 Definition and sampling of σ_D from the region y′ In the global destroy module, σ_D is computed to match the spatial extent of target organs. For each training sample, we extract foreground coordinates from the organ mask y′ and compute σ_x and σ_y along x and y axes. The σ_D is defined as their average: σ_D = (σ_x + σ_y) / 2. Displacement vectors (dx, dy) are then sampled from a Gaussian N(0, σ_D²), smoothed, and clipped (e.g., within [−20, 20]) to ensure anatomical plausibility.
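
The computation of σ_D and the sampling of the displacement field can be sketched in plain Python as below. Several details are assumptions, since the rebuttal does not specify them: the population standard deviation is used for σ_x and σ_y, a 3x3 mean filter stands in for the unspecified smoothing step, and the function names and the `seed` argument are hypothetical.

```python
import random
import statistics


def sigma_d_from_mask(mask):
    """sigma_D = (sigma_x + sigma_y) / 2 over foreground pixel coordinates."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    if not xs:
        raise ValueError("mask has no foreground pixels")
    # Population standard deviation is an assumption, not stated in the source.
    return (statistics.pstdev(xs) + statistics.pstdev(ys)) / 2.0


def box_smooth(field):
    """3x3 mean filter, a stand-in for the unspecified smoothing step."""
    h, w = len(field), len(field[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [field[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def global_displacement(mask, clip=20.0, seed=None):
    """Sample (dx, dy) from N(0, sigma_D^2), then smooth and clip each field."""
    rng = random.Random(seed)
    sigma_d = sigma_d_from_mask(mask)
    h, w = len(mask), len(mask[0])
    fields = []
    for _ in range(2):  # one field for dx, one for dy
        raw = [[rng.gauss(0.0, sigma_d) for _ in range(w)] for _ in range(h)]
        smooth = box_smooth(raw)
        fields.append([[max(-clip, min(clip, v)) for v in row] for row in smooth])
    return fields[0], fields[1]
```

For a 2x2 organ region centered in a 4x4 mask, σ_x and σ_y are both 0.5, so σ_D is 0.5 and the sampled displacements stay small relative to the clipping range.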

R4Q2 How BCE Loss on alpha matte m guides spatial deformation learning The alpha matte m, inspired by alpha blending in [see Ref 8], highlights regions with mask prediction errors. It is supervised by BCE Loss, and the gradient is backpropagated jointly with the deformation field. Therefore, even with supervision only on the alpha matte m, the deformation field can be aligned accordingly, a strategy commonly used in medical image registration.

R4Q3 Suggestion to include nnU-Net as an additional baseline While we focused on comparing with recent foundation models due to space limits, we agree that nnU-Net is a valuable baseline. Preliminary results demonstrate that our method outperforms nnU-Net across multiple benchmark datasets.

R4Q4 Clarify how augmentation probability p is applied The augmentation probability p is applied per sample within a batch, not per organ or per batch. This results in a mix of original and augmented samples during Refiner training.
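
Per-sample application of p can be illustrated with the short sketch below. The names `augment_batch` and `augment` are hypothetical, and `augment` stands in for whatever perturbation the Destroyer applies.

```python
import random


def augment_batch(batch, augment, p, rng=None):
    """Apply `augment` to each sample independently with probability p."""
    rng = rng or random.Random()
    out, flags = [], []
    for sample in batch:
        apply = rng.random() < p  # independent draw for every sample
        out.append(augment(sample) if apply else sample)
        flags.append(apply)
    return out, flags
```

Because the draw happens inside the loop, a single batch generally contains both original and augmented samples, whereas a per-batch draw would make each batch all one or all the other.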




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Three reviewers recommended acceptance of the paper following rebuttal, citing a novel training framework, effective design for enhancing segmentation robustness, and strong empirical results across multiple datasets.

    Reviewer #1 highlighted the paper’s central idea of a closed-loop training process that alternates between global and local deformation paths to improve segmentation. The proposed Hard Example Adaptive (HEA) loss was noted as a practical addition for targeting difficult regions. While the reviewer initially raised concerns about potential bias introduced by using the base model’s outputs in training the local path, as well as ambiguities in architectural details and result reporting, the rebuttal satisfactorily addressed these issues. The reviewer ultimately endorsed acceptance for the method’s originality and thorough validation.

    Reviewer #2 commended the DSFC framework for its dual-path deformation strategy, data efficiency, and well-integrated HEA loss. The method was viewed as a well-motivated and effective way to adapt vision foundation models like MedSAM-2 for medical segmentation under limited supervision. Concerns about insufficient training detail (e.g., scheduling of refiner/destroyer updates) and limited exploration of alternative loss functions (e.g., focal loss) were acknowledged, but the rebuttal clarified these aspects. The reviewer maintained a weak accept, emphasizing the method’s practical impact and performance.

    Reviewer #3 supported acceptance after initial hesitation, citing the framework’s ability to model both global anatomical variability and local prediction uncertainty through the destroyer module, and its strong results against SAM-based baselines. Key concerns around ambiguity in notation, unclear supervision mechanisms, and omission of classical segmentation baselines (e.g., nnU-Net) were addressed in the rebuttal. With clarifications provided, the reviewer endorsed acceptance.

    In summary, the paper was accepted for its novel and well-structured training strategy that improves medical image segmentation robustness, strong cross-dataset performance, and clear contributions in the adaptation of vision foundation models to clinical domains. Remaining concerns were considered addressable in revision.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


