Abstract

Leveraging the powerful capabilities of diffusion models has yielded quite effective results in medical image segmentation tasks. However, existing methods typically transfer the original training process directly without specific adjustments for segmentation tasks. Furthermore, the commonly used pre-trained diffusion models still have deficiencies in feature extraction. Based on these considerations, we propose LEAF, a medical image segmentation model grounded in latent diffusion models. During the fine-tuning process, we replace the original noise prediction pattern with a direct prediction of the segmentation map, thereby reducing the variance of segmentation results. We also employ a feature distillation method to align the hidden states of the convolutional layers with the features from a transformer-based vision encoder. Experimental results demonstrate that our method enhances the performance of the original diffusion model across multiple segmentation datasets for different disease types. Notably, our approach does not alter the model architecture, nor does it increase the number of parameters or computation during the inference phase, making it highly efficient.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2956_paper.pdf

SharedIt Link: https://rdcu.be/eHdSW

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04978-0_37

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Pearisli/LEAF

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HuaQil_LEAF_MICCAI2025,
        author = { Huang, Qilin AND Lin, Tianyu AND Chen, Zhiguang AND Zheng, Fudan},
        title = { { LEAF: Latent Diffusion with Efficient Encoder Distillation for Aligned Features in Medical Image Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        page = {384 -- 393}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes an image segmentation method based on latent diffusion models, with two key differences from prior diffusion-based approaches in medical image segmentation: (1) instead of predicting the noise as in the standard formulation, the U-Net in the latent diffusion model directly predicts the latent representation; (2) intermediate features of the U-Net are aligned with those of a pre-trained vision encoder (DinoV2), which enhances performance. The method is evaluated on four datasets and demonstrates improved results over existing approaches.
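To make the distinction between noise prediction and direct prediction concrete, the following is a minimal sketch in the standard DDPM notation of Ho et al., 2020, where x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps. It is not taken from the paper's code; the function names and the value of `alpha_bar` are assumptions chosen for illustration. It shows that a target recovered from a noise estimate inherits the noise-estimate error scaled by sqrt(1 - alpha_bar) / sqrt(alpha_bar), which grows large at heavily noised timesteps.

```python
import math

def add_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]

def x0_from_eps(x_t, eps_hat, alpha_bar):
    """Invert the forward process given an estimate of the noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps_hat)]

def eps_error_gain(alpha_bar):
    """Factor by which an error in eps_hat is scaled in the recovered x0."""
    return math.sqrt(1.0 - alpha_bar) / math.sqrt(alpha_bar)

# Toy latent and noise vectors (illustrative values only).
x0 = [1.0, -1.0, 0.5]
eps = [0.3, -0.2, 0.1]
alpha_bar = 0.01  # hypothetical heavily noised timestep

x_t = add_noise(x0, eps, alpha_bar)

# With the exact noise, x0 is recovered up to floating-point error.
recovered = x0_from_eps(x_t, eps, alpha_bar)

# A small constant error in the noise estimate is amplified in x0.
eps_hat = [e + 0.05 for e in eps]
x0_hat = x0_from_eps(x_t, eps_hat, alpha_bar)
x0_error = abs(x0_hat[0] - x0[0])
```

A network trained to output the target directly does not pass through this rescaling, which is one common explanation for why direct prediction can give lower-variance outputs in single-step settings.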
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper tackles an important problem using state of the art models and techniques.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Unfortunately, the paper gives the impression that both (i) predicting the latent representation instead of noise, and (ii) feature alignment, are novel contributions. This is misleading. Predicting latent representations was already proposed in Ho et al., 2020 (cited as [8] in the paper), and the issue of high variance in noise prediction was discussed in Dynamic Dual-Output Diffusion Models (CVPR 2022) and Progressive Distillation for Fast Sampling of Diffusion Models (Salimans & Ho, 2022). None of these works are acknowledged in the Introduction. As a result, a junior researcher unfamiliar with prior work may incorrectly assume this is a novel contribution. Similarly, while feature alignment is cited in Section 2.2 (via the REPA paper [26]), it is not mentioned in the Introduction, again giving the impression that it is part of the authors’ contributions.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Due to the issues outlined above, I find the Introduction of this paper to be potentially misleading. By omitting key prior work and presenting previously published ideas as novel contributions, the Introduction risks misinforming readers—particularly those unfamiliar with the literature. This lack of proper attribution and contextualization is concerning and, in my view, detrimental to scientific discourse. For this reason, I recommend rejection.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

Unfortunately, this paper overlooks key prior work and presents ideas that have already been published as novel contributions. As a result, the Introduction may mislead readers—especially those less familiar with the existing literature. The lack of proper attribution and contextualization is concerning and, in my view, undermines scientific discourse. While the authors acknowledge these issues and indicate that they will address them, I believe the concerns are substantial enough to warrant a thorough re-review. Therefore, I recommend rejection at this stage.

Review #2

Please describe the contribution of the paper

This paper presents a novel method for medical image segmentation tasks based on diffusion models. To this end, the paper improves the prediction with a more stable method, and uses feature alignment for enriching latent representations.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is improving segmentation tasks with an original method, combining multiple aspects like shared loss terms and feature alignment. The writing is clear and flows well for the most part. Moreover, the paper shows its source code, which I personally appreciate a lot.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Experimental results are hard to judge since standard deviations are missing. Given that the performance numbers are relatively close to each other, it is not possible to contextualise the proposed performance improvements of the paper vis-à-vis other, simpler methods.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I appreciate the originality of the method and the clarity of the writing, but I am less convinced by the experimental setup at the moment.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Thanks for the clarifications.

Review #3

Please describe the contribution of the paper

In this study, the authors fine-tuned a latent diffusion model for medical image segmentation and proposed a high-performing framework named LEAF.

Unlike traditional diffusion models that predict the noise added at each time step, the proposed model is designed to directly predict the target data.

Furthermore, the authors introduce a novel distillation loss function that compares the similarity between features extracted from a transformer-based visual encoder and those obtained within LEAF’s denoising U-Net. This provides additional feature representations during segmentation, leading to enhanced performance.

As a result, the method improves segmentation performance across various medical datasets without incurring additional temporal costs compared to existing approaches.
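The combination of a prediction loss with a feature distillation term described above might look roughly like the sketch below. The L1 prediction loss and the weight gamma are mentioned on this page; everything else is an assumption: the cosine-similarity form of the alignment term follows the REPA-style objective the reviewers refer to rather than the paper's exact definition, the function names are hypothetical, and the U-Net features are assumed to have already been projected to the encoder's feature dimension.

```python
import math

def l1_loss(pred, target):
    """Mean absolute error between predicted and target latents."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def cosine_similarity(u, v, tiny=1e-8):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + tiny)

def alignment_loss(student_tokens, teacher_tokens):
    """One minus the mean cosine similarity over spatial locations.

    student_tokens: U-Net hidden states, one vector per location
                    (assumed already projected to the teacher dimension).
    teacher_tokens: frozen vision-encoder features at the same locations.
    """
    sims = [cosine_similarity(s, t) for s, t in zip(student_tokens, teacher_tokens)]
    return 1.0 - sum(sims) / len(sims)

def total_loss(pred_latent, target_latent, student_tokens, teacher_tokens, gamma):
    """Prediction loss plus gamma-weighted distillation loss (assumed form)."""
    return l1_loss(pred_latent, target_latent) + gamma * alignment_loss(
        student_tokens, teacher_tokens
    )
```

Because the teacher encoder and the alignment term are only used to compute the training loss in this sketch, dropping them at inference leaves the segmentation network unchanged, which is consistent with the claim that no inference-time cost is added.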
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This study achieves high performance in diffusion-based medical image segmentation through two key contributions: 1) It effectively incorporates the design from prior work [1], in which the diffusion model directly predicts data in the latent space instead of predicting added noise. 2) It introduces a novel distillation loss that compares features extracted from a visual encoder applied to medical images with those extracted by the segmentation network itself, thereby integrating feature representation into the segmentation task. [1] Lin, Tianyu, et al. “Stable diffusion segmentation for biomedical images with single-step reverse process.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.

The proposed model is evaluated on various medical datasets and achieves state-of-the-art segmentation performance across all benchmarks.

The effectiveness of the two main contributions is further supported by two ablation studies, each corresponding to one of the proposed components. Overall, the paper is well-structured and supported by clear figures and tables that illustrate the proposed methodology and results effectively.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Although the pre-loss corresponding to Eq. (4) can be inferred from conventional diffusion models and related work [1], it is not clearly stated whether this loss is implemented as an L1 or L2 loss. While readers with the background may infer the function of the loss, the paper lacks an explicit explanation or justification for the choice. [1] Lin, Tianyu, et al. “Stable diffusion segmentation for biomedical images with single-step reverse process.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.

In Table 1, which presents the main segmentation results, most of the baselines are cited directly from previous studies. Although the authors mention that the original results were reused for fair comparison, it would be more rigorous to re-run these models under the same experimental setup to avoid potential biases and ensure a more accurate performance comparison.

In Table 3, which investigates the effect of varying the weight (gamma) of the proposed distillation loss, there is no consistent trend showing that higher gamma values lead to improved segmentation performance. This raises questions about the robustness and sensitivity of the proposed loss function and warrants further analysis or clarification.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

In the experimental results evaluating the proposed distillation loss, particularly in Table 3, it is observed that the QaTa dataset—where the largest performance gain is reported—shows an interesting pattern: the segmentation performance improves as gamma increases from 0 to 0.5, but then drops when gamma is set to 0.75 or 1. This raises a question about whether the authors aimed to find an optimal gamma value rather than demonstrating consistent improvements with increasing gamma strength. If an optimal gamma was indeed selected based on validation performance, it would be helpful to explicitly state this in the paper. Otherwise, further explanation is needed regarding why stronger distillation (i.e., larger gamma) leads to performance degradation. Additionally, it would be informative to discuss whether this pattern remains consistent across different random seeds, or if performance differences at higher gamma values could be mitigated through multiple runs and averaging.
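The multi-seed check suggested above could be organized along the lines of the following sketch, which reports the mean and sample standard deviation per gamma value and selects gamma by mean score. The helper names are hypothetical, and the scores are placeholder numbers chosen purely to exercise the code; they are not results from the paper.

```python
from statistics import mean, stdev

def summarize_by_gamma(runs):
    """Map each gamma to (mean, sample standard deviation) over its seeds.

    runs: dict mapping a gamma value to a list of scores, one per random seed.
    """
    return {gamma: (mean(scores), stdev(scores)) for gamma, scores in runs.items()}

def best_gamma(summary):
    """Return the gamma with the highest mean score."""
    return max(summary, key=lambda gamma: summary[gamma][0])

# Placeholder scores for three seeds per gamma (NOT results from the paper).
runs = {
    0.0: [0.80, 0.82, 0.81],
    0.5: [0.83, 0.85, 0.84],
    1.0: [0.82, 0.86, 0.81],
}

summary = summarize_by_gamma(runs)
for gamma, (avg, sd) in sorted(summary.items()):
    print(f"gamma={gamma}: {avg:.3f} +/- {sd:.3f}")
print("selected gamma:", best_gamma(summary))
```

If the per-gamma intervals overlap substantially, differences between gamma settings are hard to separate from seed noise, which is the question this comment is asking the authors to address.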
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The core idea of the paper is sufficiently novel, and both the quantity and quality of the experiments support the proposed method well. I have only one concern, which I have detailed above—specifically, regarding the segmentation performance variations in relation to the gamma values of the proposed distillation loss. This concern forms the basis for my overall score.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Since the variation in values with respect to gamma is minimal, and providing the standard deviation is crucial for ensuring the robustness of the experiments, it is important to include this information. Although the experimental evidence is currently insufficient, the authors have expressed their intention to clarify this point. Therefore, I recommend a weak accept, but it is clear that further experimentation is needed.

Author Feedback

We sincerely thank all reviewers for their constructive comments.

Reviewer#1 Q1: By omitting key prior work and presenting previously published ideas as novel contributions, the introduction risks misinforming readers. A1: Thanks for your comment. We apologize for not adequately highlighting the related prior work in the introduction, which caused misunderstanding. We will clarify them in the revised version. However, we still want to clarify that our contribution is different from the existing methods:

The works you mention focus on variance reduction in noise prediction and x0 sampling for natural image generation. In contrast, our fine tuning framework is explicitly designed for medical image segmentation, which requires dense prediction and consistency.

While Dynamic Dual Output Diffusion Models interpolate between noise and x0 outputs based on learned coefficients, and Progressive Distillation for Fast Sampling of Diffusion Models reports only marginal gains from different parameterizations, our ablation study shows that x0-pred yields significantly better segmentation accuracy and stability in medical image segmentation compared to noise-pred.

REPA employs a Transformer encoder to align features within a DiT-based model for natural image generation. By contrast, our approach aligns convolutional and Transformer representations, and we demonstrate that even without fine-tuning on medical data, this visual encoder improves representation quality across diverse backbone architectures in segmentation tasks.

In summary, beyond merely incorporating natural image diffusion techniques, our work offers novel insights and empirical evidence that this fine tuning framework is both innovative and effective for medical image segmentation. We believe these targeted contributions will meaningfully guide future research in this specialized domain.

Reviewer#2 Q1: Experimental results lack standard deviations, making it hard to assess significance, and performance gains versus simpler methods are unclear. A1: We followed previous works (MedSegDiff and SDSeg) and therefore did not report standard deviations in Table 1. However, we report the standard deviation of noise-pred and x0-pred over 10 seeds in Table 4, and the results show that x0-pred exhibits low variance.

Because SDSeg uses noise prediction, these statistics also apply to it. Moreover, as described in Section 3.2, we re trained and evaluated SDSeg—our closest performing baseline—within our exact framework (identical architecture, data splits, preprocessing, and hyperparameters) to minimize bias and ensure a robust comparison. Finally, all experiments across the four datasets use the same global seed without dataset specific tuning, further reinforcing the reliability of our results.

Reviewer#3 Q1: Clarify if gamma was chosen optimally or intended to show consistent gains. A1: We performed a grid search over gamma to select the optimal value; the results in Table 1 are simply the rounded best scores from Table 3, and we will clarify this in the revised version.

Q2: Does the gamma performance pattern hold across different random seeds, or can averaging mitigate drops? A2: We have evaluated all gamma values across five different random seeds and will report the mean and standard deviation in the revision. We found that the performance drop at higher gamma values is due to random seed variability rather than gamma itself. After sufficient training, the mean and standard deviation of performance across seeds converge closely for all gamma values. Averaging results over multiple gamma values produces stable, robust outcomes, demonstrating the reliability of our distillation loss.

Q3: The diffusion loss type is not specified. A3: We adopt the L1 loss exactly as implemented in SDSeg, and will explicitly clarify this choice.

Q4: More rigorously re-run baselines under an identical setup. A4: Thank you for the suggestion. We will pay attention to this in our future work.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

I recommend rejecting this paper due to significant concerns about the novelty and necessity of the proposed approach. While the authors combine existing techniques (latent diffusion with x0 prediction and feature alignment), the fundamental question remains whether diffusion models are necessary or advantageous for medical image segmentation compared to more established and efficient methods. The paper fails to adequately justify why the added complexity and computational overhead of diffusion models is warranted for this task, especially when simpler approaches may achieve comparable results with better efficiency and interpretability.

LEAF: Latent Diffusion with Efficient Encoder Distillation for Aligned Features in Medical Image Segmentation

Author(s):