Abstract
The Diffusion Probabilistic Model (DPM) has demonstrated remarkable performance across a variety of generative tasks. The inherent randomness of diffusion models helps address issues such as blurring at the edges of medical images and labels, positioning DPMs as a promising approach for lesion segmentation. However, we find that the current training and inference strategies of diffusion models result in an uneven distribution of attention across different timesteps, leading to longer training times and suboptimal solutions. To this end, we propose UniSegDiff, a novel diffusion model framework designed to address lesion segmentation in a unified manner across multiple modalities and organs. This framework introduces a staged training and inference approach that dynamically adjusts the prediction targets at different stages, forcing the model to maintain high attention across all timesteps, and achieves unified lesion segmentation by pre-training the feature extraction network for segmentation. We evaluate performance on six different organs across various imaging modalities. Comprehensive experimental results demonstrate that UniSegDiff significantly outperforms previous state-of-the-art (SOTA) approaches.
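For intuition, the snippet below is a minimal sketch of the staged idea described in the abstract: the prediction target used by the training loss depends on which stage the sampled timestep falls into. The total step count, the stage boundaries, and the stage-to-target mapping here are illustrative assumptions, not the paper's actual configuration.

# Minimal sketch (not the authors' code): choosing the prediction target by stage.
# T, the boundaries T1/T2, and which stage predicts noise vs. x0 are assumptions.
import numpy as np

T, T1, T2 = 1000, 300, 700  # hypothetical diffusion steps and stage boundaries

def stage_and_target(t, x0, eps):
    """Return the stage name and the regression target for a sampled timestep t.
    x0  : clean binary segmentation mask
    eps : Gaussian noise added to the mask at step t
    """
    if t >= T2:               # high noise: "rapid segmentation" stage predicts x0 directly
        return "rapid", x0
    if t >= T1:               # intermediate noise: "probabilistic modeling" stage predicts noise
        return "probabilistic", eps
    return "refine", eps      # low noise: "denoising refinement" stage predicts noise

# Toy usage: sample a timestep and inspect which target the loss would regress to.
rng = np.random.default_rng(0)
t = int(rng.integers(0, T))
x0 = rng.integers(0, 2, size=(4, 4)).astype(float)
eps = rng.standard_normal((4, 4))
stage, target = stage_and_target(t, x0, eps)
print(t, stage, target.shape)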
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1865_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/HUYILONG-Z/UniSegDiff
Link to the Dataset(s)
N/A
BibTex
@InProceedings{HuYil_UniSegDiff_MICCAI2025,
author = { Hu, Yilong and Chang, Shijie and Zhang, Lihe and Tian, Feng and Sun, Weibing and Lu, Huchuan},
title = { { UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15961},
month = {September},
pages = {664 -- 674}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper presents a segmentation method based on a diffusion model. The denoising process of the diffusion model is separated into three stages: a “Rapid segmentation stage”, a “Probabilistic modeling stage”, and a “Denoising refinement stage”. The denoising stages include attention-based modules.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Unlike standard diffusion models, the denoising model is split according to the stages of the denoising process, namely the “Rapid segmentation stage”, the “Probabilistic modeling stage”, and the “Denoising refinement stage”. By using different models depending on the stage of the denoising process, the authors argue that learning efficiency and overall prediction accuracy are improved.
Uncertainty estimation is incorporated into the system.
Experimental comparisons with SOTA methods and ablation studies are shown.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The second paragraph of the introduction seems to contain important claims of the paper; however, no evidence or examples are shown.
No qualitative evaluation is shown, so it is not easy for readers to grasp how the method works. Moreover, it is not clear how the uncertainty estimate can be presented to the user.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Although a quantitative evaluation is shown, no resulting images are shown. Thus, I think readers will find it difficult to understand how the algorithm works.
Also, although uncertainty estimation is incorporated into the system, no visualization of the resulting uncertainty is shown.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper introduces UniSegDiff, a novel diffusion model framework for unified lesion segmentation across multiple modalities and organs. During inference, the rapid segmentation and denoising refinement stages perform single-step sampling, achieving accurate segmentation in as few as eleven steps. This approach is at least 10 times faster than DDIM and 100 times faster than DDPM. The paper demonstrates the effectiveness of UniSegDiff through comprehensive experiments and ablation studies, showing its potential as a unified framework for lesion segmentation across various medical imaging modalities.
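As a rough illustration of how single-step sampling in the rapid and refinement stages plus iterative sampling in the middle stage could add up to roughly eleven steps, here is a hypothetical schedule; the stage boundaries and per-stage step counts are assumptions, and only the overall step count comes from the review above.

# Hypothetical staged inference schedule (illustrative assumptions throughout).
T, T1, T2 = 1000, 300, 700  # assumed total steps and stage boundaries

def staged_schedule(mid_steps=9):
    """One jump through the rapid stage, mid_steps iterative samples through the
    probabilistic stage, and one final jump through the refinement stage."""
    rapid = [T - 1]                                             # single step: predict x0
    stride = (T2 - T1) // mid_steps
    middle = list(range(T2 - 1, T1 - 1, -stride))[:mid_steps]   # iterative sampling
    refine = [0]                                                # single refinement step
    return rapid + middle + refine

steps = staged_schedule()
print(len(steps), steps)  # 11 timesteps in total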
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The problem is important, both methodologically and potentially clinically, for creating data- or modality-agnostic segmentation methods.
- The problem and proposed approach are well motivated, and Figure 1 provides a very good general overview of the results obtained by the method compared to those in the state of the art.
- The review of the state of the art is comprehensive enough: it provides a good understanding of the limitations of previous denoising-based segmentation approaches.
- The experimental design and the tests done by the authors are very thorough. They performed experiments on several large datasets and compared their method with state-of-the-art models. The study also includes ablation studies to assess the impact of different denoising methods, different components of the architecture, and the threshold selection.
- The results are well presented, and the authors provide both quantitative and qualitative assessments of the experiments, presenting ablation studies in great detail. The methodology is evaluated on six different lesion segmentation tasks across various imaging modalities, demonstrating its versatility and effectiveness in unified lesion segmentation. The use of 4-fold cross-validation adds robustness to the evaluation process.
- The authors discuss future areas of research and the limitations of the current iteration of their approach.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The staged training and inference approach, while effective, may introduce additional complexity to the model. This could potentially increase computational costs and training time, despite the claimed efficiency improvements.
- The use of a unified model for multiple lesion types and imaging modalities could potentially lead to overfitting on the specific datasets used in the study. The paper doesn't discuss how well the model generalizes to completely new datasets or lesion types.
- While the study uses six different lesion segmentation datasets, this may still not be comprehensive enough to claim truly 'unified' lesion segmentation across all possible lesion types and imaging modalities.
- Although the paper claims faster inference compared to some other methods, the need for multiple samplings and uncertainty fusion might still result in longer inference times compared to simpler, task-specific models.
- The method relies on specific thresholds for different stages, but it's not clear how sensitive the model is to these threshold values or how they should be selected for new datasets.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Utilizing advanced data augmentation techniques to artificially expand the dataset could help. This might include simulating various lighting conditions, adding realistic noise, or generating synthetic images that mimic real kidney environments.
Developing a training regime that combines both phantom and real data in carefully calibrated proportions could help the system learn generalizable features while still benefiting from the controlled nature of phantom data.
Could the authors elaborate more on the need for separate object detection and segmentation models? Is segmentation not enough?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This approach aims at mitigating issues related to current training and inference strategies for diffusion models, which result in an uneven attention distribution across timesteps, leading to longer training times and suboptimal solutions. This paper presents a method to overcome these issues through a staged training and inference approach that dynamically adjusts prediction targets at different stages, ensuring high attention across all timesteps.
The idea of unifying lesion segmentation by pre-training the feature extraction network for segmentation is very interesting. The results clearly show that this helps to transform lesion images from different modalities into distributions similar to the masks, reducing feature confusion between different lesions.
The authors show that leveraging the inherent randomness modeled by the diffusion model through staged inference and fusion of multiple segmentation results can lead to optimal solutions.
However, more details about training time and generalization are needed for a strong accept.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed my concerns. I believe the paper is well written and thorough.
Review #3
- Please describe the contribution of the paper
The authors introduce a refined, stage-based training procedure for diffusion probabilistic modelling aimed at lesion segmentation. Additionally, the authors make use of a pre-trained imaging feature encoder for conditional training guidance through a dual cross-attention mechanism. An inference-only fusion mechanism between different segmentations is used to produce a final consensus segmentation mask. A thorough benchmarking against SOTA methods was done on pathology datasets. An ablation study was performed to evaluate the contribution of each novel mechanism to the overall lesion segmentation performance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors introduce a novel training and inference scheme for diffusion models based on a three-stage approach, combined with conditional imaging feature guidance and a pre-trained encoder. They empirically show that they obtain the best overlap performance compared to experts when benchmarked against previous SOTA methods. The inference and training times of the proposed method are also quite competitive compared to other approaches. The ablation studies highlight the significance of each novel component.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The introduction section could be considerably improved. A large part of the introduction is spent discussing the motivation behind the staged training approach, but no references to similar work are provided. Additionally, the motivation behind using diffusion probabilistic models for lesion segmentation is quite unclear. A deeper literature review is needed in the introduction regarding the use of pre-trained encoders and the use of DPMs for organ-at-risk or lesion segmentation. Although the authors highlight the crucial contribution of pre-training the encoder, leading to superior performance, it is unclear whether it is the main performance factor. Based on Table 6, what would happen to the model performance if it included all novel components except the imaging encoder pre-training? How would that affect the final results?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The authors propose a novel lesion segmentation baseline leading to state-of-the-art results for lesion segmentation. They have several major contributions, from the training and inference scheme to the incorporation of a pre-trained encoder and segmentation mask fusion. However, their introduction needs a stronger literature review of previous work related to diffusion probabilistic models. Other external datasets for pathology segmentation could also be included for benchmarking to study the generalizability of the model. The focus here seems to be robustness to different modalities, not specifically lesion/pathology diversity.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
R3Q1 (similar work) and R3Q2 (motivation for using diffusion): The authors' comments clarified my previous inquiries about introduction references and the motivation behind using DPMs. However, the second part of the introduction seems to belong in the approach section rather than the introduction.
R3Q4 (pretrained encoder): As far as I can tell, the authors propose numerous contributions throughout the paper: staged training, pre-training, Dual Cross Attention (DCA), and segmentation fusion. In Table 6, the authors investigate the impact of each component in a cumulative manner. Although this is informative to a degree, it would also have been interesting to see the impact of all new mechanisms except pre-training specifically. Staged training, DCA, and segmentation fusion are all design improvements, whereas pre-training is fundamentally quite different, more akin to domain adaptation. In the future, it would be best to treat improvements such as pre-training separately from the other novelties. Each component should be evaluated independently in the ablation study, as this would be quite informative for the rest of the research community.
The authors propose a thorough study of a novel application of DPMs to lesion segmentation, with numerous ablation studies and methodological improvements. The introduction could be considerably improved in terms of clarity and structure, and the impact of each novel addition relative to the others is hard to grasp. I would like to change my decision from weak reject to weak accept.
Author Feedback
We sincerely thank the reviewers for their insightful and constructive comments.
R1Q1 (cost and training time): Under the same model, staged training reduces the number of epochs needed for convergence (Table 4). Moreover, our model has fewer parameters than other diffusion-based models (ours 46M vs. cDAL 73M / MedSegDiff 129M / SDSeg 854M), leading to lower computational cost and shorter training time (Table 7).
R1Q2 (overfitting and generalization): Table 3 shows that switching from separate training to unified training leads to consistent performance changes across all datasets, which indicates no overfitting to specific ones. Currently, generalization focuses on using a single set of model parameters to segment various lesion data. Our future work will explore generalization in few-shot and zero-shot settings.
R1Q3 (principle of unification): Our unified approach is data-scalable. As shown in Table 3, our method remains effective when extended from single-dataset training to unified training across six lesion modalities, with no significant drop in either evaluation metric. The key is to leverage a segmentation-pretrained feature extraction network to map lesions from different distribution spaces into the same distribution space as the mask, thereby unifying the data distribution and enabling unified segmentation.
R1Q4 (inference time): Although multiple samplings and uncertainty fusion offer no FPS advantage over simple, task-specific models (ours 8.95 vs. RollingUNet 10.8 / EMCAD 34.9), they improve segmentation accuracy, which is important.
R1Q5 (thresholds): The noise is applied to the mask, and masks across different datasets are composed of 0s and 1s, maintaining a highly consistent format. Therefore, the choice of threshold is minimally affected by the dataset. Table 5 presents an ablation study on thresholds.
R1 (additional comments): Thanks for the valuable suggestion, which provides inspiration for our future research. The current work focuses on a segmentation model and does not involve additional object detection models.
R2Q1 (evidence): The top-left corner of Figure 1 shows the average gradient trends over timesteps from our experiments, which support the reasoning behind our motivation. The green line shows that predicting different targets allows the model to maintain high attention across all timesteps. This is attributable to our staged design, which combines the advantages of predicting noise and x0 at different stages.
R2Q2 (no qualitative evaluation): Due to space limitations, visual results were omitted. We will add them in the final paper.
R3Q1 (similar work): “Improved Techniques for Training Score-Based Generative Models” (Song et al., NeurIPS 2020) first pointed out that the noise intensity at different timesteps can affect generation performance. IDDPM (Nichol et al., ICML 2021) then proposed a cosine noise schedule to improve generation quality. Following SegDiff [2], which introduced conditional diffusion into the segmentation field, encoding images as conditions through a network has become standard practice, as in MedSegDiff [27] and cDAL [13]; we further pre-train this network for unified segmentation. Recently, “Analysing Diffusion Segmentation for Medical Images” (Mathias et al., arXiv 2024) noted the limitations of noise scheduling for segmentation but proposed no solution. We analyzed the relationship between noise scheduling and the prediction target and proposed staged diffusion. We will provide more discussion in the final paper.
R3Q2 (motivation for using diffusion): Lesion boundaries in medical images are often ambiguous and lack a gold standard for masks. By performing random sampling at each inference step, diffusion models can generate multiple segmentation results, which can then be fused to enhance accuracy, similar to how multiple doctors jointly annotate images.
R3Q4 (pretrained encoder): Comparing rows 2 and 3 in Table 6 shows the performance gain from pretraining. Removing pretraining from row 5 would reduce mDice by 3.1.
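To make the fusion idea in R3Q2 concrete, here is a minimal sketch of combining several stochastic segmentation samples into one mask plus a per-pixel uncertainty map; the mean-then-threshold rule and the 0.5 default are assumptions and may differ from the paper's exact uncertainty fusion.

# Minimal sketch of fusing stochastic segmentation samples (assumed fusion rule).
import numpy as np

def fuse_samples(samples, threshold=0.5):
    """samples: (N, H, W) array of binary masks from N diffusion samplings."""
    prob = samples.mean(axis=0)                  # per-pixel foreground frequency
    fused = (prob > threshold).astype(np.uint8)  # consensus mask
    uncertainty = prob * (1.0 - prob)            # largest where samples disagree
    return fused, uncertainty

# Toy usage with random "samples".
rng = np.random.default_rng(0)
samples = (rng.random((5, 8, 8)) > 0.4).astype(np.uint8)
mask, unc = fuse_samples(samples)
print(mask.shape, float(unc.max()))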
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This work has two positive reviewers and one negative reviewer. After checking the comments and the rebuttal, I agree with the positive reviewers to accept this work.