Abstract

Accurate medical image segmentation is crucial for precise anatomical delineation. Deep learning models like U-Net have shown great success but depend heavily on large datasets and struggle with domain shifts, complex structures, and limited training samples. Recent studies have explored diffusion models for segmentation by iteratively refining masks. However, these methods still retain the conventional image-to-mask mapping, making them highly sensitive to input data, which hampers stability and generalization. In contrast, we introduce DiffAtlas, a novel generative framework that models both images and masks through diffusion during training, effectively “GenAI-fying” atlas-based segmentation. During testing, the model is guided to generate a specific target image-mask pair, from which the corresponding mask is obtained. DiffAtlas retains the robustness of the atlas paradigm while overcoming its scalability and domain-specific limitations. Extensive experiments on CT and MRI across same-domain, cross-modality, varying-domain, and different data-scale settings using the MMWHS and TotalSegmentator datasets demonstrate that our approach outperforms existing methods, particularly in limited-data and zero-shot modality segmentation. Code is available at https://github.com/M3DV/DiffAtlas.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0228_paper.pdf

SharedIt Link: https://rdcu.be/eHxec

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05325-1_16

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/M3DV/DiffAtlas

Link to the Dataset(s)

https://huggingface.co/datasets/YuheLiuu/DiffAtlas_Preprocessed_Data

BibTex

@InProceedings{ZhaHan_DiffAtlas_MICCAI2025,
        author = { Zhang, Hantao AND Liu, Yuhe AND Yang, Jiancheng AND Guo, Weidong AND Wang, Xinyuan AND Fua, Pascal},
        title = { { DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {161 -- 172}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors propose a diffusion-based approach that simulates atlas-based segmentation for medical image analysis. They claim that the method achieves strong performance in both same-domain and cross-domain settings, demonstrating improved generalizability compared to existing techniques.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is well-written, and the review of related work is thorough and appropriate.
2. Extensive experiments in both same-domain and cross-domain settings demonstrate the method’s stability and superior performance compared to existing approaches.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The paper lacks sufficient methodological details, including model architecture, training setup, and how supervision is created. It is also unclear how the model jointly trains on the input–mask pairs. Including more figures and equations would greatly improve the clarity of the explanations.
2. The inference process is confusing. Specifically, the authors state that “the noisy image I_{t} is replaced with a noisy version of the input image I_input”. It is unclear whether I_input here refers to I_{T}.
3. The explanation of the inference process is ambiguous. While it makes sense to add noise to the input image to simulate I_T, the construction of the “noisy mask” is unclear. If the noisy mask simply refers to the original mask plus noise, this seems unreasonable and lacks justification.
4. In my opinion, it is difficult to justify the claim that the proposed diffusion model simulates the atlas-based segmentation process. The method does not include explicit registration or label propagation, which are core components of atlas-based segmentation.
5. The authors claim strong generalizability of their approach; however, this is unconvincing as the method is only evaluated on heart CT or MRI datasets. Broader testing across more diverse modalities and anatomies would strengthen this claim.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Weakness 4 and 5.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper proposes a new segmentation method, DiffAtlas, which combines generative AI with the atlas paradigm by jointly modeling image-mask pairs. A diffusion model learns the anatomical consistency of the atlas paradigm, addressing the limitations of both feedforward networks and explicit atlas methods. This ensures anatomically consistent segmentation results as well as accurate segmentation of boundaries and complex structures. Good generalization can be obtained with few-shot training.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Because the image and mask are modeled jointly, the method is less sensitive to the input image and generalizes better than the earlier feedforward paradigm, which maps an input image directly to a mask. Compared with explicit atlas methods, it does not rely on building a specific atlas set and can extend to different types of data. Using a signed distance function for the masks yields better segmentation detail at boundaries and in complex structures, and the method can adapt to different data with few-shot training.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The existing experimental results show good performance, but too few methods are compared; the number of baselines should be increased. Moreover, a handful of visual cases is not enough to show that the proposed paradigm has learned anatomical consistency. A metric could be introduced to quantify it.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper combines generative artificial intelligence with the atlas paradigm, demonstrating strong performance in few-shot learning and cross-domain settings, which is of great significance for scenarios with limited data. However, the number of comparison methods remains limited, and more baselines should be included to more comprehensively showcase the advantages of the proposed approach. Overall, the method is innovative and holds practical value, but there is still room for improvement in the comprehensiveness of experimental comparisons and the depth of result analysis. Therefore, a “weak accept” recommendation is given, and it is suggested that the authors further enhance the experimental setup and analysis in future work to improve the overall quality and persuasiveness of the paper.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have effectively addressed the primary concerns. They have performed additional baseline comparisons (with SwinUNETR) and report that their method, DiffAtlas, maintains superior performance, especially in few-shot and zero-shot scenarios. They commit to including these expanded results in the revision, which will significantly strengthen the paper’s experimental validation. Furthermore, the authors clarified their novel “GenAI-fied” atlas-based methodology, explaining how joint image-mask diffusion modeling learns anatomical consistency. They also provided key implementation details and promised code release, enhancing reproducibility. Given the innovative approach, its strong reported performance in data-limited settings, and the authors’ commitment to address the initial review’s main limitations, the paper is now recommended for Accept.

Review #3

Please describe the contribution of the paper

This paper proposes a novel idea that combines elements of image-to-mask mapping and atlas-based segmentation in a diffusion-based setting. Conventionally the image to mask mapping problem has been tackled extensively using neural networks, particularly using U-Nets. However, for medical images, this segmentation process may not always respect anatomical boundaries as no such priors are enforced. In this paper, the authors propose a generative framework that jointly models the image and the mask pair. Modeling image and mask pairs has been done before. But newly in this paper, the mask is obtained as a deformed version of the predefined labeled atlas to the image.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed idea of jointly modeling the image and atlas-propagated segmentation mask is novel.

The main idea is quite simple and conceptually straightforward. This is a strength of the paper. Other approaches have suggested modeling the conditional distribution of the image given the mask or even modeling the joint distribution of images and masks. Both these approaches have been cited in the paper. However, introducing the deformed mask as one of the elements of this joint distribution is novel and has not been proposed before. This is especially important for medical image segmentation, where the anatomical boundaries have specific meanings.

The mask is represented by a signed distance function and thus is quite general and flexible to obtain. Signed distance functions have been extensively used in image segmentation both under level-set variational frameworks and in a generative modeling setting (Bogensperger, Lea, et al. “Score-based generative models for medical image segmentation using signed distance functions.” DAGM German Conference on Pattern Recognition). An extended version of this method is also presented in Sauvalle, B., Salzmann, M.: Hybrid diffusion models: combining supervised and generative pretraining for label-efficient fine-tuning of segmentation models. arXiv:2408.03433 (2024), which the authors have cited.
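
To make the signed distance representation concrete, here is a minimal sketch in plain Python. It is not taken from the paper or its repository. The function names, the brute-force search, and the sign convention (negative inside the object, positive outside) are assumptions chosen for illustration; a real pipeline would use an optimized distance transform on 3D volumes.

```python
import math


def signed_distance(mask):
    """Brute-force signed distance map for a small 2D binary mask.

    Assumed convention: negative inside the object, positive outside.
    Each pixel stores the Euclidean distance to the nearest pixel of the
    opposite class.
    """
    h, w = len(mask), len(mask[0])
    inside = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    outside = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    if not inside or not outside:
        raise ValueError("mask must contain both foreground and background")
    sdf = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            targets = outside if mask[r][c] else inside
            d = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
            sdf[r][c] = -d if mask[r][c] else d
    return sdf


def sdf_to_mask(sdf):
    """Recover the binary mask by thresholding the distance map at zero."""
    return [[1 if v < 0 else 0 for v in row] for row in sdf]
```

Because the map is continuous and changes smoothly across the boundary, it is a more natural regression target for a denoising network than a hard 0/1 mask, and thresholding at zero recovers the segmentation.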

Experimental results are extensive and demonstrate improved performance (Dice coefficients and normalized surface distance) of segmentation due to this approach. This method is compared on well-known datasets against state-of-the-art methods. Particularly in the few-shot (2, 4) training and in a cross-domain training-test setup, the proposed method yields a superior performance.
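
For reference, the Dice coefficient mentioned above can be computed per class as in the sketch below. This is the standard definition, not the authors' evaluation code. The flat label lists and the choice to return 1.0 when a class is absent from both prediction and ground truth are assumptions. Normalized surface distance is not covered here.

```python
def dice(pred, target, label):
    """Dice coefficient for one class over two flat label sequences.

    Dice = 2 * |P ∩ T| / (|P| + |T|), where P and T are the sets of voxels
    assigned `label` in the prediction and the ground truth.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    p = [x == label for x in pred]
    t = [x == label for x in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Assumed convention: a class absent from both volumes scores 1.0.
    return 1.0 if denom == 0 else 2.0 * intersection / denom
```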

Visually, the segmentations appear to respect anatomical boundaries or at least appear regular. This is partly due to the atlas-based labeling approach, but also due to the representation by means of the signed distance function and any other regularizations carried out (both in terms of representation and registration).
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The paper does a good job of describing the overall approach and the main ideas. However, a few important details are missing.

The main idea of the paper is to use an atlas-propagated labeling mask. This mask is obtained by applying a warping field \phi to the atlas labels. The warping field in turn is obtained by solving a registration problem between the atlas-image pairs. In the training phase, the network minimizes the training loss given by the mean squared error between the predicted and true noise. However, implementation or experimental details about how the image and mask pairs are obtained is missing. Details about the registration methods used to obtain the atlas propagated mask are missing. Similarly, details about the atlases chosen for this task are missing. Further, both the registration algorithms as well as the atlases depend upon the application, i.e. the datasets used in the paper. However, the authors don’t mention what these choices are. This is the main weakness.
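
For readers unfamiliar with the classical label-propagation step described above, the sketch below warps atlas labels with a given displacement field using nearest-neighbour sampling. It illustrates the generic atlas technique only and is not the paper's implementation. The function name, the 2D grid, the backward-warp convention, and the hand-supplied displacement field are all assumptions; estimating that field is the registration problem itself.

```python
def propagate_labels(atlas_labels, displacement):
    """Warp atlas labels onto the target grid (nearest-neighbour, backward warp).

    displacement[r][c] = (dr, dc) means that target pixel (r, c) takes its
    label from atlas position (r + dr, c + dc). Samples that fall outside
    the atlas grid are assigned background (0).
    """
    h, w = len(atlas_labels), len(atlas_labels[0])
    warped = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dr, dc = displacement[r][c]
            sr, sc = round(r + dr), round(c + dc)
            if 0 <= sr < h and 0 <= sc < w:
                warped[r][c] = atlas_labels[sr][sc]
    return warped
```

Nearest-neighbour sampling is used because labels are categorical: interpolating between label 1 and label 3 would produce a meaningless label 2.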
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes a novel idea, which is simple to understand and to implement. This is one of the key strengths. It extends the classically well-researched atlas-based segmentation approach in a generative setting. Even though the idea of using image pairs (input image and segmentation mask) has been proposed before, it has only recently been examined in the diffusion modeling setting. The idea of using a mask generated from the atlas, while straightforward and simple, has not been proposed or evaluated in the medical imaging segmentation application. However, a few key implementation details are missing in the paper, which need to be addressed by the authors.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Reviewer #3 also brought up the point about lack of implementation details. The authors have clarified some notes on implementation in their rebuttal. I feel that the idea is novel and the paper does make a good contribution.

Author Feedback

We thank reviewers for their thoughtful feedback and recognition of our method’s novelty (R1, R5), practical value (R1, R3, R5), and well-structured presentation (R1, R3, R5). We address their main concerns regarding the atlas-based process and other issues as follows:

R3, R5: Atlas-based segmentation process. We believe that the basic task of all atlas-based segmentation methods can be formulated as follows: Given a new image, find, and possibly adapt, the best matching image/mask pair from a set, which is very different from what conventional feedforward networks (e.g., UNet) do when directly predicting the mask from an image. At training time, we use the dataset’s image and mask pairs as supervision to train a diffusion model that can generate new, coherent image-mask pairs across different channels. This leverages the diffusion model’s ability to learn the joint distribution of image/mask pairs. As a result, the model achieves implicit registration through joint modeling. This differs from atlas-based approaches, which rely on explicit registration algorithms, but it serves the same function. At inference time, the input image is used as guidance to progressively steer the generation process from pure noise toward the target image throughout the diffusion steps. This guided generation makes the corresponding mask follow the image, effectively indexing the desired (image, mask) pair. This serves the same end as the matching mechanism in atlas-based segmentation, albeit in a different way. Thus, our model fulfills the requirements for being an atlas-based algorithm, without requiring manual selection of atlases or registration algorithms because the training objective is to learn the overall image/mask distribution from the dataset. For comparison with atlas-based methods, which do require such selection, we followed the implementation details as described in the original papers. We will clarify this when revising the paper.
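
The training-time idea described above, noising the image and its masks together as channels of one sample and regressing the added noise, can be sketched as follows. This assumes the standard DDPM noise-prediction objective; the function names, the flattened per-channel lists, and the externally supplied `alpha_bar` value are illustrative assumptions, not details confirmed by the rebuttal or the released code.

```python
import math
import random


def noise_pair(image, sdf_masks, alpha_bar, rng):
    """Stack image and per-class SDF masks as channels and noise them jointly.

    Applies x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps to every
    channel and returns (x_t, eps), where eps is the Gaussian noise added.
    """
    x0 = [list(image)] + [list(m) for m in sdf_masks]  # channel 0 is the image
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [[rng.gauss(0.0, 1.0) for _ in channel] for channel in x0]
    xt = [[a * v + s * e for v, e in zip(channel, noise)]
          for channel, noise in zip(x0, eps)]
    return xt, eps


def noise_mse(pred_eps, true_eps):
    """Mean squared error between predicted and true noise over all channels."""
    sq = [(p - e) ** 2
          for pc, ec in zip(pred_eps, true_eps)
          for p, e in zip(pc, ec)]
    return sum(sq) / len(sq)
```

In a training loop, a denoising network would receive `xt` and the timestep, predict the noise for every channel, and be updated to minimize `noise_mse`. Since a single loss covers both the image channel and the mask channels, the network is pushed to keep them mutually consistent.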

R3: writing: We apologize for omitting some implementation details due to space limitations, as our focus was on the paradigm shift — the core contribution. We will release the full code to ensure reproducibility. The model follows a standard UNet denoising architecture. The input image and SDF-encoded masks of different classes are fed into the diffusion model as separate channels. During generation, the model jointly synthesizes image–mask pairs. We use 300 DDPM steps with a learning rate of 1e-4 and train until convergence (typically ~20,000 steps for large segmentation datasets). For the input image I_input, we add noise corresponding to timestep t to obtain a noisy version, replacing the denoised result at each step with it. The “noisy mask” refers to the intermediate masks produced during the diffusion process. We apologize for the earlier ambiguity and will clarify when revising. We sincerely thank the reviewer for the valuable feedback.
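
The inference procedure described in this response, overwriting the image channel at every step with the input image noised to the current timestep while the mask channel is denoised alongside it, can be sketched as below. This is one reading of the rebuttal, not the released code. The linear beta schedule and its endpoints, the single flattened mask channel, and the `reverse_step` callable (a stand-in for the trained UNet plus one DDPM update) are all assumptions.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def guided_mask_sampling(input_image, reverse_step, alpha_bars, rng):
    """Generate a mask whose paired image is steered toward input_image.

    reverse_step(image_t, mask_t, t) is a hypothetical callable that performs
    one joint denoising step and returns (image_prev, mask_prev).
    """
    # The mask channel starts from pure Gaussian noise.
    mask_t = [rng.gauss(0.0, 1.0) for _ in input_image]
    for t in reversed(range(len(alpha_bars))):
        a = math.sqrt(alpha_bars[t])
        s = math.sqrt(1.0 - alpha_bars[t])
        # Replace the image channel with the input image noised to level t.
        image_t = [a * v + s * rng.gauss(0.0, 1.0) for v in input_image]
        # The denoised image is discarded; only the mask is carried forward.
        _, mask_t = reverse_step(image_t, mask_t, t)
    return mask_t
```

With 300 steps as stated above, the call would be `guided_mask_sampling(image, reverse_step, linear_alpha_bars(300), random.Random(0))`. Under this reading, the "noisy mask" is simply the intermediate `mask_t`, and the final SDF output would be thresholded to obtain the segmentation.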

R1, R3: Broader baseline coverage and comparison. For baselines, we selected one representative SOTA method from each category, including feedforward models like nnU-Net. We used the TotalSegmentator (TS) and MM-WHS datasets (CT and MRI), constructing cross-domain settings: different anatomical regions (TS → MM-WHS) and modalities (CT → MRI, MRI → CT). Unlike most prior work, our method performs zero-shot inference on new domains, making the task more challenging and the approach more generalizable.

Nevertheless, to further strengthen our case, we have performed additional comparisons with a new baseline, SwinUNETR, under each configuration. To comply with MICCAI’s policy against submitting new results in the rebuttal, we respectfully state that our method still consistently outperforms all baselines—often by a large margin, especially in zero-shot cross-domain and few-shot scenarios.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

DiffAtlas: GenAI-fying Atlas Segmentation via Image-Mask Diffusion

Author(s):