Abstract
Contrast-enhanced magnetic resonance images (CEMRIs) provide valuable information for brain tumor diagnosis and treatment planning. However, CEMRI acquisition requires contrast agent injection, which poses problems such as health risks, high costs, and environmental concerns. To address these drawbacks, researchers have synthesized CEMRIs from non-contrast magnetic resonance images (NCMRIs) to remove the need for contrast agents. However, CEMRI synthesis from NCMRIs is highly ill-posed, where false positive and false negative enhancement can be produced, especially for brain tumors. In this study, we propose a deformation-driven diffusion model (D3M) for CEMRI synthesis with brain tumors from NCMRIs. Instead of modeling enhancement errors as intensity errors, we formulate them as incorrect interpretation of tumor subcomponents, where enhanced tumors are misinterpreted as non-enhanced tumors and vice versa. In this way, the enhancement can be geometrically corrected with spatial deformation. This reduces the difficulty of CEMRI synthesis, as the intensity error is usually large to correct whereas the geometry correction is relatively small. Specifically, we first introduce a multi-step spatial deformation module (MSSDM) in D3M. MSSDM performs image deformation to adjust the enhancement, displacing enhanced regions to remove false positive and false negative enhancement. Moreover, as the denoising process of diffusion models is stepwise, MSSDM is applied at these multiple diffusion steps. Second, to further guide the spatial deformation, we incorporate an auxiliary task of segmenting the enhanced tumor, which aids the model understanding of contrast enhancement. Accordingly, we introduce a dual-stream image-mask decoder (DSIMD) that jointly produces intermediate enhanced images and masks of enhanced tumors. Results on two public datasets demonstrate that D3M outperforms existing methods in CEMRI synthesis.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1331_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/PangHaowen-hub/D3M
Link to the Dataset(s)
BraSyn dataset: https://www.synapse.org/Synapse:syn53708249/wiki/627507
BraTS-PEDs dataset: https://www.synapse.org/Synapse:syn53708249/wiki/627505
BibTex
@InProceedings{PanHao_D3M_MICCAI2025,
author = { Pang, Haowen and Zhang, Peng and Hong, Xiaoming and Chen, Shannan and Ye, Chuyang},
title = { { D3M: Deformation-Driven Diffusion Model for Synthesis of Contrast-Enhanced MRI with Brain Tumors } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15975},
month = {September},
page = {150 -- 160}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes D3M (Deformation-Driven Diffusion Model), a novel framework for synthesizing contrast-enhanced MRI (CEMRI) of brain tumors from non-contrast MRI (NCMRI) using a diffusion-based generative approach. The key innovation lies in modeling enhancement errors not as intensity mismatches, but as spatial misinterpretations of tumor subcomponents, which are then corrected through learned spatial deformations during the diffusion process.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper reframes enhancement synthesis errors as spatial misinterpretations of tumor subcomponents rather than purely intensity-based errors, introducing a new perspective on contrast-enhanced MRI synthesis.
- The introduction of the Multi-Step Spatial Deformation Module (MSSDM) and Dual-Stream Image-Mask Decoder (DSIMD) within the diffusion model is novel, enabling spatial refinement of enhancing tumor regions.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The authors argue that enhancement errors in CEMRI synthesis arise from misinterpretation of tumor subcomponents rather than simple intensity mismatches, and thus propose spatial deformation as a remedy. However, this distinction is not well justified: modern generative models (GANs, diffusion, transformers) inherently model high-level structural understanding, and all synthesis errors can be viewed as failures in semantic interpretation. The framing used in this paper does not constitute a new modeling principle but rather rephrases a common phenomenon. Moreover, the proposed deformation module does not enhance semantic understanding but acts as a post hoc spatial correction, which is effective only if the prior prediction is approximately correct. Therefore, the theoretical motivation appears overstated.
- The authors state that “This geometric perspective reduces the difficulty of CEMRI synthesis, as intensity errors for enhancement are typically large and challenging to correct, whereas geometric correction is relatively small and more manageable”, but this claim is validated neither theoretically nor experimentally.
- The proposed spatial deformation module (MSSDM) closely resembles prior work in unsupervised deformable registration (e.g., VoxelMorph), as it estimates a deformation field from predicted masks to spatially align enhancement regions. While framed as novel, this is essentially a reapplication of existing ideas in the diffusion setting. The paper lacks discussion of this connection and does not clarify how the method differs from or improves upon existing registration-based refinement approaches.
- The authors validated the proposed method on two brain MRI datasets. It is unclear whether the approach generalizes to other anatomical structures, pathologies, or MRI sequences.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The theoretical motivation of this paper remains open to debate, as discussed in the weaknesses above. Moreover, the deformation mechanism acts as a form of weakly supervised registration, yet the paper does not discuss this connection or compare against relevant baselines. Overall, the methodological novelty is overstated, and several claims lack rigorous experimental support.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
While I appreciate the authors’ efforts to respond to my concerns, I find that their rebuttal does not adequately address several of the key issues I raised, particularly regarding the theoretical novelty, conceptual framing, and distinction from prior work.
On the framing of synthesis errors as “misinterpretation of tumor subcomponents”: The authors maintain that their model goes beyond intensity correction by integrating spatial deformation. However, this reframing remains a semantic reinterpretation of a well-known phenomenon—namely, the semantic ambiguity in contrast-enhanced MRI synthesis. Most modern generative models, including GANs and diffusion models, implicitly handle spatial structure and intensity jointly. The proposed reformulation does not introduce a fundamentally new modeling principle; instead, it applies a standard deformation-based correction guided by predicted masks, which has been explored in other contexts.
On the nature of the deformation module (MSSDM): The authors argue that their method is not a post-hoc correction because MSSDM is interleaved with denoising steps. This is a procedural distinction, not a methodological one. Importantly, my original critique referred not to post-processing registration, but to registration-based refinement models that are trained end-to-end, jointly optimizing deformation fields and image synthesis in an iterative fashion. Related studies include (1) "Joint synthesis and registration network for deformable MR-CBCT image registration for neurosurgical guidance" by R. Han et al. and (2) "JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-Modal Image Alignment of Large-scale Pathological CT Scans" by Fengze Liu et al. Such models also use intermediate segmentations or features to drive deformation within the generation pipeline. The rebuttal fails to acknowledge or compare against this relevant body of work, which significantly weakens the claimed novelty.
On the claimed advantage of geometric correction over intensity correction: The authors restate their assumption that geometric deformation requires smaller adjustments than intensity-based correction, but provide no theoretical formulation or meaningful ablation (e.g., convergence behavior, loss curvature, error landscape analysis) to support this claim. Reporting downstream performance improvements is insufficient to justify the core hypothesis that “geometric correction reduces synthesis difficulty.” The ablation only confirms effectiveness, not theoretical superiority.
Review #2
- Please describe the contribution of the paper
The paper proposes a novel diffusion model, termed Deformation-Driven Diffusion Model (D³M), for synthesizing contrast-enhanced T1-weighted MRI (CEMRI) with brain tumors from non-contrast MRI sequences (T1, T2, FLAIR). The core idea is to reframe the challenge of correcting false positive/negative enhancement errors not as direct intensity manipulation, but as a geometric correction problem. This reduces the difficulty of CEMRI synthesis, since the required intensity correction is large whereas the geometric correction is relatively small.
Key contributions include:
- Spatial Deformation-Driven Correction of Enhancement Errors: Enhancement errors are corrected geometrically rather than through intensity changes.
- Multi-Step Spatial Deformation Module (MSSDM): A module integrated into the diffusion model’s reverse process that estimates a deformation field at multiple diffusion steps to spatially adjust enhancing regions, correcting errors.
- Dual-Stream Image-Mask Decoder (DSIMD): An assistive component within the diffusion network that jointly predicts the intermediate image and a segmentation mask of the enhanced tumor. This mask guides the MSSDM in estimating the appropriate deformation.
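To make the notion of "spatially adjusting" an image with a deformation field concrete, the following is a minimal sketch of a backward warp with bilinear interpolation. It is not the authors' MSSDM: the function name `warp_image`, the `(2, H, W)` displacement layout, and the border clamping are assumptions chosen for illustration, and the field here is supplied by hand rather than estimated by a network.

```python
import numpy as np

def warp_image(image, displacement):
    """Backward-warp a 2D image with a dense displacement field.

    image:        (H, W) array.
    displacement: (2, H, W) array; displacement[0] holds row offsets and
                  displacement[1] holds column offsets, in pixels.
    Output pixel (r, c) samples the input at (r + dr, c + dc) with
    bilinear interpolation; sampling coordinates are clamped to the border.
    """
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_r = np.clip(rows + displacement[0], 0, h - 1)
    src_c = np.clip(cols + displacement[1], 0, w - 1)

    # Integer corners surrounding each sampling location.
    r0 = np.floor(src_r).astype(int)
    c0 = np.floor(src_c).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    wr = src_r - r0  # fractional row weight
    wc = src_c - c0  # fractional column weight

    top = (1 - wc) * image[r0, c0] + wc * image[r0, c1]
    bottom = (1 - wc) * image[r1, c0] + wc * image[r1, c1]
    return (1 - wr) * top + wr * bottom
```

With an all-zero displacement the image is returned unchanged; a constant column offset of 1 shifts the content one pixel to the left, repeating the last column at the border.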
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Novel Formulation of Synthesis Error: The core idea of treating enhancement errors (false positives/negatives) as geometric misinterpretations correctable by deformation, rather than purely intensity errors, is novel and simplifies the correction task.
Integration of Geometric Correction into Diffusion: The proposed MSSDM module applies learned deformations at multiple steps within the diffusion reverse process and is a technically strong and novel integration of geometric deformation into generative diffusion models.
Auxiliary Task Guidance: Using an auxiliary enhanced tumor segmentation task via the DSIMD to explicitly guide the deformation process improves the performance and stability of the model.
Strong Empirical Performance: The method demonstrates superior performance compared to several relevant baselines on two challenging public datasets. The quantitative results in Table 1 show statistically significant improvements in PSNR and SSIM, particularly within the tumor regions. Qualitative results also visually demonstrate reduced false positive/negative enhancement in tumor areas.
Well-Designed Ablation Study
Clinical Relevance: Addressing the need to synthesize CEMRI to potentially reduce gadolinium usage is highly clinically relevant due to concerns about cost, health risks, and environmental impact.
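For readers unfamiliar with region-restricted metrics such as PSNR "within the tumor regions", here is a minimal sketch of PSNR computed only over masked pixels. The paper's exact evaluation protocol (intensity normalization, data range, and which mask defines the tumor region) is not given on this page, so `masked_psnr`, its `data_range` default of 1.0, and the boolean-mask convention are assumptions.

```python
import numpy as np

def masked_psnr(pred, target, mask, data_range=1.0):
    """PSNR in dB restricted to pixels where mask is nonzero.

    pred, target: arrays of identical shape.
    mask:         array of the same shape; nonzero marks the region of interest.
    data_range:   assumed maximum possible intensity span (hypothetical default).
    """
    mask = mask.astype(bool)
    if not mask.any():
        raise ValueError("mask selects no pixels")
    mse = np.mean((pred[mask] - target[mask]) ** 2)
    if mse == 0:
        return float("inf")  # identical inside the region
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Restricting the mean squared error to the mask keeps large, correctly synthesized background areas from dominating the score, which is why region-level numbers can differ noticeably from whole-image ones.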
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Given the complexity of the framework, no information is shared on the training time and computational cost involved.
The method depends on accurate enhanced tumor masks in the auxiliary step, which can be a bottleneck for overall training performance. The paper does not mention whether a quality control method is used to assess the accuracy and quality of the tumor masks.
The model operates on 2D slices, which are then concatenated. This approach might miss some 3D contextual information that could be beneficial for complex tumor structures.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Table 2: Ablation studies on the BRATS-PEDS dataset would have been interesting to see.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a novel and technically interesting approach (D³M) to a clinically significant problem: synthesizing CEMRI with brain tumors to potentially avoid using contrast agents such as gadolinium. The core idea of reframing enhancement errors as geometric misinterpretations and integrating multi-step spatial deformation within a diffusion model is innovative. The proposed components (MSSDM and DSIMD) are well explained and contribute to the method’s effectiveness. Overall, the paper has a novel formulation, strong empirical results demonstrating statistically significant improvements over relevant baselines (especially in tumor regions), and clear presentation.
While there are weaknesses especially related to the reliance on training masks, and concatenation of 2D slices, these do not fundamentally undermine the core contribution. This paper is a solid contribution suitable for acceptance at MICCAI.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors mention “the curation of the annotation and its quality control are feasible, as the enhanced region usually has a relatively small volume and its annotation is not as labor-intensive as the annotation of whole tumors.”
The point in my review was to understand how much the quality of the annotated region affects the downstream evaluation. Even if the tumor is small, would a marginal decrease in annotation quality lead to a less accurate output, thereby making the proposed approach less robust?
I feel the paper is fine as it is for now. For its novelty and the discussion it can bring to the community, I am giving it a final rating of ‘Accept’.
Review #3
- Please describe the contribution of the paper
The paper introduces three key innovations: (1) a novel reformulation of enhancement errors as geometric misinterpretations rather than intensity errors, enabling correction via spatial deformation; (2) a multi-step spatial deformation module (MSSDM) that iteratively refines enhancement at each diffusion step; and (3) a dual-stream image-mask decoder (DSIMD) that jointly predicts enhanced images and tumor masks to improve anatomical consistency. These contributions work synergistically to address the ill-posed nature of CEMRI synthesis, particularly for brain tumors, by combining geometric correction with diffusion-based refinement and anatomical guidance.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper presents several notable strengths: (1) its novel formulation treats enhancement errors as deformable misalignments rather than simple intensity discrepancies, offering an original geometric perspective that simplifies correction; (2) the integration of segmentation guidance through tumor mask supervision enhances anatomical plausibility in the synthesized images; (3) the method demonstrates strong empirical performance, outperforming both GAN-based and diffusion-based approaches on two public datasets (BraSyn, BraTS-PEDs); and (4) it provides clinically feasible solutions by eliminating gadolinium-based contrast agents, thereby addressing important safety and cost concerns in medical imaging. These combined advantages make the approach both technically innovative and practically valuable for medical applications.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Limited comparison with recent diffusion-based MRI synthesis methods, such as Zhou et al. (2024) [29], which also uses cascaded diffusion for medical translation.
No ablation on the choice of deformation model: The paper assumes U-Net is optimal for deformation estimation but does not justify this against alternatives (e.g., transformer-based deformable networks).
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- Justification: The method is novel, well-validated, and clinically impactful. However, comparisons with recent diffusion models could be stronger.
- Confidence: High (reviewer is familiar with medical image processing and diffusion models).
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
R1: Does the dependence on accurate enhanced tumor masks pose a bottleneck to training? Was quality control used to assess the tumor masks?
Although our method depends on annotation masks for training, this dependence does not substantially restrict its application. First, the datasets in our experiment are all publicly available and have provided manual annotations of enhanced regions, which have been reviewed by experts before release for quality control. Second, even for a new dataset, the curation of the annotation and its quality control are feasible, as the enhanced region usually has a relatively small volume and its annotation is not as labor-intensive as the annotation of whole tumors.
R2: The motivation appears overstated as 1) existing methods also have high-level structural understanding and 2) the proposed deformation acts as post-hoc spatial correction that is effective only when the prior prediction is approximately correct.
While modern generative models inherently model structural understanding, our explicit modeling of spatial deformation is still different from these methods. Existing models still correct synthesis errors from an intensity perspective, which is challenging as the intensities of enhanced and nonenhanced regions are very different. Insufficient correction will manifest as misunderstanding of tumor subcomponents. In contrast, our method allows integrated spatial and intensity correction, which better addresses the difference between enhanced and nonenhanced regions.
Note that our method is not merely post-hoc correction. Instead, MSSDM interacts with image generation alternately. The procedure in Fig. 1 only represents one step in the denoising process. The MSSDM output will impact image generation in subsequent denoising steps and thus improve the prior predictions in these steps as well. In addition, modern generative models are capable of producing reasonable prior predictions across most regions as they have high-level understanding of tumor components, and major errors concentrate in localized details that our method can address, as shown in Fig. 2.
R2: Theoretical or experimental evidence supporting that the geometric perspective reduces the synthesis difficulty.
Theoretically, nonenhanced and enhanced tumor regions are adjacent, and the geometric correction only requires a small deformation modification to the image, e.g., pulling nonenhanced regions to false positive enhanced regions via deformation and vice versa. However, as nonenhanced and enhanced regions are associated with very different intensities, i.e., hypo- and hyper-intensity, respectively, intensity correction requires a large modification to the image. The smaller image modification for geometric correction reduces the synthesis difficulty.
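The intuition in the paragraph above can be illustrated with a toy 1D intensity profile. This is only a sketch of the argument, not evidence for it: the intensity values (0.2 and 0.9), the profile length, and the hand-written displacement are hypothetical and not taken from the paper, and intensity units and pixel displacements are not directly comparable.

```python
import numpy as np

# Hypothetical normalized intensities for the two tumor subcomponents.
NONENHANCED, ENHANCED = 0.2, 0.9

n = 14
target = np.full(n, NONENHANCED)
target[4:8] = ENHANCED           # true enhanced region: indices 4..7

pred = np.full(n, NONENHANCED)
pred[4:10] = ENHANCED            # prediction with false positives at 8 and 9

# Intensity view: the per-pixel change needed to turn pred into target.
intensity_residual = target - pred
max_intensity_change = np.max(np.abs(intensity_residual))

# Geometric view: a backward warp in which output index i reads pred[i + disp[i]].
# Pulling the adjacent nonenhanced values over the false-positive pixels
# needs a shift of only two samples at those two positions.
disp = np.zeros(n, dtype=int)
disp[8:10] = 2
src = np.clip(np.arange(n) + disp, 0, n - 1)
warped = pred[src]
max_shift = np.max(np.abs(disp))
```

In this constructed case `warped` equals `target`, the largest displacement is 2 samples, and the largest intensity change is the full gap between the two assumed intensity levels (0.7).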
Experimentally, the effectiveness of MSSDM (geometric correction) is supported by the ablation study in Table 2. The removal of MSSDM leads to noticeable performance degradation.
R2: MSSDM resembles registration-based post-processing refinement. What are the differences between them?
First, unlike registration-based refinement, where deformation is applied as post-processing, MSSDM is embedded directly within the denoising process and performed alternately with image generation at different denoising steps. The interaction between MSSDM and image generation avoids error accumulation that can be severe and too large to correct for registration-based post-processing.
Second, the image generation in subsequent denoising steps also benefits from geometric correction in previous steps. Their joint learning allows optimization of both generation and correction, whereas in post-processing registration the image generation is not learned to produce the final optimal result.
Third, registration methods require a target reference to produce the deformation field, but such a target is not available in our framework, and MSSDM does not require one.
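The alternation described in the three points above, as opposed to a single warp applied after generation, can be sketched as a loop. This is a schematic under strong assumptions, not the authors' sampler: `denoise_step` and `estimate_deformation` are hypothetical callables standing in for the diffusion network (which returns an intermediate image and an enhanced-tumor mask) and the deformation estimator, noise re-injection is omitted, and a nearest-neighbour integer warp is used for brevity.

```python
import numpy as np

def integer_warp(image, disp):
    """Nearest-neighbour backward warp; disp is an integer (2, H, W) array."""
    h, w = image.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    r = np.clip(rows + disp[0], 0, h - 1)
    c = np.clip(cols + disp[1], 0, w - 1)
    return image[r, c]

def reverse_process(x_start, denoise_step, estimate_deformation, num_steps):
    """Alternate one denoising step with one spatial correction.

    denoise_step(x, t) -> (image, mask)            # hypothetical network call
    estimate_deformation(image, mask, t) -> disp   # hypothetical estimator
    The corrected image is fed to the next step, so later denoising steps
    start from an already adjusted estimate.
    """
    x = x_start
    for t in range(num_steps, 0, -1):
        image, mask = denoise_step(x, t)
        disp = estimate_deformation(image, mask, t)
        x = integer_warp(image, disp)
    return x
```

A post-processing variant would instead run all denoising steps first and call `integer_warp` once at the end, so the generator would never see the corrected intermediate result.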
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper proposes to reframe contrast enhancement errors in MRI synthesis as geometric misinterpretations rather than intensity mismatches. The authors integrate spatial deformation into the diffusion process, yielding strong performance on two public datasets.
While R2 raises concerns about the theoretical novelty and resemblance to deformable registration frameworks, these points are counterbalanced by solid empirical performance, clear methodological innovation within the diffusion setting, and potential for impact. Both R1 and R3 recognize the clinical relevance, architectural innovation, and clear writing. The rebuttal addresses concerns effectively, clarifying how MSSDM interacts with generation steps and does not merely perform post-hoc correction.
Given the originality of reformulating synthesis errors in a geometrically interpretable manner and the model’s statistically significant gains over baselines, I believe this paper will stimulate valuable discussion in the community and merits acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A