
Denoising diffusion models offer a promising approach to accelerating magnetic resonance imaging (MRI) and producing diagnostic-level images in an unsupervised manner. However, our study demonstrates that even tiny potential worst-case perturbations transferred from a surrogate model can cause these models to generate fake tissue structures that may mislead clinicians. The transferability of such worst-case perturbations indicates that the robustness of image reconstruction may be compromised by MR system imperfections or other sources of noise. Moreover, at larger perturbation strengths, diffusion models exhibit Gaussian noise-like artifacts that are distinct from those observed in supervised models and are more challenging to detect. Our results highlight the vulnerability of current state-of-the-art diffusion-based reconstruction models to possible worst-case perturbations and underscore the need for further research to improve their robustness and reliability in clinical settings.

Review #1

    The paper explores the vulnerability of diffusion-based reconstruction models to possible worst-case perturbations, with the aim of improving their robustness and reliability in clinical settings.

    This paper discusses a crucial issue in medical imaging: the impact of distortion noise on the performance of generative models, specifically diffusion models. While diffusion models excel at detailed image generation, their accuracy under real-world perturbations remains uncertain. By investigating the models' behavior under various noise levels, this study provides valuable insights for clinicians and researchers striving for accurate imaging results.

    - The writing of this paper is very poor, and it lacks a clear introduction to the research content. The introduction fails to outline the research motivation and objectives. A substantial portion of the text is devoted to related work (Section 2), while only a limited section describes the research design (Section 3).
    - Although the research has important clinical implications, the experimental design and results are not convincing. As an exploratory study, it should be conducted on multiple datasets and with various diffusion models, given the many variants already available.
    - The experimental results are notably thin: only one quantitative metric is used, and the improvements in the visualizations are not clearly evident.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The goal of this paper is to explore and analyze an existing technology, diffusion models, which is meaningful for advancing their development and application. However, the work as a whole lacks original, thought-provoking insights. The authors should discuss the impact of perturbations in light of the principles behind MRI reconstruction and diffusion models.

    2. Some concept definitions in the text are very confusing. In fact, diffusion models also require supervised training; what does an unsupervised diffusion model refer to? Concepts such as worst-case, white-box, and black-box are not clearly defined.
    3. The experiments are not convincing, as they compare only with ResUnet and i-RIM. It is unclear why these two methods were chosen, given the many CNN and diffusion methods for MRI reconstruction.
    4. The paper needs better writing to enhance readability.
    Reject — should be rejected, independent of rebuttal (2)

    This work is not sufficiently solid. There are significant flaws in both the methodology and experimental design, and the quality of the manuscript does not yet meet the standard for publication.

    Confident but not absolutely certain (3)

    Weak Accept — could be accepted, dependent on rebuttal (4)

    The authors have addressed my concerns.

Review #2

    Denoising diffusion models can provide high-quality reconstructions of MRI scans. However, some perturbations can lead diffusion models to generate fake tissue structures. Because the robustness of diffusion models is important in clinical settings, the paper highlights their vulnerability to worst-case perturbations of MRI scans. The paper evaluates different supervised and unsupervised reconstruction models, uses gradient-based projected gradient descent (PGD) attacks to generate white-box and black-box perturbations, and tests the robustness of the trained reconstruction models.
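    To make the white-box/black-box distinction concrete, here is a minimal NumPy sketch of an L2-bounded PGD attack. It is not the paper's code: both "reconstruction models" are hypothetical linear maps, the input is a real vector rather than complex k-space, and the radius, step size, and iteration count are illustrative assumptions. The perturbation is optimized using gradients of a surrogate model (white-box) and then applied unchanged to a different target model (black-box transfer).

```python
import numpy as np


def pgd_l2(grad_fn, shape, eps, step, n_iter, rng):
    """Projected gradient ascent inside an L2 ball of radius eps."""
    delta = rng.standard_normal(shape)
    delta *= eps / np.linalg.norm(delta)  # random start on the ball's surface
    for _ in range(n_iter):
        g = grad_fn(delta)
        # Normalized ascent step, then projection back onto the L2 ball.
        delta = delta + step * g / (np.linalg.norm(g) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)
    return delta


# Hypothetical stand-ins for trained reconstruction models: recon(y) = W @ y.
rng = np.random.default_rng(0)
n = 32
W_surrogate = rng.standard_normal((n, n)) / np.sqrt(n)
W_target = W_surrogate + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)


def recon_change(W, delta):
    """Size of the change in the reconstruction caused by perturbing the input by delta."""
    return float(np.linalg.norm(W @ delta))


def surrogate_grad(delta):
    """Gradient of ||W_surrogate @ delta||^2 with respect to delta (white-box access)."""
    return 2.0 * W_surrogate.T @ (W_surrogate @ delta)


eps = 0.1
delta_adv = pgd_l2(surrogate_grad, (n,), eps=eps, step=0.02, n_iter=200, rng=rng)

# Baseline: a random perturbation with the same L2 norm.
delta_rand = rng.standard_normal(n)
delta_rand *= eps / np.linalg.norm(delta_rand)

print("white-box on surrogate:", recon_change(W_surrogate, delta_adv),
      "random:", recon_change(W_surrogate, delta_rand))
print("black-box transfer to target:", recon_change(W_target, delta_adv),
      "random:", recon_change(W_target, delta_rand))
```

    Because the objective in this toy setting is a convex quadratic, each normalized ascent-and-project step acts like a shifted power iteration, so the perturbation drifts toward the surrogate's most amplified input direction. Attacks on deep networks follow the same loop, with gradients obtained by automatic differentiation.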

    The quantitative and qualitative results are very interesting. They show that small perturbations can cause the diffusion models to generate fake tissue structures.

    1. On page 7, Section “Worst-case instabilities of supervised models” mentions subplots c and d of Fig. 3, but Fig. 3 has no subplots c and d.

    2. The differences between Fig. 3a and Fig. 3b are not clearly described.

    3. The paper does not discuss the underlying reasons for the instability of diffusion models or possible future work on improving the robustness of reconstruction models.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.


    Refer to the weaknesses. The authors should clarify these details and explain the underlying reasons in more depth.

    Weak Accept — could be accepted, dependent on rebuttal (4)

    The research finding is very interesting and should be valuable for the field.

    Confident but not absolutely certain (3)

Review #3

    The paper studies the transferability of worst-case perturbations to denoising diffusion models. The results highlight the vulnerability of state-of-the-art (SOTA) diffusion methods.

    1) The paper is clearly written.
    2) The question of DDPM robustness is significant and novel, with an impact on the general medical community.
    3) The evaluation is solid; the quantitative and qualitative results are convincing.

    1) The supervised baselines are somewhat outdated. Adding state-of-the-art reconstruction methods for comparison is recommended, for example: Gao, Zhifan, et al. “Hierarchical perception adversarial learning framework for compressed sensing MRI.” IEEE Transactions on Medical Imaging (2023).
    2) Including more than one dataset would further consolidate the findings and make them more convincing.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.


    1) The supervised baselines are somewhat outdated. Adding state-of-the-art reconstruction methods for comparison is recommended, for example: Gao, Zhifan, et al. “Hierarchical perception adversarial learning framework for compressed sensing MRI.” IEEE Transactions on Medical Imaging (2023).
    2) Including more than one dataset would further consolidate the findings and make them more convincing.
    3) The font size in Figure 1 is too small.
    4) It is not clear why only 80% of the dataset was used or how these data were selected.

    Accept — should be accepted, independent of rebuttal (5)

    The authors study an important problem: the instability of unsupervised diffusion models for MRI reconstruction. The paper is clearly written and has the potential to make a substantial impact on the medical community; thus, I recommend acceptance.

    Somewhat confident (2)

    Accept — should be accepted, independent of rebuttal (5)

    Although thorough validation will be needed to further consolidate the findings, I believe the study is generally beneficial to the community.

Author Feedback

We thank the reviewers for their detailed feedback and constructive comments on our manuscript. Below, we address the major concerns raised by the reviewers.

Reviewer #1

Major Critique: Figure 3 caption and Lack of Clarity

Response: We apologize for the incorrect reference to subplots c and d on page 7. This was a typographical error (it should read Fig. 3a and Fig. 3b), and we will correct it. The revised caption now reads: “Fig. 3: We visualized the impact of perturbation amplitude on model performance, measured by the ΔSSIM metric. Subplot (a) shows that all models experienced a drastic drop in SSIM as the amplitude of worst-case perturbations generated by i-RIM increased. Subplot (b) shows similar findings for adversarial perturbations generated via the ResUnet model.”

Major Critique: Discussion on Model Instabilities

Response: We appreciate this comment. We have added a discussion about the possible reasons for the observed instabilities in diffusion models. The revised discussion reads: “Our study suggests that worst-case perturbations in model-based MRI reconstruction can transfer to the independently trained diffusion model. We hypothesize that the main reason for this vulnerability is that the perturbed k-space misleads the reverse iterative diffusion process, creating nonphysical artifacts. Classical regularization techniques such as total variation regularization might offer better robustness in such scenarios.”
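One way to see how a k-space perturbation can steer an iterative sampler is through a hard data-consistency step, a common component of diffusion-based MRI samplers that overwrites the acquired k-space locations of the current estimate with the measurements at every iteration. Whether the authors' sampler uses exactly this step is an assumption. The NumPy sketch below assumes single-coil Cartesian sampling and a hypothetical mask, and it omits the learned denoising step entirely; it only shows that the measured perturbation is copied verbatim into the estimate.

```python
import numpy as np


def data_consistency(x, y_meas, mask):
    """Hard data-consistency projection for single-coil Cartesian MRI.

    Keeps the current estimate's k-space where nothing was acquired and
    overwrites the acquired locations (mask == True) with the measurements.
    """
    k = np.fft.fft2(x, norm="ortho")
    k = np.where(mask, y_meas, k)
    return np.fft.ifft2(k, norm="ortho")


rng = np.random.default_rng(0)
img = rng.standard_normal((16, 16))  # stand-in for the current sampler estimate

# Hypothetical Cartesian mask: every 4th row plus the lowest-frequency rows
# (with an unshifted FFT, the DC component sits at index 0).
mask = np.zeros((16, 16), dtype=bool)
mask[::4, :] = True
mask[[1, -1], :] = True

y = np.where(mask, np.fft.fft2(img, norm="ortho"), 0)  # clean measurements
delta = 0.05 * (rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16)))
delta = np.where(mask, delta, 0)  # perturbation on acquired samples only

x_clean = data_consistency(img, y, mask)
x_pert = data_consistency(img, y + delta, mask)

# With an orthonormal FFT the image-space change equals the k-space perturbation
# norm; a sampler applying this projection at every step re-injects delta each time.
print(np.linalg.norm(x_pert - x_clean), np.linalg.norm(delta))
```

In a full sampler, the learned denoising update and this projection alternate, so the denoiser repeatedly receives inputs whose acquired frequencies carry the perturbation.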

Reviewer #4

Major Critique: Font Size in Figures and Dataset Utilization

Response: We have revised Figure 1 to increase the font size for better readability. Using 80% of the dataset for training and validation follows a standard 80-20 train-validation split, ensuring robust model evaluation. We have clarified this point in the manuscript.
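The rebuttal does not state how the 80% subset was chosen. As a hedged illustration only, and not the authors' procedure, one reproducible option is a seeded split at the subject level, so that slices from one subject never appear in both partitions. The subject IDs and the `split_subjects` helper are hypothetical.

```python
import random


def split_subjects(subject_ids, train_frac=0.8, seed=0):
    """Shuffle unique subject IDs with a fixed seed and split them at the subject level."""
    ids = sorted(set(subject_ids))  # sort first so the result does not depend on input order
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(round(train_frac * len(ids)))
    return ids[:n_train], ids[n_train:]


subjects = [f"sub-{i:03d}" for i in range(50)]  # hypothetical subject IDs
train_ids, held_out_ids = split_subjects(subjects)
print(len(train_ids), len(held_out_ids))  # 40 10
```

Splitting by subject rather than by slice avoids leakage between partitions, since neighboring slices of one brain are highly correlated.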

Reviewer #5

Major Critique: Poor Writing and Lack of Clear Introduction

Response: We have thoroughly revised the manuscript to improve clarity and readability. The introduction now reads: “Magnetic Resonance Imaging (MRI) is essential for medical diagnostics, especially for brain diseases, due to its detailed, non-invasive imaging capabilities. However, MRI faces challenges such as long acquisition times and high sensitivity to motion. Recent advancements, particularly denoising diffusion models, promise to accelerate MRI by reconstructing high-quality images from undersampled data. Unlike traditional methods, these models can operate without paired training data. However, our study reveals a critical vulnerability: susceptibility to minimal worst-case perturbations, leading to significant inaccuracies in reconstructed images. Our research explores the robustness of diffusion models in MRI reconstruction, investigating adversarial perturbations and proposing strategies to enhance resilience. We aim to advance reliable diffusion models in clinical settings.”

Major Critique: Limited Quantitative Evaluation

Response: We evaluated all experiments using both SSIM (structural similarity index measure) and PSNR (peak signal-to-noise ratio), which are common metrics for image reconstruction. Due to limited space, the PSNR evaluation is included in the supplement (Fig. S1), and additional visualizations are included in Fig. S2.
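For readers unfamiliar with the two metrics, here is a minimal NumPy sketch of PSNR and a simplified, single-window SSIM. Standard SSIM averages this statistic over local (often Gaussian-weighted) windows, so the values here will differ from library implementations. The ΔSSIM definition used in the example (SSIM of the reconstruction from the perturbed input minus SSIM of the reconstruction from the clean input, both against ground truth) is an assumption, and the images are synthetic stand-ins.

```python
import numpy as np


def psnr(ref, test, data_range):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


def global_ssim(ref, test, data_range):
    """Single-window SSIM over the whole image (simplified; not the windowed standard)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = np.mean((ref - mu_x) * (test - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


# Synthetic example: a mildly noisy "clean" reconstruction and a noisier "perturbed" one.
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, 1.0, size=(32, 32))
recon_clean = gt + 0.01 * rng.standard_normal(gt.shape)
recon_pert = gt + 0.10 * rng.standard_normal(gt.shape)

delta_ssim = global_ssim(gt, recon_pert, 1.0) - global_ssim(gt, recon_clean, 1.0)
print(psnr(gt, recon_clean, 1.0), psnr(gt, recon_pert, 1.0), delta_ssim)
```

A negative ΔSSIM under this convention indicates that the perturbation degraded the reconstruction.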

Major Critique: Unclear Definitions

Response: The term “unsupervised” in our study means that training does not require pairs of undersampled MR images and their fully sampled ground truth. We follow the usage of “unsupervised reconstruction” as proposed by Song, Yang, et al. 2023. We will clarify all concepts mentioned by the reviewer (worst-case, white-box, and black-box perturbations) in the supplement.

Major Critique: Reconstruction Baselines Are Not Convincing

Response: We selected a U-Net-based baseline (ResUnet++) because U-Net is the most widely used CNN backbone in MRI reconstruction. Our second baseline, i-RIM, performed exceptionally well in the fastMRI challenge.


Meta-review #1

    The authors did good work on the rebuttal.

Meta-review #2

    Reviewers raised their ratings after the rebuttal. Overall an interesting insight, though I would have liked to see a solution to overcome this instability.

