Abstract

Denoising diffusion models offer a promising approach to accelerating magnetic resonance imaging (MRI) and producing diagnostic-level images in an unsupervised manner. However, our study demonstrates that even tiny worst-case potential perturbations transferred from a surrogate model can cause these models to generate fake tissue structures that may mislead clinicians. The transferability of such worst-case perturbations indicates that the robustness of image reconstruction may be compromised due to MR system imperfections or other sources of noise. Moreover, at larger perturbation strengths, diffusion models exhibit Gaussian noise-like artifacts that are distinct from those observed in supervised models and are more challenging to detect. Our results highlight the vulnerability of current state-of-the-art diffusion-based reconstruction models to possible worst-case perturbations and underscore the need for further research to improve their robustness and reliability in clinical settings.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1977_paper.pdf

SharedIt Link: https://rdcu.be/dV5Eq

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72104-5_49

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1977_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Han_On_MICCAI2024,
        author = { Han, Tianyu and Nebelung, Sven and Khader, Firas and Kather, Jakob Nikolas and Truhn, Daniel},
        title = { { On Instabilities of Unsupervised Denoising Diffusion Models in Magnetic Resonance Imaging Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        pages = {509 -- 517}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper explores the vulnerability of diffusion-based reconstruction models to possible worst-case perturbations, with the aim of improving their robustness and reliability in clinical settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper discusses a crucial issue in medical imaging: the impact of distortion noise on the performance of generative models, specifically diffusion models. While diffusion models excel at detailed image generation, their accuracy under real-world perturbations remains uncertain. By investigating the model’s behavior under various noise levels, this study provides valuable insights for clinicians and researchers striving for accurate imaging results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - The writing of this paper is very poor, lacking a clear introduction to the research content. The introduction fails to outline the research motivation and objectives. A substantial portion of the text is dedicated to discussing related work (Section 2), while only a limited section describes the research design (Section 3).
    - Although the research has important clinical implications, the experimental design and results are not convincing. As an exploratory study, it should be conducted on multiple datasets and with various diffusion models, considering the multiple variants already available.
    - The experimental results are notably thin, utilizing only one quantitative metric, and the improvements in visualization are not clearly evident.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. The goal of this paper is to explore and analyze existing technologies, i.e., diffusion models, which is meaningful for advancing the development and application of diffusion models. However, the overall work lacks original insights that provoke thought. The authors should discuss the impact of perturbations based on the principles behind MRI reconstruction and diffusion models.
    2. Some concept definitions in the text are very confusing. In fact, diffusion models also require supervised training; what does an unsupervised diffusion model refer to? Concepts like worst-case, white-box, and black-box are not clearly defined.
    3. The experiments are not convincing, only comparing with ResUnet and I-RIM. It is unclear why these two methods were chosen, as there are many CNN and diffusion methods in MRI reconstruction tasks.
    4. The paper needs better writing to enhance readability.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work is not sufficiently solid. There are significant flaws in both the methodology and experimental design, and the quality of the manuscript does not yet meet the standard for publication.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concerns.



Review #2

  • Please describe the contribution of the paper

    Denoising diffusion models can provide high-quality MRI scan reconstruction. However, some perturbations can lead the diffusion models to generate fake tissue structures. The robustness of the diffusion models is important in clinical settings. Therefore, the paper highlights the vulnerability of diffusion models to worst-case perturbations from the MRI scans. The paper evaluates different supervised and unsupervised reconstruction models, and uses gradient-based PGD attacks to generate white-box and black-box perturbations and tests the robustness of the trained reconstruction models.
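
The PGD attack named in this summary can be sketched on a toy problem. This is a minimal illustration and not the paper's setup: a real-valued random matrix stands in for the MR forward operator, two linear reconstructions (`W_surrogate`, `W_target`) stand in for the attacked model and an independently built model, and all names and parameter values are hypothetical. The perturbation is crafted with full access to the surrogate (white-box) and then applied unchanged to the target (black-box transfer).

```python
import numpy as np

def recon_error(W, y, x_true):
    """Squared L2 error of the linear reconstruction W @ y."""
    r = W @ y - x_true
    return float(r @ r)

def pgd_linf(W, y, x_true, eps, step, n_iter=50):
    """L-infinity PGD: ascend the surrogate's error, then clip to the eps-ball."""
    delta = np.zeros_like(y)
    for _ in range(n_iter):
        # Analytic gradient of ||W (y + delta) - x_true||^2 with respect to delta.
        grad = 2.0 * W.T @ (W @ (y + delta) - x_true)
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta

rng = np.random.default_rng(0)
n = 16
x_true = rng.standard_normal(n)                    # toy ground-truth signal
M = rng.standard_normal((n, n))                    # toy forward operator (assumption)
y = M @ x_true + 0.01 * rng.standard_normal(n)     # noisy measurements

# Surrogate: pseudo-inverse reconstruction. Target: Tikhonov-regularized inverse.
W_surrogate = np.linalg.pinv(M)
W_target = np.linalg.solve(M.T @ M + 1e-2 * np.eye(n), M.T)

eps = 0.05
delta = pgd_linf(W_surrogate, y, x_true, eps=eps, step=eps / 10)
noise = eps * rng.choice([-1.0, 1.0], size=n)      # random noise with the same L-inf norm

print("surrogate, clean      :", recon_error(W_surrogate, y, x_true))
print("surrogate, white-box  :", recon_error(W_surrogate, y + delta, x_true))
print("target, transferred   :", recon_error(W_target, y + delta, x_true))
print("target, random noise  :", recon_error(W_target, y + noise, x_true))
```

Comparing the last two printed values gives a rough sense of whether a perturbation crafted on one model hurts another more than random noise of the same amplitude does; in a real study the linear maps would be replaced by trained networks and the gradient obtained by automatic differentiation.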

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The quantitative and qualitative results are very interesting. They show that small perturbations can cause the diffusion models to generate fake tissue structures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. On page 7 Section “Worst-case instabilities of supervised models”, it mentions Subplots c and d of Fig. 3, but there are no subplots c and d in Fig.3.

    2. The differences between Fig 3.a and Fig 3.b are not described clearly.

    3. The paper doesn’t discuss the profound reasons for the instabilities of the diffusion model and possible future work about improving the robustness of the reconstruction models.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Refer to the weaknesses. The paper should clarify these details and give deeper reasons for the observed instabilities.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The research finding is very interesting and should be valuable for the field.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper studies the transferability of worst-case perturbations to denoising diffusion models. The results highlight the vulnerability of SOTA diffusion methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper is clearly written. 2) The idea of studying the robustness of DDPM is significant and novel, with an impact on the general medical community. 3) The evaluation is solid; the quantitative and qualitative results are convincing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The supervised baselines are a bit outdated. It is recommended to add state-of-the-art reconstruction methods for comparison. For example, “Gao, Zhifan, et al. “Hierarchical perception adversarial learning framework for compressed sensing MRI.” IEEE Transactions on Medical Imaging (2023).” 2) If authors can involve more than one single dataset, the findings will be further consolidated and be more convincing.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) The supervised baselines are a bit outdated. It is recommended to add state-of-the-art reconstruction methods for comparison. For example, “Gao, Zhifan, et al. “Hierarchical perception adversarial learning framework for compressed sensing MRI.” IEEE Transactions on Medical Imaging (2023).” 2) If authors can involve more than one single dataset, the findings will be further consolidated and be more convincing. 3) The font size in Figure 1 is too small. 4) It is not clear why only 80% dataset was used, and how these data were selected.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors studied an important problem on the instability of unsupervised diffusion model for MRI reconstruction. The paper was clearly written and has potential to generate a greater impact on the medical community, thus I recommend accept for this paper.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Though thorough validation will be needed to further consolidate the findings, I believe the study is generally beneficial for the community.




Author Feedback

We thank the reviewers for their detailed feedback and constructive comments on our manuscript. Below, we address the major concerns raised by the reviewers.

Reviewer #1

Major Critique: Figure 3 caption and Lack of Clarity

Response: We apologize for the oversight regarding subplots c and d on page 7. This was a typographical error (it should be Fig. 3a and Fig. 3b), and we will correct it. The revised caption now reads: “Fig. 3: We visualized the impact of perturbation amplitude on model performance, measured by the ΔSSIM metric. Subplot (a) shows that all models experienced a drastic drop in SSIM as the perturbation amplitude increased using worst-case perturbations generated by i-RIM. Similar findings were observed with adversarial perturbations via the ResUnet model, in (b).”
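
As a rough illustration of the ΔSSIM measure named in the caption, the sketch below assumes ΔSSIM is the SSIM of the perturbed reconstruction minus the SSIM of the clean one, both against the same reference; the response does not spell out the definition. It uses a simplified single-window SSIM computed from global image statistics rather than the usual sliding-window version, and random noise as a stand-in for a worst-case perturbation. All function names are hypothetical.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM from global means, variances and covariance."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    luminance = (2 * mu_a * mu_b + c1) / (mu_a**2 + mu_b**2 + c1)
    structure = (2 * cov + c2) / (a.var() + b.var() + c2)
    return float(luminance * structure)

def delta_ssim(recon_clean, recon_perturbed, reference):
    """Assumed definition: SSIM change caused by the perturbation (negative = worse)."""
    return global_ssim(recon_perturbed, reference) - global_ssim(recon_clean, reference)

rng = np.random.default_rng(0)
reference = rng.random((32, 32))          # stand-in for a ground-truth image
noise = rng.standard_normal((32, 32))     # stand-in for a perturbation pattern

# Sweep the perturbation amplitude, as on the x-axis of a plot like Fig. 3.
for amplitude in (0.0, 0.05, 0.1, 0.2):
    perturbed = reference + amplitude * noise
    print(amplitude, delta_ssim(reference, perturbed, reference))
```

Here the "reconstruction" is simply the reference plus noise, so the sweep only shows how the metric is tabulated against amplitude, not how any particular model behaves.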

Major Critique: Discussion on Model Instabilities

Response: We appreciate this comment. We have added a discussion about the possible reasons for the observed instabilities in diffusion models. The revised discussion reads: “Our study suggests that worst-case perturbations in model-based MRI reconstruction can transfer to the independently trained diffusion model. The main reason for this vulnerability is that the perturbed k-space misleads the reverse iterative diffusion process, creating nonphysical artifacts. Classical regularization techniques like total variation regularization might offer better robustness in such scenarios.”
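
The classical regularizer this response appears to refer to, total variation (TV), penalizes the summed absolute differences between neighbouring pixels. The sketch below shows the anisotropic form and a simple denoising objective built on it. It is an illustration under stated assumptions, not something taken from the paper: the function names, the weight `lam`, and the 4x4 test image are hypothetical.

```python
import numpy as np

def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighbouring pixels."""
    dy = np.abs(np.diff(img, axis=0)).sum()   # vertical neighbours
    dx = np.abs(np.diff(img, axis=1)).sum()   # horizontal neighbours
    return float(dx + dy)

def tv_objective(x, y, lam):
    """Data fidelity plus TV penalty: 0.5 * ||x - y||^2 + lam * TV(x)."""
    return 0.5 * float(np.sum((x - y) ** 2)) + lam * total_variation(x)

# A 4x4 image with a single vertical edge: one unit jump per row.
step = np.zeros((4, 4))
step[:, 2:] = 1.0

rng = np.random.default_rng(0)
noisy = step + 0.1 * rng.standard_normal(step.shape)

print("TV of edge image :", total_variation(step))
print("TV of noisy image:", total_variation(noisy))
print("objective at x = edge image:", tv_objective(step, noisy, lam=0.5))
```

Because pixel-level noise adds many small jumps, a TV penalty tends to favour piecewise-smooth candidates; whether that improves robustness to worst-case perturbations is the open question the response raises.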

Reviewer #4

Major Critique: Font Size in Figures and Dataset Utilization

Response: We have revised Figure 1 to increase the font size for better readability. Using 80% of the dataset for training and validation follows a standard 80-20 train-validation split, ensuring robust model evaluation. We have clarified this point in the manuscript.
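
One reading of the 80-20 split described here is a shuffled partition of subjects into a training portion and a held-out portion. The sketch below shows that reading only; the subject identifiers, the seed, and the function name are hypothetical, and the response does not say how the authors actually selected the data.

```python
import random

def train_val_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items and cut it into two disjoint parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

subject_ids = [f"subject_{i:03d}" for i in range(100)]   # hypothetical IDs
train_ids, val_ids = train_val_split(subject_ids)
print(len(train_ids), len(val_ids))
```

Splitting by subject rather than by slice keeps images from the same scan out of both partitions.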

Reviewer #5

Major Critique: Poor Writing and Lack of Clear Introduction

Response: We have thoroughly revised the manuscript to improve clarity and readability. The introduction now reads: “Magnetic Resonance Imaging (MRI) is essential for medical diagnostics, especially for brain diseases, due to its detailed, non-invasive imaging capabilities. However, MRI faces challenges like long acquisition times and high sensitivity to motion. Recent advancements, particularly denoising diffusion models, promise to accelerate MRI by reconstructing high-quality images from undersampled data. Unlike traditional methods, these models can operate without paired training data. However, our study reveals a critical vulnerability: susceptibility to minimal worst-case perturbations, leading to significant inaccuracies in reconstructed images. Our research explores the robustness of diffusion models in MRI reconstruction, investigating adversarial perturbations and proposing strategies to enhance resilience. We aim to advance reliable diffusion models in clinical settings.”

Major Critique: Confusing Concept Definitions and Limited Comparisons

Response: We evaluate all experiments using both SSIM (structural similarity index measure) and pSNR (peak signal-to-noise ratio), common metrics for evaluating image reconstruction. Due to limited space, we included the pSNR evaluation in the supplement (Fig. S1). Additional visualizations are included in Fig. S2.
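
For readers unfamiliar with the second metric, PSNR is 10·log10(MAX² / MSE), where MAX is the dynamic range of the image. The sketch below is a minimal version assuming images scaled to [0, 1]; the function name and the example arrays are hypothetical and not taken from the paper.

```python
import numpy as np

def psnr(image, reference, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    diff = np.asarray(image, dtype=float) - np.asarray(reference, dtype=float)
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")                    # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

reference = np.zeros((8, 8))
image = np.full((8, 8), 0.1)                   # uniform error of 0.1, so MSE = 0.01
print(psnr(image, reference))                  # approximately 20 dB
```

Unlike SSIM, PSNR depends only on the mean squared error, so two reconstructions with very different structural artifacts can receive the same score.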

Major Critique: Unclear Definitions

Response: The term unsupervised in our study means that paired undersampled MR images and their ground truth are not needed. We follow the usage of “unsupervised reconstruction” as proposed by Song, Yang, et al. 2023. We will clarify all concepts mentioned by the reviewer in the supplement.

Major Critique: Reconstruction Baselines Are Not Convincing

Response: We selected a Unet-based baseline (ResUnet++) as it is the most widely used CNN backbone in MRI image reconstruction. Our next selection, i-RIM, showed extraordinary success in the FastMRI challenge.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have done a good work on rebuttals.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors have done a good work on rebuttals.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Reviewers raised their rankings after the rebuttal. Overall interesting insight, though I am missing a solution to overcome this instability.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Reviewers raised their rankings after the rebuttal. Overall interesting insight, though I am missing a solution to overcome this instability.


