Abstract

Hallucinations are spurious structures not present in the ground truth, and they pose a critical challenge in medical image reconstruction, especially for data-driven conditional models. We hypothesize that combining an unconditional diffusion model, trained on a diverse dataset, with data consistency can reduce these hallucinations. Based on this, we propose DynamicDPS, a diffusion-based framework that integrates conditional and unconditional diffusion models to enhance low-quality medical images while systematically reducing hallucinations. Our approach first generates an initial reconstruction using a conditional model, then refines it with an adaptive diffusion-based inverse problem solver. DynamicDPS skips the early stages of the reverse process by selecting an optimal starting time point per sample and applies Wolfe's line search for adaptive step sizes, improving both efficiency and image fidelity. Using diffusion priors and data consistency, our method effectively reduces hallucinations from any conditional model output. We validate its effectiveness in Image Quality Transfer for low-field MRI enhancement. Extensive evaluations on synthetic and real MR scans, including a downstream task for tissue volume estimation, show that DynamicDPS reduces hallucinations, improving relative volume estimation by over 15% for critical tissues while using only 5% of the sampling steps required by baseline diffusion models. As a model-agnostic and fine-tuning-free approach, DynamicDPS offers a robust solution for hallucination reduction in medical imaging. Code is available at https://github.com/edshkim98/DynamicDPS.
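
To make the warm-start idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard DDPM-style variance-preserving forward process with a linear beta schedule; the function names `alpha_bar_schedule` and `warm_start`, the schedule endpoints, and the flat-list image representation are all hypothetical choices made for illustration. The conditional prediction is noised to a chosen start step, from which the reverse process would then run instead of starting from pure noise.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its endpoints are assumptions for this sketch.
    """
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def warm_start(x_cond, t_start, alpha_bars, rng):
    """Noise a conditional prediction x_cond to diffusion step t_start.

    Implements x_t = sqrt(a_bar_t) * x_cond + sqrt(1 - a_bar_t) * eps,
    with eps drawn from a standard normal, applied per pixel.
    """
    a = alpha_bars[t_start]
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x_cond
    ]


if __name__ == "__main__":
    alpha_bars = alpha_bar_schedule(1000)
    x_cond = [0.2, 0.5, 0.9]  # stand-in for a flattened conditional output
    x_t = warm_start(x_cond, t_start=50, alpha_bars=alpha_bars, rng=random.Random(0))
    print(x_t)
```

Starting at a small step keeps most of the conditional prediction intact, while a larger step injects more noise and gives the diffusion prior more freedom to overwrite hallucinated structure.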

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4526_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/edshkim98/DynamicDPS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{KimSeu_Tackling_MICCAI2025,
        author = { Kim, Seunghoi and Tregidgo, Henry F. J. and Figini, Matteo and Jin, Chen and Joshi, Sarang and Alexander, Daniel C.},
        title = { { Tackling Hallucination from Conditional Models for Medical Image Reconstruction with DynamicDPS } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},

}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a new approach that first warm-starts with any conditional IQT model, then applies Data-Consistency-Aware Time Selection and a pre-trained unconditional diffusion model (DynamicDPS) to cut inference time and suppress hallucinations in MRI enhancement. DynamicDPS notably achieves significant improvements in image fidelity and robustness in both synthetic and real MRI datasets, validating clinical feasibility, and demonstrating superior performance in a downstream clinical application (tissue volume estimation).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a new approach that first warm-starts with any conditional IQT model, then applies Data-Consistency-Aware Time Selection and a pre-trained unconditional diffusion model (DynamicDPS) to cut inference time and suppress hallucinations in MRI enhancement. DynamicDPS implicitly tackles OOD hallucinations through the unconditional diffusion prior (capturing a broader HF-MRI distribution) plus an explicit data-consistency term that corrects the conditional prediction at test time. DynamicDPS significantly reduces inference time (by over 80%) compared to conventional diffusion-based approaches (e.g., DPS), making it practically viable for clinical integration without excessive computational costs.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Key Weaknesses

    1. Simplified Forward Model – Low-field (LF) images are simulated with blur + down-sampling + γ-transform rather than a realistic k-space or signal-physics model; data-consistency is enforced only in image space, limiting clinical realism.
    2. Ad-hoc Data-Consistency Loss – The ℓ₂ + Sobel-Edge + SSIM term lacks probabilistic grounding; λ-weight selection is undocumented, raising stability and reproducibility concerns.
    3. Opaque DCATS Implementation – Construction of the reference-likelihood bank, temperature τ tuning, and computational overhead are unspecified, making DCATS hard to reproduce or evaluate.
    4. Unvalidated Time-Skipping Strategy – Jumping directly to an intermediate diffusion time breaks the original SDE/ODE derivation; no theoretical or empirical bias analysis is provided.
    5. Wolfe Line-Search on Non-convex Objective – Armijo/curvature assumptions are violated by SSIM and score-matching gradients; convergence diagnostics and failure modes are absent.
    6. Narrow, Synthetic OOD Testing – OOD robustness is evaluated only on contrast and resolution shifts; no experiments on real clinical distribution shifts (e.g., pathology, scanner vendor, sequence), weakening the generalisation claim.
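
For readers unfamiliar with the image-space degradation referred to in weakness 1, the following is a rough one-dimensional sketch of a blur, down-sampling, and gamma-transform pipeline. It is not the paper's Equation 1: the box kernel, the order of operations, the parameter values, and the names `box_blur` and `simulate_low_field` are assumptions chosen only to illustrate the kind of simplified forward model being criticised.

```python
def box_blur(signal, width):
    """Average each sample with its neighbours inside a window of `width`."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def simulate_low_field(signal, blur_width=3, factor=2, gamma=0.7):
    """Toy image-space degradation: blur, then down-sample, then gamma.

    Intensities are assumed to be normalised to [0, 1]; a gamma below 1
    compresses the gap between bright and mid-range intensities.
    """
    blurred = box_blur(signal, blur_width)
    downsampled = blurred[::factor]
    return [min(max(v, 0.0), 1.0) ** gamma for v in downsampled]


if __name__ == "__main__":
    high_field = [0.0, 0.0, 0.25, 0.25, 1.0, 1.0, 0.25, 0.0]
    print(simulate_low_field(high_field))
```

Because every step operates on image intensities, nothing here models k-space sampling or signal physics, which is the limitation the reviewer points to.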
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Please revise the last sentence that starts with ‘However, we hypothesize …’ in P2 for clarity. As written, the logical connection between “unconditional diffusion models for inverse problems” and “hallucination reduction” remains ambiguous. I recommend explicitly stating why the combination of domain‐specific priors and data-consistency optimization reduces both intrinsic and extrinsic hallucinations, and (if relevant) distinguishing this role from that of conditional models.

    The motivation for using DPS is not clear enough. The authors could make this motivation more explicit, e.g. by stating up-front that DPS already offers the right probabilistic machinery for hallucination reduction but is too slow and brittle for clinical use, hence their modifications.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A new idea with good empirical gains, but needs tighter technical backing. The paper introduces a simple but clever two-step pipeline—first run any existing IQT network, then clean it up with a faster, data-consistent diffusion process. This cuts inference time a lot and clearly reduces fake details (“hallucinations”) in both test and real scans. Still, key parts (toy forward model, hand-tuned loss weights, missing DCATS details) are not fully explained or theoretically proven, so it is hard to judge how well the method will generalize.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This work specifically aims to tackle hallucinations in MRI enhancement, proposing a novel framework, DynamicDPS, to mitigate both intrinsic and extrinsic hallucination.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a novel DynamicDPS framework for MRI reconstruction involving the following steps: (1) generate an initial reconstruction from any conditional model, which is subject to hallucination; (2) determine the optimal point to start the reverse diffusion process; (3) iteratively refine the initial image with a pre-trained diffusion model to reduce hallucination, incorporating a strong data prior to mitigate extrinsic hallucination and data consistency to mitigate intrinsic hallucination. This work is well-written, and the method is rational. Apart from the standard image quality metrics, the authors used volume error to measure the hallucination specifically. This is good work with a novel method and significant impact: it raises attention to the problem of hallucination.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are a few minor issues to be addressed:

    1. The authors use MRI “IQT” and “reconstruction” interchangeably, which is confusing.
    2. In Equation 1, what does gamma transform do?
    3. In Section 2.4, the authors claim, “the diffusion prior corrects extrinsic hallucinations, ensuring that the reconstructed HF image remains faithful to the measurements.” This conflicts with the earlier definition of extrinsic hallucination. Remaining faithful to the measurements is the problem for intrinsic hallucination.
    4. Regarding the diffusion model: The images in the course of the forward and reverse processes are noisy images. The output of the conditional model prediction is a clean image with hallucination. These two distributions are different, so how can you start the reverse process from this x_cond?
    5. I don’t see the point of adding the edge and SSIM losses in Equation 5. Do you have ablation studies on these terms?
    6. In dynamic step-size optimization, what is the step size referring to?
    7. It seems that the diffusion model is only pretrained on the HCP dataset. Is this single dataset enough to pre-train the diffusion model to learn the “broad” HF MRI distribution?
    8. In Table 1, there is no comparison to SOTA MRI reconstruction models. UNet and ESRGAN are very weak baseline models.
    9. In Table 1, why omit the inference time for UNet and ESRGAN?
    10. In Table 1, why do the images from ESRGAN appear darker than the ground truth?
    11. In Section 3.3, how does volume error explicitly measure hallucination? Need more background.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Please consider improving the paper based on the suggestions.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Important task and novel method, but the paper needs more details and clarifications.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors present DynamicDPS, a diffusion-based post-processing framework for enhancing low-field MRI images by correcting spurious hallucinations (common artifacts introduced by conditional super-resolution models). The approach builds upon Diffusion Posterior Sampling (DPS) (Chung et al., 2023), which guides the reverse sampling trajectory of an unconditional diffusion model using an added data-consistency term.

    DynamicDPS extends DPS by introducing a warm-start mechanism, where the reverse diffusion process is initialized with a hallucination-prone enhanced image (e.g., output from a U-Net or ESRGAN). A key innovation is the Data-Consistency-Aware Time Selection (DCATS) module, which determines an optimal reverse diffusion start time t_optimal for each sample individually. The method is evaluated on both real and synthetic low-field MRI datasets. It consistently outperforms baseline conditional super-resolution methods in terms of standard image quality metrics (SSIM, PSNR), particularly in out-of-distribution settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    -Although demonstrated in the context of low-field MRI enhancement, the proposed method is conceptually generalizable and could be applicable to a wide range of generative imaging tasks beyond the biomedical domain.

    -The computational efficiency of the approach is notably improved compared to baseline DPS methods, primarily due to the warm-start initialization. The authors report an approximately 80% reduction in computation time compared to baseline DPS, which significantly enhances the practical feasibility of this method.

    -Beyond perceptual metrics (e.g., SSIM, PSNR), the paper evaluates performance on downstream clinical tasks, such as brain tissue volume estimation. The method achieves over 15% reduction in volume error in critical tissues compared to conditional super-resolution baselines.

    -The work has the potential to influence future research in MRI enhancement, both in terms of post-hoc correction strategies and as a blueprint for integrating data consistency into generative diffusion pipelines.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • While the method is conceptually sound and technically well-executed, the experimental section lacks completeness, particularly regarding the real-world applicability of the approach. For example, the real-world dataset is only evaluated qualitatively, and no quantitative metrics are reported for this setting. It remains unclear how the method performs when the forward operator A (used in the data consistency loss) is not explicitly defined (which is the case for real world settings).

    • The distinction between synthetic and real data usage in the downstream brain tissue volume estimation task is not clearly explained. If the downstream task evaluation is limited to synthetic data, then the real-world utility of the method remains uncertain.

    • The authors claim the method to be “fine-tuning free,” but this is potentially misleading. In practice, a different unconditional diffusion model would likely need to be trained or fine-tuned per specific use case, as the score function is domain-specific.

    • The DCATS module lacks transparency regarding the reference dataset used for computing likelihood discrepancies. It is not specified whether this reference includes samples from the training set only, or whether real/test image pairs are also involved.

    • Reporting of results could be improved. For instance:

    • Only relative errors are provided for the brain tissue volume task, with no absolute error values or total volume accuracy. This limits comparability to existing MRI enhancement methods.

    • There is no mention of the typical values or distribution of t_optimal achieved via DCATS.

    • In terms of perceptual metrics, only SSIM shows significant improvements over the baseline DPS, while gains in others such as PSNR are limited. Importantly, SSIM is also incorporated directly into the data consistency loss used during inference, which introduces a potential metric circularity. This artificially favors the method in terms of SSIM (original DPS has no SSIM in their data consistency loss).

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    There may be a mistake in Table 1 (In-distribution, G, SSIM): the entry is .85±.71. Is the standard deviation actually that high?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is sound and achieves a significant computational gain compared to the baseline DPS, while also matching or outperforming it in terms of results. The evaluation is strong. However, the authors should provide more details regarding the real-world experiment and the downstream task.

    This work has the potential to make a real impact in the MRI enhancement community and possibly in other applications as well.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their valuable and constructive feedback. We are encouraged that all recognized the novelty and impact of our framework for mitigating hallucination in medical image reconstruction, an underexplored yet critical challenge. Reviewers found our approach novel (R1–3), intuitive (R1), and backed by strong evaluation (R1–3). Below, we address specific concerns.

[R1.7.2, R2.7.5, R3.7.5] SSIM and edge losses in DC: These losses are heuristic, but they consistently improved anatomical edges and yielded stable gains empirically. While SSIM was used in the evaluation, greater perceptual gains appeared in other metrics (LPIPS: 17.5% vs SSIM: 10.4%, Table 1). We will include our weight selections for these losses in the manuscript.

[R1.7.3, R3.7.4] Clarify DCATS implementation: DCATS computes average data likelihoods at each diffusion step using a separate reference dataset and stores them in a memory bank. At inference, it performs a lookup-based comparison between the conditional model’s likelihood and the stored values, making the runtime overhead negligible. The temperature τ=0.4 was empirically chosen to balance quality and speed. We will clarify these details in the revised manuscript.

[R1.7.4, R2.7.4] Skipping to an intermediate diffusion time point: Warm-starting the reverse process is common for speeding up sampling. While direct use of conditional outputs can introduce distribution mismatch, we minimize this by selecting the start step via DCATS and adding the corresponding forward diffusion noise. Table 1 shows that performance is on par with the full reverse process.

[R1.7.1] Simplified forward model: Our design adopts and modifies the forward model widely used in the IQT literature [Lin et al., MedIA 2023] for controlled benchmarking of resolution, contrast, and SNR shifts. In fact, DynamicDPS is agnostic to the specific forward model and could incorporate more physics-based models in future work.

[R1.7.5] Wolfe’s line search on a non-convex objective: Though Wolfe’s conditions are originally for convex objectives, they are widely employed heuristically for non-convex problems. In our implementation, we observed a consistent trend: large step sizes in early iterations where the landscape is smoother, and smaller step sizes near convergence.

[R1.7.6] Narrow OOD setting: We agree that exploring other shifts would further validate generalizability. Our current focus was on contrast/resolution shifts, and others will be explored in the future.

[R2.7.2] Gamma transform: It simulates reduced tissue contrast in low-field MRI.

[R2.7.6] Step size in DynamicDPS: It adjusts how images are updated iteratively, akin to an image-level learning rate.

[R2.7.7] Single dataset pre-training: HCP contains a broad range of HF adult brains, which satisfies our current OOD settings on acquisition shifts.

[R2.7.8-10] Baseline selection and ESRGAN output: Our goal was to demonstrate hallucination suppression capability, so we used relatively simple conditional models. ESRGAN’s darker outputs result from localized intensity saturation, preserved for fair comparisons.

[R2.7.11] Volume estimation for hallucination: Hallucinations often occur in small, low-SNR regions (e.g. thalamus, hippocampus), which significantly impact volume estimates. Thus, we calculate volume error for these tissues as a clinically relevant proxy for hallucination.

[R3.7.1-2] Real data experiment: Quantitative evaluation was omitted due to misalignment between LF and HF scans, but visual quality assessment will be added to Fig 3a with a caution note. While parameters for A were estimated heuristically, DynamicDPS still visibly improved structural fidelity.

[R3.7.3] Fine-tuning free clarification: It means the pre-trained diffusion model (on HCP) is used without retraining during inference, regardless of the conditional model. We will revise the wording to better reflect this.

[R1.10, R3.7.5] Other minors: We will improve the clarity of these suggestions in our manuscript.
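
The memory-bank lookup described in the DCATS response can be illustrated with a short sketch. This is only one possible reading of the rebuttal, not the authors' code: the names `build_bank` and `select_start_step` are hypothetical, the nearest-match rule is an assumption, and the temperature τ is deliberately left out because the rebuttal does not say how it enters the comparison.

```python
def build_bank(reference_values):
    """Average per-sample likelihood values at each diffusion step.

    reference_values[s][t] is the value for reference sample s at step t;
    the result has one averaged entry per step.
    """
    num_samples = len(reference_values)
    num_steps = len(reference_values[0])
    return [
        sum(sample[t] for sample in reference_values) / num_samples
        for t in range(num_steps)
    ]


def select_start_step(cond_value, bank):
    """Pick the step whose stored average is closest to the query value.

    Nearest-match selection is an assumption made for illustration.
    """
    return min(range(len(bank)), key=lambda t: abs(bank[t] - cond_value))


if __name__ == "__main__":
    # Two reference samples, three diffusion steps (made-up numbers).
    bank = build_bank([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]])
    print(bank, select_start_step(2.2, bank))
```

Since the bank is computed once offline, selecting a start step at inference costs a single pass over a list with one entry per diffusion step, which is consistent with the negligible overhead the authors report.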




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


