Abstract

Magnetic Resonance Imaging (MRI) reconstruction is essential in medical diagnostics. As the latest generative models, diffusion models (DMs) have struggled to produce high-fidelity images due to their stochastic nature in image domains. Latent diffusion models (LDMs) yield both compact and detailed prior knowledge in latent domains, which could effectively guide the model towards more effective learning of the original data distribution. Inspired by this, we propose Multi-domain Diffusion Prior Guidance (MDPG) provided by pre-trained LDMs to enhance data consistency in MRI reconstruction tasks. Specifically, we first construct a Visual-Mamba-based backbone, which enables efficient encoding and reconstruction of under-sampled images. Then pre-trained LDMs are integrated to provide conditional priors in both latent and image domains. A novel Latent Guided Attention (LGA) is proposed for efficient fusion in multi-level latent domains. Simultaneously, to effectively utilize a prior in both the k-space and image domain, under-sampled images are fused with generated full-sampled images by the Dual-domain Fusion Branch (DFB) for self-adaption guidance. Lastly, to further enhance the data consistency, we propose a k-space regularization strategy based on the non-auto-calibration signal (NACS) set. Extensive experiments on two public MRI datasets fully demonstrate the effectiveness of the proposed methodology. The code is available at https://github.com/Zolento/MDPG.
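As a rough illustration of the k-space regularization idea mentioned in the abstract, the sketch below computes an L1 penalty in k-space restricted to a NACS set. It assumes that NACS means the acquired k-space columns lying outside the fully sampled central auto-calibration (ACS) band. The paper's exact definition, the choice of an L1 distance, the equispaced mask, and the names `equispaced_mask` and `nacs_l1` are assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def equispaced_mask(n_cols, accel=4, acs=8):
    """Hypothetical 1D Cartesian mask over k-space columns.

    Returns (sampled, acs_band): every `accel`-th column plus a fully
    sampled central band of `acs` columns (the ACS region).
    """
    sampled = np.zeros(n_cols, dtype=bool)
    sampled[::accel] = True
    lo = n_cols // 2 - acs // 2
    acs_band = np.zeros(n_cols, dtype=bool)
    acs_band[lo:lo + acs] = True
    return sampled | acs_band, acs_band

def nacs_l1(recon, kspace_meas, sampled, acs_band):
    """Mean L1 distance in k-space over sampled columns outside the ACS band.

    `kspace_meas` is assumed to be centered (fftshift applied), so the
    ACS band sits in the middle columns.
    """
    k_recon = np.fft.fftshift(np.fft.fft2(recon, norm="ortho"))
    nacs = sampled & ~acs_band  # assumed NACS set: acquired, non-ACS columns
    return float(np.abs(k_recon[:, nacs] - kspace_meas[:, nacs]).mean())
```

Under these assumptions the penalty vanishes when the reconstruction agrees with the measurements on the NACS columns and grows as the acquired high-frequency content is mismatched.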

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2241_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Zolento/MDPG

Link to the Dataset(s)

FastMRI dataset: https://fastmri.med.nyu.edu/

IXI dataset: https://brain-development.org/ixi-dataset/

BibTex

@InProceedings{ZhaLin_MDPG_MICCAI2025,
        author = { Zhang, Lingtong and Song, Mengdie and Hao, Xiaohan and Mai, Huayu and Qiu, Bensheng},
        title = { { MDPG: Multi-domain Diffusion Prior Guidance for MRI Reconstruction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15961},
        month = {September},
        pages = {344 -- 353}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a framework for MRI reconstruction that leverages multi-domain diffusion prior guidance. The approach involves a two-stage design where prior knowledge is utilized to guide the Visual-Mamba backbone. Experiments conducted on two datasets demonstrate the framework’s effectiveness.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The two-stage strategy is technically sound and effective. It involves initially training a diffusion model for “coarse reconstruction” followed by training a VMamba-based model for “refinement.”

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The writing lacks clarity. For instance, the method section does not clearly define the input and output of each module. The logical flow is disjointed, making it difficult to understand the function of each module.
    • The design appears to be a modular assembly of existing works rather than an innovative approach.
    • The term “multi-domain” is not well-explained. It is mentioned only in the abstract, introduction, and conclusion, leaving its meaning unclear in the method section. Additionally, the term “multi-domain fusion strategy” is only mentioned in the conclusion.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • The writing should be refined.

    • Some key issues mentioned need to be addressed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper
    1. The authors proposed Multi-domain Diffusion Prior Guidance (MDPG). A novel Visual-Mamba-based backbone is utilized to process the measurements, i.e., the k-space of under-sampled images.
    2. Latent Guided Attention (LGA) and Dual-domain Fusion Branch (DFB) modules were designed to fully use prior knowledge guidance from different domains.
    3. A k-space regularization based on the non-auto-calibration signal (NACS) set was introduced to help enhance data consistency.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A large framework integrating the information from three domains, i.e., latent domain, image domain and frequency domain, and several advanced backbones, i.e., Diffusion, Mamba, and Attention, is proposed for MRI reconstruction.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Some detailed designs are unclear and not well verified in the ablation study.

    1. Why should the pre-trained encoder and decoder be used? How could they be pretrained? Is there any special requirement for training?
    2. As indicated by the authors, LDMs achieve a balance between generative quality and efficiency. I wonder whether DMs would outperform LDMs if efficiency were not considered. The ablation study did not compare the effectiveness of LDMs vs. DMs.
    3. The encoder adopts VMamba while the decoder uses convolutions. Why use this design? Does this design boost performance? I cannot find a relevant ablation study.
    4. For DFB, I wonder whether the frequency branch is really necessary and boosts performance. I cannot find a relevant ablation study.
    5. In Table 3, it is unclear whether N/A and X are distinct. What is the difference between A and E?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some detailed designs are unclear and not well verified in the ablation study.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Although this paper could be accepted, its defect is still obvious. The authors adopted the pre-trained encoder and decoder in their design, but I cannot find a justification for this design in either the original paper or the authors’ feedback.



Review #3

  • Please describe the contribution of the paper

    The authors propose a two-stage framework named Multi-domain Diffusion Prior Guidance (MDPG), which leverages pre-trained latent diffusion models to incorporate prior knowledge from both latent and image domains. A Visual-Mamba-based backbone is introduced to process under-sampled k-space measurements efficiently. To integrate the multi-domain priors, the authors design two modules: Latent Guided Attention (LGA) for latent-level fusion and Dual-domain Fusion Branch (DFB) for combining image- and k-space information. Furthermore, a k-space regularization method based on the non-auto-calibration signal (NACS) is introduced to enhance data consistency by emphasizing structurally important regions.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper presents a novel framework for MRI reconstruction.

    1. This paper utilizes the learned multi-domain priors from LDMs to enhance the Visual-Mamba-based backbone network in reconstructing under-sampled MR images.
    2. To achieve effective feature fusion and guidance across different domains, this paper designs two dedicated modules: Latent Guided Attention (LGA) and Dual-domain Fusion Branch (DFB).
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The authors have explained the motivation for introducing multi-domain priors from LDMs. However, the paper lacks a clear explanation of the motivation for adopting the Visual-Mamba-based backbone as the reconstruction network. In addition, there is no review or citation of related works on Visual-Mamba-based networks.
    2. The performance improvement of the proposed method is limited on most reconstruction results, while the overall network design becomes more complex.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces the learned multi-domain priors from LDMs into existing deep learning-based network architectures for MRI reconstruction. Meanwhile, the authors also design a Visual-Mamba-based backbone network, which may further contribute to the improvement of reconstruction accuracy.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors’ rebuttal addresses my concerns well.




Author Feedback

We sincerely thank the reviewers for their constructive feedback. All reviewers recognized the validity of the proposed method. We have carefully considered the main concerns regarding motivation, novelty, and methodological details.

Common issues

1) Motivation of the VMamba UNet design: We selected the VMamba backbone for its efficiency in long-range dependency modeling, which is critical for capturing global anatomical patterns in MRI reconstruction. This capability aligns well with the proposed Latent Guided Attention (LGA), as both prioritize integrating global priors from the latent domain. For the decoder, we adopt a plain convolutional design to recover fine-grained details while minimizing computational overhead, which provides sufficient precision. Additionally, our design is inspired by recent advancements in VMamba UNet architectures for tasks like medical image segmentation and remote sensing.

2) The design lacks novelty and increases complexity: We sincerely thank the reviewer for raising this concern. As the related work shows, introducing prior guidance into vision tasks is novel, so our method goes beyond a modular assembly. The related work also shows that the added complexity is necessary. Meanwhile, sufficient ablation shows that the proposed modules are reasonably designed. Even though the improvement may seem limited, small improvements in medical image reconstruction may have clinical significance, as demonstrated in Fig. 2.

To R1:

1) Questions about the pre-trained encoder-decoder: We followed the official code and trained an LDM-16 model until it converged on the datasets used in our experiments. We reused its encoder as the condition encoder. There are no further special requirements.

2) The ablation did not compare LDM and DM: Thanks for the suggestion. As mentioned in the abstract, “Latent diffusion models (LDMs) yield both compact and detailed prior knowledge in latent domains, which could effectively guide the model towards more effective learning of the original data distribution”. This work focuses on effectively using the LDM prior across multiple domains, while DMs do not follow this premise. Even so, adding a DM comparison would be a good way to complete the experiments.

3) Design of VMamba UNet: Please see Common issues 1.

4) DFB’s frequency branch not tested in ablation: This branch is designed to fill the frequency domain (k-space) explicitly. This is dictated by the physics of MRI, so the frequency branch is not discussed separately.

5) Question about symbols in ablation: N/A means not applicable, as line 1 refers to the backbone alone. A means the DFB is placed at the pre-fusion position before the encoder, and E refers to post-fusion after the decoder. This is an extra discussion.
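Filling k-space explicitly, as point 4 of the response above describes, is commonly realized in MRI reconstruction as a hard data-consistency step: wherever a k-space sample was actually acquired, the network's prediction is overwritten with the measurement. The sketch below shows that generic operation only. It is not the authors' DFB, and the function name `data_consistency` and the single-coil, unshifted FFT convention are assumptions made for illustration.

```python
import numpy as np

def data_consistency(image_pred, kspace_meas, mask):
    """Generic hard data-consistency step (single-coil sketch).

    image_pred : 2D predicted image
    kspace_meas: 2D measured k-space (zeros where not acquired)
    mask       : 2D boolean array, True where k-space was acquired
    """
    k_pred = np.fft.fft2(image_pred, norm="ortho")
    # Keep measured samples where available, predicted samples elsewhere.
    k_dc = np.where(mask, kspace_meas, k_pred)
    return np.fft.ifft2(k_dc, norm="ortho")
```

With an orthonormal FFT, replacing predicted coefficients by the measured ones cannot increase the L2 error against the fully sampled image when the measurements are noise-free, which is one reason this step is a common baseline for enforcing consistency with the acquired data.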

To R2:

1) The writing lacks clarity: We apologize for any misunderstanding we may have caused. Fig. 1 shows the inputs and outputs of the method, and the modules are introduced mainly in Sections 2.1 and 2.2, both of which are well organized. It is worth noting that Eqs. 9 through 11 use formal expressions in order to describe the DFB in detail.

2) The design lacks novelty: Please see Common issues 2.

3) The submission does not provide sufficient information for reproducibility: Thanks for emphasizing this critical aspect. For the LDM, please see R1.1. Sufficient information about modules and training parameters has been provided in Sections 2 and 3 for reproduction.

4) The term “multi-domain” is not well-explained: Thanks for pointing out this issue. Three domains are mentioned in this paper: the latent domain provides a compact prior; the image domain preserves spatial details; the frequency domain enhances features in the original measurement space.

To R3:

1) Motivation of VMamba: Please see Common issues 1.

2) The performance improvement is limited. Network becomes more complex: Please see Common issues 2.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes an interesting approach by integrating latent and image domain guidance for MRI reconstruction. Although the authors demonstrate promising results, the lack of clarity and the inclusion of numerous complex building blocks without detailed justification diminish the potential impact of the method. Additionally, the absence of standard deviation reports in the quantitative results makes it difficult to assess the true improvement offered by the proposed approach.


