

Abstract

Multi-modal Magnetic Resonance Imaging (MRI) plays a crucial role in clinical diagnosis by providing complementary anatomical and pathological information. However, incomplete acquisitions remain common due to practical constraints such as cost, scan time, and image corruption. Recently, diffusion models have shown significant potential in medical image-to-image translation. However, most existing diffusion-based synthesis models are constrained to fixed input-output modality pairs and lack the flexibility to handle arbitrary missing-modality scenarios. Furthermore, these approaches inevitably sacrifice anatomical structure consistency and degrade critical texture details during generation, potentially leading to misdiagnosis of subtle pathological patterns. To address these issues, we propose MISA-LDM, the first many-to-many MRI synthesis framework with modality-invariant structure awareness based on the latent diffusion model. Our approach synthesizes the missing modalities within a single model from any available combination of modalities. Meanwhile, we introduce a Structure-Preserving Module (SPM) that employs a disentanglement strategy to obtain modality-invariant structural representations and uses high-frequency information as a supplement. We use the anatomical priors obtained by the SPM to guide the diffusion process, preserving anatomical structure integrity. Extensive experiments on the BraTS2020 and BraTS2021 datasets demonstrate the superiority of our method. The results confirm the necessity of introducing more comprehensive anatomical priors to preserve generation consistency in multi-modal MRI translation. The source code is available at https://github.com/yichen-byte/misa-ldm.
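To make "any available combination of modalities" concrete: with the four BraTS sequences (T1, T1CE, T2 and FLAIR), a single many-to-many model has to cover every split into a non-empty set of available inputs and a non-empty set of missing targets, which gives 2^4 - 2 = 14 scenarios. The sketch below enumerates them and shows one plausible way to build a fixed-width conditioning input by zero-filling missing latents and appending availability channels. The channel layout, the function names and the latent shape are assumptions for illustration only, not the MISA-LDM implementation.

```python
from itertools import combinations

import numpy as np

MODALITIES = ("T1", "T1CE", "T2", "FLAIR")  # the four BraTS sequences


def missing_scenarios(modalities=MODALITIES):
    """Every split into a non-empty available set and a non-empty missing set."""
    out = []
    for k in range(1, len(modalities)):  # k = number of available modalities
        for available in combinations(modalities, k):
            missing = tuple(m for m in modalities if m not in available)
            out.append((available, missing))
    return out


def build_unet_input(noisy_target, source_latents, available, modalities=MODALITIES):
    """Hypothetical conditioning: concatenate the noisy target latent with
    zero-filled source latents and one constant availability channel per modality."""
    c, h, w = noisy_target.shape
    parts = [noisy_target]
    for m in modalities:
        if m in available:
            parts.append(source_latents[m])
        else:
            parts.append(np.zeros((c, h, w), dtype=noisy_target.dtype))
    # One channel per modality: all ones if available, all zeros if missing.
    mask = np.stack([np.full((h, w), float(m in available), dtype=noisy_target.dtype)
                     for m in modalities])
    parts.append(mask)
    return np.concatenate(parts, axis=0)


rng = np.random.default_rng(0)
lat = {m: rng.standard_normal((4, 32, 32)).astype(np.float32) for m in MODALITIES}
noisy = rng.standard_normal((4, 32, 32)).astype(np.float32)
x = build_unet_input(noisy, lat, ("T1", "T2"))
print(len(missing_scenarios()), x.shape)
```

Because missing inputs are zero-filled and flagged rather than dropped, the input width stays constant across all 14 scenarios, which is what lets one network serve them all.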

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0110_paper.pdf

SharedIt Link: https://rdcu.be/eHwZc

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04984-1_49

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/yichen-byte/misa-ldm

Link to the Dataset(s)

BraTS2020 dataset: https://www.med.upenn.edu/cbica/brats2020/data.html
BraTS2021 dataset: https://www.kaggle.com/datasets/dschettler8845/brats-2021-task1

BibTex

@InProceedings{ZhaXin_Structureaware_MICCAI2025,
        author = { Zhang, Xinzhe AND Liang, Junjie AND Cao, Peng AND Yang, Jinzhu AND Zaiane, Osmar R.},
        title = { { Structure-aware MRI Translation: Multi-Modal Latent Diffusion Model with Arbitrary Missing Modalities } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {508 -- 518}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper presents MISA-LDM, a novel framework for multi-modal MRI synthesis using a latent diffusion model. Unlike previous methods limited to fixed input-output modality pairs, MISA-LDM can handle arbitrary missing modality combinations within a single model. It introduces a Structure-Preserving Module (SPM) to ensure anatomical accuracy and preserve critical texture details by using disentangled, modality-invariant structural representations and high-frequency information. Experiments on BraTS2020 and BraTS2021 datasets show that MISA-LDM outperforms existing methods, emphasizing the importance of incorporating rich anatomical priors for consistent and reliable MRI translation.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is well-structured and organized, with a clear presentation and comprehensive experimental results.
2. The methodology section is technically sound and well written.
3. The proposed method outperforms SOTA baselines both quantitatively and qualitatively.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Lack of clear motivation. The proposed SPM is designed to enhance the preservation of anatomical structures, which is technically reasonable. However, the manuscript lacks a clear, well-articulated rationale for the HFCM design, making it appear to be an unnecessary or weakly justified addition.
2. Missing technical details. Some important implementation details, particularly concerning the VAE module, are omitted. The manuscript should include specifics such as the model architecture, training hyperparameters, and experimental setup to ensure reproducibility and completeness.
3. Insufficiently rigorous claims. The authors claim that the proposed method is the first to achieve diffusion-based many-to-many MRI synthesis. However, to the best of my knowledge, existing and concurrent methods already adopt diffusion models to build unified frameworks for missing-modality synthesis [1][2][3]. Moreover, the lack of a comparative analysis with these methods weakens the claims of novelty and contribution.
4. Limited ablation evaluation. While the SPM component is introduced to preserve anatomical structural consistency, the manuscript does not provide ablation results to empirically validate its effectiveness. Such evaluation is necessary to demonstrate the value and contribution of this module.
References
[1] Meng X, Sun K, Xu J, et al. Multi-modal modality-masked diffusion network for brain MRI synthesis with random modality missing. IEEE Transactions on Medical Imaging, 2024.
[2] Xiao X, Hu Q V, Wang G. FgC2F-UDiff: Frequency-guided and Coarse-to-fine Unified Diffusion Model for Multi-modality Missing MRI Synthesis. IEEE Transactions on Computational Imaging, 2024.
[3] Kebaili A, Lapuyade-Lahorgue J, Vera P, et al. AMM-Diff: Adaptive Multi-Modality Diffusion Network for Missing Modality Imputation. arXiv preprint arXiv:2501.12840, 2025.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

See the weaknesses above.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The paper (1) proposes an LDM-based multi-modal MRI synthesis framework that supports any combination of modalities, (2) introduces a structure-preserving module (SPM) to maintain anatomical consistency, and (3) proposes a high-frequency compensation module (HFCM) to enhance detailed texture.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The model is novel: the structure-preserving module (SPM) and the high-frequency compensation module (HFCM) maintain anatomical consistency and enhance detailed texture, respectively. (2) An ablation study was performed on each module (SPM, HFCM, SR) to verify its contribution to structure preservation and detail generation.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) The results are somewhat counterintuitive, possibly because of unreasonable image preprocessing. T1CE should be the most challenging sequence to generate, yet it achieved the highest PSNR. This may be because the authors use the minimum and maximum intensities for normalization. T1CE often has a very high maximum intensity due to contrast enhancement, which strongly compresses the gray values of the non-enhanced area and correspondingly shrinks the difference between the generated and real images there, resulting in a higher PSNR.
(2) Although the manuscript uses two datasets, BraTS2021 is essentially an extension of BraTS2020 and cannot reflect the generalization of the method. It would be better to try other datasets, such as IXI, or even multi-sequence MRI datasets of other organs.
(3) The performance of an LDM is bounded by the reconstruction quality of its VAE. The VAE's reconstruction metrics (e.g., PSNR) on the input sequences should be reported, as they help establish the upper limit of the model.
(4) It would be better to compare the computational complexity of the proposed method with that of GAN-based models.
(5) Recent multi-modal MRI synthesis methods are missing from the comparison, such as:
- ResViT: Residual Vision Transformers for Multimodal Medical Image Synthesis
- An explainable deep framework: towards task-specific fusion for multi-to-one MRI synthesis
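The normalization argument in point (1) can be checked numerically. The toy sketch below is not BraTS data: it builds a 64x64 intensity ramp (0 to 300) with an 8x8 "enhancing" patch, then gives the synthetic image the same absolute error (+20) outside the patch in two cases. It assumes both images are min-max scaled with the reference's minimum and maximum. Raising only the patch peak from 300 to 3000 multiplies the intensity range by 10, so the same error earns exactly 20 dB more PSNR.

```python
import numpy as np


def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images on a common intensity scale."""
    mse = np.mean((ref - test) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)


def minmax_with_reference(ref, test):
    """Scale both images with the reference's min and max (an assumed protocol)."""
    lo, hi = ref.min(), ref.max()
    return (ref - lo) / (hi - lo), (test - lo) / (hi - lo)


def toy_case(enhancement_peak):
    """Toy slice: a 0..300 intensity ramp plus an 8x8 'enhancing' patch.
    The synthetic image is off by +20 everywhere except the patch, which is exact."""
    real = np.tile(np.linspace(0.0, 300.0, 64), (64, 1))
    real[28:36, 28:36] = enhancement_peak
    fake = real + 20.0
    fake[28:36, 28:36] = real[28:36, 28:36]
    return psnr(*minmax_with_reference(real, fake))


p_plain = toy_case(300.0)   # no contrast-enhanced outlier
p_t1ce = toy_case(3000.0)   # bright enhancing patch, identical absolute error elsewhere
print(f"{p_plain:.2f} dB vs {p_t1ce:.2f} dB")  # prints 23.59 dB vs 43.59 dB
```

The non-enhanced tissue is reproduced equally badly in both cases. Only the normalization denominator changed, which is why intensity-range-sensitive metrics such as PSNR are hard to compare across sequences.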
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

(1) The method is novel, and an ablation experiment was performed on each module. (2) The results are somewhat counterintuitive, possibly because of unreasonable image preprocessing. (3) The datasets cannot demonstrate the generalization of the method.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

All raised issues have been resolved.

Review #3

Please describe the contribution of the paper

This paper presents MISA-LDM, a novel latent diffusion framework for multi-modal MRI synthesis under arbitrary missing modality conditions. The proposed method integrates a Structure-Preserving Module (SPM) and a High-Frequency Compensation Module (HFCM) to jointly preserve anatomical consistency and fine-grained texture details.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is clearly written and well-structured. Extensive experiments demonstrate that MISA-LDM achieves superior performance compared to existing methods. The model’s ability to flexibly handle various input-output modality combinations is particularly valuable in real clinical settings.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Although not discussed, the model's complexity (multiple encoders, attention modules, and a UNet backbone) may hinder deployment in low-resource environments. The ablation study could be strengthened by discussing the choice of FFMM.
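On the FFMM question: the author feedback reports that a modality-weighting fusion in FFMM beats direct concatenation. The paper's exact fusion is not described on this page, so the following is only a minimal sketch of the general idea under stated assumptions. Each modality's feature map is scored from its globally pooled features using a hypothetical learned vector `w`. A masked softmax then gives missing modalities zero weight, and the fused map is the weighted sum. Unlike concatenation, the fused width does not grow with the number of input modalities.

```python
import numpy as np


def masked_softmax(scores, available_mask):
    """Softmax over modalities; missing modalities get exactly zero weight."""
    s = np.where(available_mask, scores, -np.inf)
    s = s - s[available_mask].max()  # numerical stability
    e = np.exp(s)                    # exp(-inf) == 0 for missing modalities
    return e / e.sum()


def weighted_fusion(features, available_mask, w):
    """features: (M, C, H, W) per-modality maps; w: (C,) hypothetical scoring vector."""
    pooled = features.mean(axis=(2, 3))              # (M, C) global average pooling
    scores = pooled @ w                              # (M,) one score per modality
    alpha = masked_softmax(scores, available_mask)   # (M,) fusion weights
    fused = np.tensordot(alpha, features, axes=1)    # (C, H, W) weighted sum
    return fused, alpha


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 16, 16))       # T1, T1CE, T2, FLAIR features
mask = np.array([True, False, True, False])          # only T1 and T2 available
w = rng.standard_normal(8)
fused, alpha = weighted_fusion(features, mask, w)

# Baseline for comparison: zero-fill missing maps and concatenate channels.
concat = np.concatenate(features * mask[:, None, None, None], axis=0)
print(alpha.round(3), fused.shape, concat.shape)
```

The concatenation baseline yields 32 channels, a quarter of them informative per available modality and the rest zeros. The weighted fusion yields 8 channels whose mixture adapts to which inputs are present.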
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Please see the strengths and weaknesses above.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank all reviewers for their positive comments on the novelty and the well-structured presentation.
Q1: Model complexity (R1, R3). MISA-LDM has higher complexity than GAN-based models. However, the introduced modules yield significantly higher generation quality and better training stability. Importantly, we adopt a latent diffusion strategy that significantly reduces computational cost.
Q2: FFMM selection analysis (R1). Through ablation studies, we found that modality-weighting fusion in FFMM outperforms simple direct concatenation.
Q3: Lack of clear motivation (R2). We appreciate the feedback and have added a clear motivation. While SPM preserves high-level anatomical structures (e.g., tumor regions) via disentangled representations, it ignores high-frequency details (e.g., tumor boundaries), which are crucial for disease grading. To address this, HFCM explicitly models and enhances edge and texture details via Laplacian operators and adaptive attention, complementing SPM's structure-aware learning.
Q4: Missing technical details (R2). Due to page limits, we cannot describe all details. We will make every effort to present the technical details in the camera-ready version, and well-documented code will be publicly released to ensure reproducibility.
Q5: Insufficiently rigorous claims (R2). We apologize for the insufficient literature survey and will revise the claims in the camera-ready version. Our method differs from existing works by combining the SPM with the HFCM to meet the clinical requirements of anatomical consistency and high-frequency texture preservation, ensuring structure-aware image translation. Additionally, we employ an LDM to reduce computational complexity and introduce modality conditions via noise concatenation. These technical innovations make our approach more practical for real clinical applications.
Q6: Limited ablation evaluation (R2). In our ablation study, we conducted tumor segmentation experiments to validate the preservation of anatomical structure. The average Dice scores in Table 3 (0.894→0.915 for T1 and 0.812→0.846 for T1CE with SPM) validate its anatomical preservation capability. Due to page limits, we omitted this discussion from the ablation analysis and will incorporate it into the camera-ready version.
Q7: Experimental results (R3). In our experiments, we indeed employed min-max normalization, which increases the PSNR score for T1CE by compressing intensity differences in non-enhanced regions. However, Table 2 reveals significantly lower tumor-segmentation Dice scores for synthesized T1CE than for the other modalities, confirming persistent challenges in reconstructing pathological structures. This highlights the limitations of intensity-based metrics (e.g., PSNR) in clinical scenarios. Our argument is that the quality evaluation of medical image synthesis should consider not only pixel-level metrics such as PSNR but also the preservation of critical regions (i.e., tumor structures), which is crucial for downstream analysis.
Q8: Dataset generalization (R3). BraTS2021 contains 882 more cases than BraTS2020, with diverse pathology and acquisition variations. Our experiments show that models trained on one dataset perform poorly on the other, indicating distributional discrepancy and sample heterogeneity. IXI comprises only healthy subjects without segmentation masks, making it unsuitable for our structure-preserving training.
Q9: VAE reconstruction limits (R3). Our VAE achieves PSNR > 40 dB and SSIM > 0.98 for MRI reconstruction on both BraTS2020 and BraTS2021, demonstrating strong reconstruction performance and providing a robust latent space for the subsequent diffusion process. The primary challenge in our task lies in cross-modality generation rather than reconstruction, and the key question is how to introduce the structural prior into the diffusion process.
Q10: Missing citations (R3). Thank you for the references, which are now included in the current draft.
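The HFCM motivation (Q3) rests on the Laplacian being a high-pass operator: it is zero on flat regions and responds at edges such as tumor boundaries. The sketch below shows that property on a toy vertical step edge, using a standard 3x3 discrete Laplacian with replicated borders. The "adaptive attention" is stood in for by a simple pixel-wise sigmoid gate with hypothetical parameters `a` and `b`. This is an assumption for illustration, not the module described in the paper.

```python
import numpy as np

# Standard 4-neighbour discrete Laplacian (symmetric, so correlation == convolution).
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])


def laplacian(img):
    """3x3 discrete Laplacian with edge-replicated borders (same output size)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out


def high_freq_compensation(feat, img, a=5.0, b=-2.0):
    """Hypothetical gating: sigmoid(a * |Lap(img)| + b) decides where the
    high-frequency map is added back onto the features."""
    hf = np.abs(laplacian(img))
    gate = 1.0 / (1.0 + np.exp(-(a * hf + b)))
    return feat + gate * hf, hf, gate


img = np.zeros((16, 16))
img[:, 8:] = 1.0  # vertical step edge between columns 7 and 8
enhanced, hf, gate = high_freq_compensation(img, img)
print(hf[0])  # non-zero only at the two columns flanking the edge
```

Flat regions give `hf == 0`, so the gate stays near its floor there and the features pass through almost unchanged. Along the edge the gate opens and the boundary response is reinforced.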

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

Two reviewers point out that this paper has a limited ablation evaluation and that BraTS2021 cannot demonstrate generalization. Meanwhile, min-max normalization is inappropriate for these data.
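A widely used, more outlier-robust alternative to plain min-max scaling is percentile clipping: compute low and high intensity percentiles, optionally inside a brain mask so that background zeros do not set the lower bound, then clip and rescale to [0, 1]. The function name and the 0.5/99.5 defaults below are illustrative assumptions, not a protocol taken from the paper or the reviews. On a toy ramp with a handful of very bright voxels, min-max squeezes typical tissue near zero, while percentile clipping keeps it mid-range.

```python
import numpy as np


def percentile_normalize(vol, mask=None, lower=0.5, upper=99.5):
    """Clip to the [lower, upper] intensity percentiles (computed inside `mask`
    if given, e.g. a brain mask) and rescale to [0, 1]."""
    voxels = vol[mask] if mask is not None else vol.ravel()
    lo, hi = np.percentile(voxels, [lower, upper])
    return np.clip((vol - lo) / (hi - lo), 0.0, 1.0)


vol = np.tile(np.linspace(0.0, 300.0, 100), (100, 1))  # toy "tissue" intensities
vol[0, :5] = 3000.0                                     # a few contrast-like outliers

minmax = (vol - vol.min()) / (vol.max() - vol.min())
robust = percentile_normalize(vol)
print(round(float(np.median(minmax)), 3), round(float(np.median(robust)), 3))
```

Five outlier voxels out of 10,000 are enough to push the min-max median to about 0.05. The percentile version keeps it near 0.5 and saturates only the outliers.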


Structure-aware MRI Translation: Multi-Modal Latent Diffusion Model with Arbitrary Missing Modalities

Author(s):