Abstract

Brain disease diagnosis and treatment planning rely on complementary information from multiple MRI modalities. Compared to routine modalities (RM) such as T1, T2, and FLAIR, modalities like DWI and T1ce provide unique diagnostic information but are less commonly used due to longer scan times, higher cost, or the need for contrast agents. To mitigate this, multi-modal MRI synthesis methods have been proposed to generate advanced MRIs from routine MRIs. However, in clinical practice, missing modalities are a known issue in MRI generation and degrade synthesis quality. Existing methods typically use shared encoders and masking strategies to compensate for missing modalities. However, as the number of missing modalities increases, it becomes harder to capture inter-modal correlations, causing a sharp performance drop. To address this, we propose the Feature Mapping and Merging Diffusion Model (FMM-Diff). Instead of using a shared encoder, we introduce dedicated mapping encoders for each modality. When a modality is missing, its latent representation is inferred from the available ones via its dedicated encoder. This ensures complete latent representations, allowing the Merge Module to selectively extract and fuse inter-modal correlations, which significantly improves synthesis performance. Evaluated on two public MRI datasets, CGGA and BraTS2021, FMM-Diff outperforms state-of-the-art models by 4.35% in terms of Structural Similarity Index Measure (SSIM) and demonstrates exceptional stability across various missing-modality scenarios, with an SSIM drop of less than 1.0%, significantly lower than the 2.0–3.45% drop observed with other methods.
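The abstract's core idea, completing the set of latents by inferring each missing modality's latent from the available ones, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: random linear maps stand in for the dedicated encoders (E) and mapping encoders (P), available latents are assumed to be combined by element-wise summation before mapping, and all names and sizes are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins: linear maps replace the real encoder networks.
rng = np.random.default_rng(0)
MODALITIES = ["T1", "T2", "FLAIR"]
IMG_DIM, LATENT_DIM = 64, 16

# E[m]: dedicated encoder for modality m (used when m is present).
E = {m: rng.normal(size=(LATENT_DIM, IMG_DIM)) / np.sqrt(IMG_DIM) for m in MODALITIES}
# P[m]: mapping encoder that infers m's latent from the available latents.
P = {m: rng.normal(size=(LATENT_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM) for m in MODALITIES}


def complete_latents(inputs):
    """Return one latent per modality, inferring missing ones via P[m].

    inputs: dict mapping modality name -> flattened image, or None if missing.
    """
    available = {m: E[m] @ x for m, x in inputs.items() if x is not None}
    if not available:
        raise ValueError("at least one modality must be present")
    # Assumption: available latents are combined by element-wise summation.
    pooled = sum(available.values())
    return {m: available[m] if m in available else P[m] @ pooled for m in MODALITIES}


# Only T1 is acquired; the T2 and FLAIR latents are inferred.
x = {"T1": rng.normal(size=IMG_DIM), "T2": None, "FLAIR": None}
z = complete_latents(x)
print({m: v.shape for m, v in z.items()})
```

Because every modality ends up with a latent of the same shape, a downstream fusion module always receives a complete input set, whatever combination was actually acquired.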

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1278_paper.pdf

SharedIt Link: https://rdcu.be/eHxej

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05325-1_22

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhoWen_FMMDiff_MICCAI2025,
        author = { Zhong, Wenjin AND Cong, Cong AND Wang, Zihan AND Yan, Zeya AND Di Ieva, Antonio AND Liu, Sidong},
        title = { { FMM-Diff: A Feature Mapping and Merging Diffusion Model for MRI Generation with Missing Modality } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {227 -- 236}
}

Reviews

Review #1

Please describe the contribution of the paper

(1) Proposes a novel latent diffusion model (LDM)-based method to synthesize MRI images under arbitrary missing-modality combinations. (2) Introduces two key modules: the Feature Mapping Module (FMM), which infers latent representations of missing modalities, and the Multi-Modal Feature Share and Merge Module (MFSM), which performs structure-aware feature fusion using attention mechanisms. (3) Conducts extensive experiments on the CGGA and BraTS2021 datasets, demonstrating that FMM-Diff outperforms existing methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The proposed method is novel, incorporating a Feature Mapping Module (FMM) to extract better latent representations and a Multi-Modal Feature Share & Merge Module (MFSM) to perform structure-aware feature fusion across modalities. (2) The proposed method achieves state-of-the-art performance on multiple missing-modality combinations and generalizes well across the CGGA and BraTS2021 datasets.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

(1) The manuscript argues that advanced modalities (AM) provide more unique diagnostic information than routine modalities (RM) and should therefore be synthesized. This reasoning seems odd: to reduce cost or scan time, clinicians could keep AM and optimize the RM protocol for better diagnosis and efficiency. It would be better to revise the introduction.
(2) The manuscript claims that FMM can extract a unified latent representation, but this claim is not supported by quantitative or qualitative analysis.
(3) Evaluation on downstream tasks or clinical applications is missing, which is critical for assessing synthetic T1ce.
(4) The visualization results are limited, and the images are blurry. Adding zoomed-in views would help.
(5) Ablation studies on each module of the proposed method are missing.
(6) The computational complexity of the proposed method is not compared with that of SOTA methods.
(7) The image intensities in the visualization results are dark, and the PSNR values are unusually high (greater than 30 dB). This may be because intensity normalization during preprocessing is dominated by the maximum value of the enhancing region in T1ce, resulting in low overall intensities. This could distort the metrics so that they do not reflect the model's true performance.
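Point (7) can be made concrete with the standard PSNR definition, PSNR = 10 log10(R^2 / MSE), where R is the data range. The sketch below uses synthetic data (an assumption, not the paper's images) to show that if normalization compresses all intensities by a factor s while R stays fixed at 1, the MSE shrinks by s^2 and PSNR rises by 20 log10(1/s) dB, even though the relative synthesis quality is unchanged.

```python
import numpy as np


def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)


rng = np.random.default_rng(0)
# Synthetic "tissue" intensities and a noisy prediction (hypothetical data).
target = rng.uniform(0.2, 0.8, size=(128, 128))
pred = target + rng.normal(0.0, 0.05, size=target.shape)

# Suppose normalizing by a bright enhancing-region maximum compresses all
# intensities by 4x, while PSNR is still computed with data_range = 1.
scale = 0.25
base = psnr(pred, target)
darkened = psnr(scale * pred, scale * target)
print(f"{base:.2f} dB -> {darkened:.2f} dB (gain {darkened - base:.2f} dB)")
```

With s = 0.25 the gain is about 12 dB, which is why reporting the data range and normalization scheme alongside PSNR matters when comparing methods.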
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The experiments are not comprehensive enough: they lack sufficient visualization results, ablation studies on each module, and evaluation on downstream tasks or clinical applications.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

All raised issues have been resolved.

Review #2

Please describe the contribution of the paper

This paper proposes FMM-Diff, a model designed to synthesize advanced MRI modalities, such as DWI and contrast-enhanced MRI, from incomplete modality inputs. This approach addresses a clinical challenge in which the absence of certain imaging modalities in practice limits the use of complete multimodal data for training.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The study utilizes two publicly available datasets to demonstrate the effectiveness of the proposed method.

An ablation study is included to validate the design of the model architecture.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The use of element-wise summation for feature integration is somewhat questionable. While the results indicate that it is effective to some extent, this integration strategy is a key component of the model and warrants more thorough consideration. Exploring alternative feature fusion methods and comparing them through ablation studies would provide a stronger foundation for this design choice. If the authors can address this concern, it would strengthen the paper, and I will raise my score by one point.

Regarding Figure 3d: if multiple modalities are present, does the model require training a separate encoder Ey for each modality y?

I like the idea of this paper and believe it would have a significant impact if the code could be provided.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed missing-modality generation method is novel.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have answered my concerns, and I believe this manuscript could have an impact on the field.

Review #3

Please describe the contribution of the paper

The paper proposes FMM-Diff, a novel diffusion-based framework for missing modality MRI synthesis. The main contributions lie in the introduction of two key components: (1) the Feature Mapping Module (FMM), which employs dedicated and mapping encoders to infer latent representations of missing modalities from the available ones, and (2) the Multi-Modal Feature Share and Merge (MSFM) module, which aggregates and fuses modality features using a combination of local pooling and cross-modal attention. Together, these modules enable the generation of high-quality MRI modalities even in severely missing input scenarios. The approach is validated on two benchmark datasets (CGGA and BraTS2021), showing superior performance and robustness compared to recent state-of-the-art methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method shows consistent and strong results even when only one input modality is available, demonstrating its practical applicability in real-world clinical situations where data incompleteness is common.

The FMM and MSFM modules are conceptually well-separated and could be reused or extended in future work for other multi-modal synthesis tasks.

Extensive experiments on two publicly available datasets confirm the model’s superiority over several recent state-of-the-art baselines (e.g., FgC2F-UDiff, M2DN, ShaSpec), both quantitatively (PSNR/SSIM) and qualitatively (lesion detail preservation).
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Although PSNR and SSIM are widely adopted metrics for image synthesis, their ability to capture clinically meaningful aspects of medical image quality is limited. Specifically, they may not adequately reflect modality-specific style consistency or anatomical structure preservation. The authors should justify the use of these metrics in the context of missing modality synthesis and consider including evaluations on downstream tasks such as segmentation or diagnosis to better demonstrate the clinical value of the generated images.

The proposed Multi-Modal Feature Share and Merge (MSFM) module integrates GAP, Softmax, and Cross-Attention mechanisms. However, this design appears to be a straightforward combination of common components. The paper would benefit from a clearer explanation of the motivation behind this design, as well as a comparative discussion on why MSFM is more effective than other attention-based or fusion alternatives.

While the ablation study verifies the contribution of the FMM and MSFM modules, it remains rather coarse. In particular, the effectiveness of the mapping encoder 𝑃𝑛 could be better validated by visualizing its reconstructed features or latent representations, and comparing them with ground truth modalities. Additional analyses such as latent space visualization or structural similarity evaluations would help support the claim that 𝑃𝑛 effectively infers missing modality information.
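One way to run the latent-space check the reviewer suggests is to hold out a modality that is actually available, infer its latent with the mapping encoder, and compare the result with the latent produced from the real image. The helper below is a hypothetical sketch of the comparison step only; which encoders produce the two latents is an assumption about the evaluation protocol, not something described in the paper, and the latents here are synthetic.

```python
import numpy as np


def latent_agreement(z_inferred, z_reference):
    """Cosine similarity and MSE between an inferred and a reference latent."""
    a = np.ravel(z_inferred).astype(float)
    b = np.ravel(z_reference).astype(float)
    cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    mse = float(np.mean((a - b) ** 2))
    return {"cosine": cosine, "mse": mse}


# Hypothetical latents of shape (C, H, W): a reference encoded from the real
# modality and an inferred version with small errors.
rng = np.random.default_rng(0)
z_ref = rng.normal(size=(4, 8, 8))
z_hat = z_ref + rng.normal(0.0, 0.1, size=z_ref.shape)
print(latent_agreement(z_hat, z_ref))
```

Aggregating these scores per modality and per missing-input pattern would give the kind of quantitative evidence the reviewer asks for.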
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The method presented in this paper is overly complex. I strongly recommend that the authors open-source their code to enhance reproducibility and to facilitate others in following and building upon their work.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper addresses an important problem of missing modality MRI synthesis with a well-designed diffusion-based framework. The proposed FMM and MSFM modules are effective and achieve strong performance across various missing modality settings. However, the evaluation is limited to PSNR and SSIM, and lacks downstream task validation or deeper analysis of module behavior. Despite these issues, the method is practically useful and shows clear improvements over prior work, which justifies a weak accept.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have satisfactorily addressed the majority of the issues I pointed out. I consider the current version appropriate for publication.

Author Feedback

We thank the reviewers for their insightful comments and for recognizing the novelty and effectiveness of FMM-Diff. Below, we address the major concerns first, followed by responses to minor points.

R2, R3: Effectiveness Analysis of FMM. FMM is not intended to extract a unified latent representation. Instead, it is specifically designed to compensate for missing modalities through dedicated mapping encoders 𝑃𝑛. This approach helps prevent the performance degradation often observed in methods that use shared encoders. Our preliminary results quantitatively show that the decoder 𝐷𝑛 can effectively reconstruct missing modalities from the latent representations of the available ones. As shown in Table 2, removing FMM leads to a notable performance drop, whereas including it ensures stable and reliable results.

R2, R3: Evaluation on Downstream Tasks. We acknowledge the importance of validating synthesized images in downstream tasks and plan to extend our study in this direction. However, the current work focuses primarily on introducing the proposed method and evaluating the quality of the synthesized MRIs with SSIM and PSNR, following prior work [16, 21].
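For reference, SSIM compares luminance, contrast, and structure between two images. The sketch below computes the statistic over a single global window with the usual constants K1 = 0.01 and K2 = 0.03. This is a simplification: standard SSIM averages the same expression over local Gaussian-weighted windows, so library implementations will give different numbers. The test image is synthetic.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed over a single global window (simplified)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    luminance_term = 2 * mu_x * mu_y + c1
    contrast_structure_term = 2 * cov_xy + c2
    return (luminance_term * contrast_structure_term) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(64, 64))  # hypothetical image in [0, 1]
noisy = np.clip(img + rng.normal(0.0, 0.1, size=img.shape), 0.0, 1.0)
print(round(global_ssim(img, img), 4), round(global_ssim(img, noisy), 4))
```

Because SSIM is built from means, variances, and covariance, it rewards preserved structure more than PSNR does, but, as the reviewers note, neither metric directly measures clinical utility.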

R2, R3: Ablation Studies and Module Behavior Analysis. We have already validated both FMM and MSFM through ablation studies (Table 2, Section 3.2). We will incorporate R3's suggestion to provide a deeper analysis of each module's behavior in the revised version.

R1, R3: Code Availability. Code will be released upon acceptance.

R1-1: Comparison with Alternative Fusion Methods in FMM. In preliminary experiments, we evaluated various fusion strategies within FMM, including cross-attention, concatenation, and MSFM. However, these approaches did not outperform simple element-wise summation and significantly increased training time due to the curse of dimensionality in the high-dimensional feature space. We therefore adopted element-wise summation for its simplicity and efficiency.
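The cost difference between element-wise summation and concatenation can be seen from tensor shapes alone. In this hypothetical sketch (sizes are illustrative, not taken from the paper), summation keeps the channel count fixed regardless of how many modalities are fused, while concatenation grows it linearly, so any layer that consumes the concatenated tensor needs proportionally more weights.

```python
import numpy as np

# Hypothetical sizes: N modality latents, each with C channels on an H x W grid.
rng = np.random.default_rng(0)
N, C, H, W = 3, 4, 32, 32
latents = [rng.normal(size=(C, H, W)) for _ in range(N)]

# Element-wise summation: output has C channels no matter how large N is.
fused_sum = np.sum(latents, axis=0)

# Channel concatenation: output has N * C channels.
fused_cat = np.concatenate(latents, axis=0)

# Weights of a 1x1 convolution mapping each result back to C channels (no bias).
weights_after_sum = C * C
weights_after_cat = (N * C) * C
print(fused_sum.shape, fused_cat.shape)
print(weights_after_sum, weights_after_cat)
```

Summation is also invariant to the order of the modalities, which is convenient when the set of available inputs varies; whether that outweighs the extra capacity of concatenation or cross-attention is the empirical question the authors report testing.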

R1-2: Use of Separate Encoders. Yes, a separate encoder Ey is used for each modality.

R2-1: Motivation. We interpret R2’s comment as suggesting a shift in focus from generating AM (for enhancing diagnostics) to optimizing RM (for reducing cost or scan time). We respectfully disagree, as this represents a fundamentally different clinical objective beyond the scope of our study.

R2-2: Limited Visualization Results. Due to space constraints, the figures were compressed during preparation, which reduced image quality. In the revised version, we will improve the quality and include additional examples with zoomed-in views.

R2-3: Comparison of Computational Complexity. We performed comparative analyses of computational cost but omitted them due to space limitations. Notably, our model achieves the best performance without a significant increase in computational burden. We will include these results in the revised version.

R2-4: Preprocessing May Cause Distortion. The dark appearance in Fig. 4 is due to compression during figure preparation, not to errors in preprocessing. Moreover, all models were evaluated using a unified preprocessing pipeline to ensure fairness.

R3-1: Design Motivation for MSFM. Previous studies [16, 21] commonly employed summation, concatenation, or cross-attention to fuse inter-modality features and fed the fused features directly into the backbone, often overlooking the relative contribution of intra-modality information. In contrast, MSFM applies GAP followed by a Sigmoid function to recalibrate inter-modality features using intra-modality cues. Additionally, we hypothesize that the diffusion backbone requires different conditional inputs at different time steps. Therefore, we use time-step-specific bottleneck features as queries to selectively retrieve relevant inter-modality features for fusion. This design allows the model to adapt dynamically to the varying demands of the diffusion process, which is validated by our experimental results.
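A minimal NumPy sketch of the two steps described in this reply follows: GAP followed by a sigmoid to gate each modality's channels, then cross-attention in which the diffusion bottleneck feature at the current time step supplies the queries and the recalibrated modality features supply the keys and values. It is a hypothetical illustration, not the authors' code. The projection matrices are random stand-ins for learned weights, and a squeeze-and-excitation style gate would normally place small learned layers between the pooling and the sigmoid.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def msfm_sketch(modality_feats, bottleneck, w_q, w_k, w_v):
    """Hypothetical MSFM-style fusion.

    modality_feats: list of (C, H, W) features, one per modality.
    bottleneck: (C, h, w) diffusion bottleneck feature at the current time step.
    w_q, w_k, w_v: (C, D) projections (random stand-ins for learned weights).
    Returns (h * w, D) fused conditioning tokens.
    """
    tokens = []
    for feat in modality_feats:
        gate = sigmoid(feat.mean(axis=(1, 2)))            # GAP -> sigmoid channel weights
        feat = feat * gate[:, None, None]                 # recalibrate channels
        tokens.append(feat.reshape(feat.shape[0], -1).T)  # (H * W, C) tokens
    kv = np.concatenate(tokens, axis=0)                   # (N * H * W, C)
    q = bottleneck.reshape(bottleneck.shape[0], -1).T     # (h * w, C)
    queries, keys, values = q @ w_q, kv @ w_k, kv @ w_v
    attn = softmax(queries @ keys.T / np.sqrt(keys.shape[1]), axis=-1)
    return attn @ values


rng = np.random.default_rng(0)
C, H, W, D = 8, 16, 16, 8
feats = [rng.normal(size=(C, H, W)) for _ in range(3)]
bottleneck_t = rng.normal(size=(C, 4, 4))  # would change with the diffusion time step
w_q, w_k, w_v = (rng.normal(size=(C, D)) / np.sqrt(C) for _ in range(3))
out = msfm_sketch(feats, bottleneck_t, w_q, w_k, w_v)
print(out.shape)
```

Because the queries come from the time-step-dependent bottleneck, the same modality features can be weighted differently at different denoising steps, which is the behavior the reply motivates.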

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors did a good job in the rebuttal. Although I recommend accepting this paper, I encourage the authors to add clinically relevant metrics, such as segmentation metrics, as indirect measures from downstream tasks.
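The Dice similarity coefficient is a common segmentation metric for this kind of indirect evaluation: a fixed segmentation model is applied to real and synthesized images, and the resulting masks are compared against reference labels. The function below is a generic sketch; the masks in the example are hypothetical toy data, not outputs of any model from the paper.

```python
import numpy as np


def dice(pred_mask, true_mask, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred_mask, true_mask).sum()
    return (2.0 * intersection + eps) / (pred_mask.sum() + true_mask.sum() + eps)


# Hypothetical masks: a tumor mask segmented from a synthetic T1ce versus the
# reference label. Each covers 16 pixels and they overlap on 8.
reference = np.zeros((10, 10), dtype=bool)
reference[2:6, 2:6] = True
from_synthetic = np.zeros((10, 10), dtype=bool)
from_synthetic[4:8, 2:6] = True
print(round(dice(from_synthetic, reference), 3))
```

Comparing the Dice obtained on synthetic inputs with the Dice obtained on real inputs, using the same segmentation model, indicates how much task-relevant information the synthesis preserves.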

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The authors address the concerns in the rebuttal.


FMM-Diff: A Feature Mapping and Merging Diffusion Model for MRI Generation with Missing Modality

Author(s):