Abstract

Multi-modality magnetic resonance imaging (MRI) is widely used in the clinical diagnosis of brain tumors. However, missing modalities are frequently encountered in real-world settings and can cause deep-learning-based automatic diagnosis algorithms that rely on full-modality images to fail. To address this challenge, we propose a unified model capable of synthesizing the missing modalities from any subset of the full set of modalities. Our method is a sequence-to-sequence prediction model that predicts the missing images from inter-modality correlations and modality-specific semantics. Specifically, we develop a dual-branch encoder in which both branches independently encode partially masked image tokens into low-dimensional features. A decoder then generates the target images from the fused encoder features. To strengthen the representational ability of the encoder features, we propose a combination loss that improves the discriminability of, and consistency between, features of diverse modalities. We evaluate our method on the BraTS 2023 dataset. Extensive quantitative and qualitative experiments demonstrate the high fidelity and utility of the synthesized images.
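To make the loss design concrete, the following is a minimal numpy sketch of a combined objective of this kind: a reconstruction term plus a cross-branch contrastive term and a consistency term. All function names, weights, and exact mathematical forms here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of a "combination loss" of the kind described in the
# abstract. Weights w_con/w_cons and the temperature t are assumed values.
def combined_loss(pred, target, feat_a, feat_b, w_con=0.1, w_cons=0.1, t=0.8):
    # Reconstruction: mean absolute error between synthesized and target images.
    rec = np.abs(pred - target).mean()
    # L2-normalize each branch's features so dot products are cosine similarities.
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    # Contrastive: the matching row in the other branch is the positive;
    # all other rows act as negatives (InfoNCE-style cross-entropy).
    logits = a @ b.T / t
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    con = -np.log(np.diag(probs)).mean()
    # Consistency: align the two branches' intra-branch correlation structure.
    cons = ((a @ a.T - b @ b.T) ** 2).mean()
    return rec + w_con * con + w_cons * cons
```

The key structural point is that the contrastive term pushes branch features apart across samples while the consistency term keeps the relational structure of the two branches aligned, so the two objectives act on different aspects of the representation.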

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1980_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{QiLia_AUnified_MICCAI2025,
        author = { Qi, Liangce and Liu, Yusi and Li, Yuqin and Shi, Weili and Feng, Guanyuan and Jiang, Zhengang},
        title = { { A Unified Missing Modality Imputation Model with Inter-Modality Contrastive and Consistent Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {44 -- 53}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a unified missing modality imputation method consisting of a dual-branch encoder and a decoder, trained with a combined loss of contrastive, consistency, and reconstruction terms. The experimental results on the BraTS 2023 dataset verify the effectiveness of the proposed method.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of both masking image modalities and spatial patches is quite interesting.
    2. The experimental comparison is comprehensive, including the most current methods.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. My major concern is the methodology. This paper proposes a transformer-based synthesis method. However, more recent diffusion-based medical image imputation methods have been published and achieve promising results. My question is: what are the advantages of the proposed method over diffusion-model-based methodology? Besides, no diffusion-based methods are included in the experimental section.

    2. My other concern is the motivation of the network design. Section 2.1 introduces the detailed network architecture of the proposed method, but the underlying design principles are not fully explained.

    3. A similar problem exists in Section 2.2: what is the motivation for introducing the inter-modality contrastive learning and consistency learning? In fact, it seems to me that the design lacks a certain level of intuitiveness and rationality.

    4. In Section 3, only 6 to 10 slices near the tumor are sampled from each 3D volume to construct the training, validation, and test datasets. Relying solely on a subset of 2D slices for testing might not adequately demonstrate the effectiveness of the proposed approach. It would be beneficial to include all 2D slices from each 3D volume in the experiments and then assess the accuracy on the complete 3D volume.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methodology and experimental setup could be improved, and the motivation is unclear.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a model to generate missing MRI modalities from any subset of multi-modal images. It includes an encoder for feature learning and a decoder for MRI generation. During feature learning, the authors introduce sequential masking to learn inter-modality correlations and spatial masking to learn image semantics. They also propose a combination loss to improve the encoder's feature representation. Their method was evaluated on the BraTS 2023 dataset.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is well motivated. Introducing sequential masking and spatial masking may force the model to learn inter-modal relationships and missing anatomical features. Unfortunately, there is a lack of solid validation of this idea.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. It is particularly worrying that, in the inter-modality contrastive learning phase, treating features of the same patch from different modalities as negative samples may suppress the encoder from learning common features across modalities, which may be complementary and beneficial.
    2. Lack of ablation studies. The authors did not study the individual roles of sequential masking and spatial masking during feature learning. For instance, they could 1) remove sequential masking, 2) remove spatial masking, or 3) remove both. The authors did not study the effectiveness of their contrastive learning loss either.
    3. Inconsistent comparison settings. Why is the method ‘M^3’ missing in Table 1 and the method ‘Zhang’ missing in Table 2?
    4. I would suggest the authors add a discussion on how misalignment among modalities could affect the generation results, since the proposed framework introduces pixel-level masking, which is greatly affected by misalignment in clinical cases.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is overall well motivated; however, I have a major concern about the design of the contrastive learning strategy, and there is a lack of important validations of its core technical components.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed most of my concerns in the rebuttal process; however, I believe a more comprehensive ablation study on the core components, sequential masking and spatial masking, could strengthen this work’s impact.



Review #3

  • Please describe the contribution of the paper

    The authors propose a self-attention-based model for multi-modality MRI synthesis, capable of reconstructing missing modalities from any given subset of available images. The core idea lies in a dual-branch encoder and a single decoder architecture, where each branch independently encodes partially masked tokens into low-dimensional representations. These tokens are then fused to reconstruct the target modality. The model is trained using a novel combined loss based on contrastive and consistency losses to enhance modality-specific and inter-modality feature learning.

    Experimental evaluations conducted on the BraTS 2023 dataset demonstrate that the proposed method yields gains in both quantitative and qualitative assessments. Specifically, as the number of available input modalities increases, the model outperforms existing methods, with up to a +1.3 dB improvement in PSNR and a +0.06 gain in SSIM across most configurations. Furthermore, in tumour segmentation tasks, the synthesised images lead to improvements in Dice score over previous state-of-the-art methods. When three modalities are missing, the model outperforms the best baseline by up to +1.8% (WT), +3–4% (TC), and +4.4% (ET). As more modalities become available, performance steadily improves.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Unified and Flexible Imputation Framework: The authors propose a novel imputation model based on self-attention that can synthesise any missing MRI modality from any subset of available modalities. This can open opportunities in clinical workflows, where missing modalities can occur, such as prostate cancer.
    • Comprehensive Evaluation: The experimental evaluation is promising. The model is evaluated on the public BraTS 2023 dataset with a systematic analysis across 0 to 3 missing modalities. It outperforms state-of-the-art baselines (MMT and Zhang) in terms of PSNR, SSIM, and (MMT and M³) in Dice scores.
    • Clinical Relevance: By demonstrating improved performance even when only one or two modalities are available, the model shows clear promise for real-world deployment.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Scalability Beyond BraTS Dataset: All evaluations are based on the BraTS 2023 dataset. To evaluate the generalization capabilities of the proposed method in more diverse clinical settings, additional public datasets—such as MyoPS2020 for cardiac imaging, or other brain MRI datasets like ISLES and IXI—would have been beneficial, especially to strengthen claims across anatomical regions and acquisition protocols.
    • Modality Aggregated Comparison: While the paper includes different quantitative results, the results can be aggregated/summarised by the number of missing modalities (e.g., 1, 2, or 3 missing inputs). This would enable clearer, more direct comparisons between baselines under equivalent sparsity levels.
    • Alternative Baselines: The paper primarily compares its method to two baselines—Zhang et al. and MMT for reconstruction, and MMT and M³ for segmentation. However, additional publicly available and recently proposed methods could have been considered to strengthen the evaluation. For example, Multi-UNet (Xu et al., MIDL 2024) [https://github.com/WenTXuL/MultiUnet], which focuses on joint learning from heterogeneous MRI datasets across brain diseases and modalities, and PASSION (Shi et al., ACM MM 2024) [https://github.com/Jun-Jie-Shi/PASSION], which addresses segmentation with imbalanced missing modality patterns.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel and flexible self-attention-based framework for modality-agnostic MRI synthesis. The model relies on a dual-branch encoder and an inter-modality consistency loss. These technical contributions are effectively demonstrated through different experiment results. A major strength lies in the evaluation across both reconstruction and segmentation tasks on the BraTS 2023 dataset, with demonstrated improved performance over baselines and under varying levels of missing modalities. Additionally, class-wise results (WT, TC, ET) further support the method’s robustness and clinical relevance. However, the paper’s impact could be significantly improved by addressing the mentioned points. The baseline comparison is somewhat narrow given the availability of other recent public methods (e.g., Multi-UNet, PASSION), and the evaluation could benefit from grouping results by missing modality count to enable fairer comparisons. Furthermore, generalisation beyond the BraTS dataset is not explored, and no external datasets (e.g., MyoPS2020, ISLES, IXI) are used to assess robustness across anatomy or imaging protocols.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper introduces a novel unified model for missing modality imputation in multi-modal MRI, aimed at addressing the common issue of missing modalities in clinical brain tumor diagnosis. The model leverages inter-modality contrastive and consistent learning strategies to synthesize missing modality images using any subset of the available modalities. It proposes a dual-branch encoder and single-decoder architecture with a self-attention mechanism, improving the fidelity and utility of synthesized images for tumor segmentation tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Proposed a unified framework for missing modality imputation, addressing the limitations of prior pairwise or specialized models.
    • Combined contrastive learning (modality discriminability) and consistent learning (anatomical alignment) in a single loss function.
    • Validated the method on the challenging BraTS2023 dataset, demonstrating superior performance in both image synthesis (PSNR/SSIM) and downstream segmentation (Dice scores).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    • Performance degrades significantly when only one modality is available.
    • Temperature scaling (t=?) and the projector architectures are unspecified.
    • The ablation experiments were inadequate and only compared models without the consistency loss.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite few weaknesses, overall the paper provides a valuable and interesting advancement in this domain.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    My concern has been addressed.




Author Feedback

We thank the reviewers for their time and insightful comments. The suggestions regarding volumetric validation, additional datasets and methods, the discussion of misalignment, and the modality-aggregated comparison will be explored in our future work.

Motivation: For R1, methods that generate missing modalities to address the missing modality issue can be divided into regression-based models and generative models. While generative models achieve better synthesis quality due to their ability to learn complex data distributions, they may introduce artificial noise and cause misalignment among modalities by using multiple generators. In contrast, unified regression models simplify implementation while preserving anatomical consistency across modalities. Our method leverages semantics not only across but also within diverse modalities to predict the missing ones, which is reflected in the dual-branch design. The features of each modality in the two branches should contain both modality-specific and modality-invariant content. The former manifests as discriminability, while the latter is derived from our observation of stable inter-modality difference patterns. Considering that the feature relationships within each branch should maintain similar correlations, we construct positive and negative sample pairs from the two branches and propose contrastive and consistency learning. Since the features from the two branches should be complementary, we introduce a fusion module to integrate them.

For R3 and R4, the positive/negative pairs in contrastive learning are entangled, with the temperature controlling the balance between their coupling and repulsion. The convergence of the two contrastive losses is critical to our model’s performance. We tested values ranging from 0.1 to 0.95: lower values result in a sharper feature space, while higher values produce a smoother one. We found that t = 0.8 yields the best results. Using only one contrastive loss would require retraining with adjusted parameters (primarily the temperature and feature dimensions). Thus, we only compared models with and without the contrastive losses while keeping other parameters fixed. For R3, we will add details about the temperature settings. The projection network is a single MLP layer.
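As an illustration of the temperature effect described above (not the authors' code), a tiny numpy example shows how dividing similarities by t before the softmax sharpens or smooths the resulting distribution over candidates:

```python
import numpy as np

# Illustrative temperature scaling, as used in softmax-based contrastive
# losses: lower t sharpens the similarity distribution, higher t smooths it.
def softmax_with_temperature(sims, t):
    z = sims / t
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

sims = np.array([0.9, 0.5, 0.1])            # cosine similarities to candidates
sharp = softmax_with_temperature(sims, t=0.1)
smooth = softmax_with_temperature(sims, t=0.8)
# A lower temperature concentrates more probability mass on the best match.
assert sharp[0] > smooth[0]
```

This matches the rebuttal's description: small t yields a sharp (near one-hot) feature space, while a value such as t = 0.8 keeps the distribution smoother.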

Experimental Setup: Due to the need to evaluate both generation and segmentation, and constrained by the manuscript length, we employed a single dataset. For R2, while we acknowledge that using a single dataset and three baselines is narrow, BraTS 2023 is a well-established benchmark, and the selected baselines are recent works published in TMI and MICCAI. These methods are solid, share similar motivations or structures with ours, and cover the three current approach categories for the missing modality issue. Also for R1, we considered using the diffusion-based model [DOI: 10.1109/TMI.2024.3368664] for comparison, but we chose Zhang’s method as it achieves competitive performance while being easier to implement. More importantly, our experiments are comparable in scale to prior MICCAI studies [18]. For R4, Zhang’s method achieves strong generation results, but in our experiments its segmentation performance suffers due to misalignment among modalities. Thus, we replaced it with M3, which demonstrates superior segmentation performance but does not generate images.

Regarding R4’s question about the masking ablation: removing either spatial or sequential masking allows one branch to access the entire input, causing the model to collapse into an identity mapping.

Experimental results for R3: A single-modality input provides insufficient information to capture inter-modality relationships, especially given the large distribution gaps among the four MRI modalities. Zhang’s method utilizes four GANs for the different modality generation tasks and is less constrained by single-modality inputs.

Reproducibility: We have stated at the end of the introduction section that all code will be released.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This work received three positive reviews and one negative review. After checking the comments and the rebuttal, I agree with the positive reviewers to accept this work. The authors are encouraged to leverage the reviewer comments when revising the paper for the final version.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


