Abstract

Early and accurate diagnosis of Alzheimer’s disease (AD) is crucial for effective treatment and patient care. In clinical practice, physicians can achieve precise diagnoses by integrating multimodal image information, and it is desirable to develop automated diagnosis approaches based on this multimodal information. However, existing multimodal deep learning methods face a critical paradox: although models excel at leveraging joint features to improve task performance, they often neglect to optimize the independent representation capability of each individual modality. This shortcoming, known as Modality Laziness, stems from imbalanced modality contributions within conventional joint training frameworks, where models predominantly rely on dominant modalities and neglect to learn weaker ones. To address this challenge, we propose UniCross, a novel balanced multimodal learning paradigm. Specifically, UniCross employs separate learning pathways with specialized training objectives for each modality to ensure comprehensive uni-modal feature learning. In addition, we design a Metadata Weighted Contrastive Loss (MWCL) to facilitate effective cross-modal information interaction. The MWCL leverages patient metadata (e.g., age, gender, and years of education) to adaptively calibrate both cross-modal and intra-modal feature distances between individuals. We validated our approach through extensive experiments on the ADNI dataset, using structural MRI and FDG-PET modalities for AD diagnosis and mild cognitive impairment (MCI) conversion prediction tasks. The results demonstrate that UniCross not only achieves state-of-the-art overall performance, but also significantly improves diagnostic performance when only a single modality is available. Our code is available at https://github.com/Alita-song/UniCross

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2409_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Alita-song/UniCross

Link to the Dataset(s)

ADNI dataset: https://adni.loni.usc.edu/

BibTex

@InProceedings{YinLis_UniCross_MICCAI2025,
        author = { Yin, Lisong and Ye, Chuyang and Liu, Tiantian and Wu, Jinglong and Yan, Tianyi},
        title = { { UniCross: Balanced Multimodal Learning for Alzheimer’s Disease Diagnosis by Uni-modal Separation and Metadata-guided Cross-modal Interaction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {644--654}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper aims to address the modality laziness problem in multi-modal diagnosis of AD and MCI. The proposed UniCross model integrates patient metadata, e.g., age, gender, and education, to calibrate both cross- and intra-modality feature spaces. Results demonstrate improved performance compared to other model configurations.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work presents a novel approach to enhancing the fusion of multi-modal features for improved disease diagnosis.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Major Points

    1. Lack of Justification for Use Case Selection: The paper did not provide a strong rationale for choosing AD and MCI as the target use case. It appears to follow a common pattern of borrowing a concept from the ML/CV community and applying it to a clinical problem without sufficient clinical motivation. The authors should clearly justify why AD was selected specifically, how multi-modal AD diagnosis is affected by the proposed issue (e.g., modality laziness), and why this issue is more relevant to AD than to other diseases.

    2. Unfair Comparison with State-of-the-Art Methods: The comparison with 6 existing methods lacks rigor. Four of the selected baselines are not specific to AD diagnosis, and DiaMond is not a peer-reviewed method, which undermines the credibility of the comparison. Moreover, a critical point is that it is unclear whether these baseline methods also incorporate patient metadata during training. If not, the inclusion of metadata in UniCross introduces a confounding variable, making the comparison invalid.

    3. Lack of Statistical Analysis: The results did not include any statistical analysis to support the observed performance differences. Without reporting metrics such as p-values, it is difficult to assess whether the improvements are statistically significant. For instance, SSFTT demonstrates similar mean performance but with a much lower standard deviation, suggesting greater variance in UniCross across the five folds. This warrants further investigation and statistical testing.
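The kind of per-fold comparison asked for in point 3 could be sketched as follows. This is a minimal illustration, not an analysis from the paper: the per-fold accuracies are hypothetical placeholders, and the function names are invented for this example. It computes a paired t statistic and an exact sign-flip permutation p-value on the paired differences.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic for per-fold scores a and b (same folds, same order)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def sign_flip_p_value(a, b):
    """Exact two-sided sign-flip permutation p-value on the mean paired difference."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    # Enumerate every way of flipping the sign of each paired difference.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total


# Hypothetical per-fold accuracies for two methods (NOT values from the paper).
method_a = [0.90, 0.92, 0.91, 0.93, 0.89]
method_b = [0.88, 0.90, 0.90, 0.90, 0.88]

t_stat = paired_t_statistic(method_a, method_b)
p_perm = sign_flip_p_value(method_a, method_b)
```

With only five folds there are 2^5 = 32 sign patterns, so the smallest two-sided p-value this exact test can produce is 2/32 = 0.0625. Five-fold results alone therefore cannot reach the usual 0.05 threshold under this test, which is worth keeping in mind when interpreting fold-level comparisons.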

    Minor Points

    1. Introduction – “It poses severe challenges to healthcare systems worldwide, particularly in developing countries and regions.” Please provide references to support this statement.

    2. Introduction – “Recent studies [10,28] have shown that while benefiting from cross-modal interactions…” This claim is misleading, as the cited studies are not specific to AD diagnosis at all. Do you have evidence that such cross-modal phenomena are relevant to AD or MCI?

    3. Introduction – “Several works have attempted to address modality laziness.” None of the cited works focus specifically on AD or MCI diagnosis. The assumption that modality laziness is a critical issue in this context is questionable and should be better supported.

    4. Ablation Study – Please provide a clear rationale for selecting the specific loss functions used in the ablation study (e.g., CLIP, SupConLoss). Additionally, do these variants also incorporate patient metadata? Clarifying this is important for the fairness of comparisons.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper failed to provide a strong justification for the chosen use case. There is no fair comparison with state-of-the-art methods, and the absence of statistical analysis makes it difficult to determine whether the observed differences are statistically significant.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Naturally, different modalities do not contribute equally in terms of diagnostic information and may exhibit varying performance when used individually. However, this does not indicate ‘modality laziness.’ The discussion in Section 3.4 is ambiguous. A decrease in the performance of one modality does not necessarily imply over-reliance on the other. Moreover, degraded single-modality performance does not always result in lower multi-modal performance. For example, the Concat method reduced the performance of both sMRI and PET individually, yet improved the overall multi-modal performance compared to other methods (Sum, FiLM, and Gated), which primarily improved only one modality. How to explain this? Does it suggest that Concat is equally ‘lazy’ or ‘non-lazy’ across both modalities?

    I will maintain my original assessment and recommend rejection.



Review #2

  • Please describe the contribution of the paper

    The authors try to address modality laziness, which means that a classifier mostly depends on a single modality to make a prediction even though other modalities are available. The authors propose a specific loss function that takes demographic factors into account.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The text is generally clear and does not seem to contain spelling/grammar mistakes. The authors’ experiments are thorough, and they use state-of-the-art training methods, such as data augmentation. Moreover, the authors compare their own loss with state-of-the-art losses and with various ways to combine multimodal representations. The thoroughness of their ablation study should be commended. I will also commend the authors for making their code publicly available.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The following are limitations that, if addressed, can greatly improve the quality and impact of the presented work. These limitations are not presented in a specific order.

    1) The authors assume (at the end of page 4) that patients with similar demographic factors have similar pathological patterns. I believe the paper would be strengthened if the authors either provide a reference for this claim or verify it empirically.

    2) The way the MWCL is explained in the introduction makes it sound like it is used to modulate the predictions, which is not true if it is a contrastive loss (no modulation during inference).

    3) The authors should check the weights of their multimodal layer to see if they actually use both modalities or if the model ends up learning the same features from both modalities. Otherwise it is unclear whether the authors have improved over modality laziness.

    4) The authors should provide more information on the selection of hyperparameters.
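A simple form of the weight inspection suggested in the third limitation could look like the sketch below. It assumes a concatenation-style fusion layer whose first input columns receive sMRI features and whose remaining columns receive PET features. That layout, the function name, and the toy weights are all assumptions made for illustration and are not taken from the UniCross code.

```python
def modality_weight_share(weight_rows, mri_dim):
    """Share of squared fusion-layer weight mass on MRI vs PET input columns.

    weight_rows: list of rows, one per output unit; the first `mri_dim`
    entries of each row are assumed to multiply MRI features and the rest
    PET features (a hypothetical layout).
    """
    mri_mass = sum(w * w for row in weight_rows for w in row[:mri_dim])
    pet_mass = sum(w * w for row in weight_rows for w in row[mri_dim:])
    total = mri_mass + pet_mass
    return {"mri": mri_mass / total, "pet": pet_mass / total}


# Hypothetical 2x4 fusion layer: inputs 0-1 are MRI features, inputs 2-3 are PET.
W = [
    [0.1, -0.1, 0.7, 0.5],
    [0.0, 0.2, -0.6, 0.4],
]
share = modality_weight_share(W, mri_dim=2)
```

Weight shares are only comparable when the two feature blocks have similar scales, so in practice this check would likely be paired with an ablation that zeroes out one modality at inference and records the performance drop.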

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the paper is good, but I would like the authors to argue more convincingly that modality laziness has improved, not just with classification performance, but also using some type of model introspection.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes UniCross, a novel balanced multimodal learning paradigm that aims to solve modality laziness.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea seems reasonable to me. The separate training should solve the modality laziness problem, and the proposed MWCL should help feature gathering. The loss design follows intuition.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    First, how can the authors demonstrate that simply combining multiple losses is effective? Second, what is the basis for the clarification that “reformulation sacrifices effective cross-modal information interaction”?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    I will consider rescoring if the authors can address my concerns.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This method seems reasonable to me. However, some concerns still need to be addressed. Please refer to the weaknesses section.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    First, I think the idea of the paper is novel enough for publication, but I’m not satisfied with the authors’ feedback on how to efficiently combine different losses for training. Second, I believe Reviewer #1 has raised some critical points. Although I recommend acceptance of this paper, I hope the ACs and PCs can read through it to reach a fair result.




Author Feedback

R1:

A1: AD is a complex neurodegenerative disorder, and recent guidelines (e.g., NIA-AA) recommend integrating multiple biomarkers. However, not all modalities contribute equally; PET often dominates over sMRI [Is a PET all you need? 2022 MICCAI]. Conventional multimodal frameworks tend to over-rely on the dominant modality, leading to “modality laziness”—a phenomenon we explicitly demonstrate exists in AD diagnosis in Section 3.4. Our UniCross framework improves both overall and unimodal results, which is especially valuable in clinical settings where some modalities may be missing. We believe our framework is general and can be extended to multimodal diagnosis of other diseases.

A2: (1) DiaMond was published at WACV 2025, and we will update the citation. (2) While not all baselines were originally designed for AD, these approaches represent the state of the art in multimodal fusion [1,25] and balanced multimodal learning [22,17]; notably, the former have been widely used as baselines in AD diagnosis studies [22]. Our goal was to provide a comprehensive evaluation by including both established fusion strategies and recent balanced learning paradigms, offering meaningful references for readers. (3) Importantly, our method does not use metadata during inference, and it is challenging to integrate metadata into baseline frameworks. In our study, only three variables (age, sex, education) are integrated into our framework via the MWCL, which is a major contribution of our work.

A3: Thank you for pointing out the importance of statistical validation. We agree that statistical analysis is essential to robustly support our findings.

About Minor Points: (1) Due to space limitations, we had previously omitted some references, but we will update the manuscript to include these important citations if allowed. (2) References [10] and [28] are general studies on multimodal learning. As mentioned in A1, we provide evidence that such cross-modal phenomena are relevant to AD or MCI. (3) As discussed in A2(2) and A1, modality laziness indeed exists in this context, and it is particularly important for medical disease diagnosis. (4) We used CLIP and SupConLoss because they are classic loss functions for contrastive learning. Although they do not incorporate metadata, as mentioned in A2(3), integrating metadata is a major contribution of our work.

R2:

A1: Thank you for your suggestion. Due to space limitations, we had previously omitted some references, but we will update the manuscript to include citations supporting the association between demographic factors and pathological patterns if allowed.

A2: You are correct—MWCL is a contrastive loss and does not affect inference. We will clarify this.

A3: Thank you for this insightful comment. While improved unimodal performance suggests reduced modality laziness, your suggestion to directly analyze multimodal weights is valuable. As adding new results is not permitted at this stage, we will address this in future work.

A4: All hyperparameters follow prior work (e.g., contrastive loss temperature τ=0.07). We will consider systematic tuning in the future.
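To make the role of the temperature concrete, the following is a rough sketch of a temperature-scaled cross-modal contrastive loss in which same-class positives are weighted by metadata similarity. It is not the authors' MWCL, whose exact formulation is not given on this page. The exponential weighting function, the assumption that metadata are already standardized, and all names are hypothetical; only τ = 0.07 comes from the rebuttal.

```python
from math import exp, log, sqrt

TAU = 0.07  # contrastive temperature quoted in the rebuttal


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def metadata_weight(m_i, m_j, scale=1.0):
    """Hypothetical weight in (0, 1]: 1 for identical metadata, smaller as it diverges."""
    dist = sqrt(sum((a - b) ** 2 for a, b in zip(m_i, m_j)))
    return exp(-dist / scale)


def weighted_cross_modal_loss(mri, pet, labels, meta, tau=TAU):
    """MRI-to-PET supervised contrastive loss with metadata-weighted positives."""
    n = len(mri)
    total = 0.0
    for i in range(n):
        logits = [cosine(mri[i], pet[j]) / tau for j in range(n)]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + log(sum(exp(l - top) for l in logits))
        # Positives: PET samples sharing the anchor's label (always includes i).
        positives = [j for j in range(n) if labels[j] == labels[i]]
        weights = [metadata_weight(meta[i], meta[j]) for j in positives]
        weighted_nll = sum(
            w * (log_denom - logits[j]) for w, j in zip(weights, positives)
        )
        total += weighted_nll / sum(weights)
    return total / n


# Toy data (hypothetical): two subjects, two classes, one standardized metadata value.
mri_feats = [[1.0, 0.0], [0.0, 1.0]]
pet_aligned = [[1.0, 0.0], [0.0, 1.0]]
pet_swapped = [[0.0, 1.0], [1.0, 0.0]]
labels = [0, 1]
meta = [[0.0], [1.0]]

loss_aligned = weighted_cross_modal_loss(mri_feats, pet_aligned, labels, meta)
loss_swapped = weighted_cross_modal_loss(mri_feats, pet_swapped, labels, meta)
```

A small τ such as 0.07 sharpens the softmax over similarities, so matched pairs are rewarded strongly and mismatched pairs are penalized heavily. The metadata enter only through the training loss, which is consistent with the rebuttal's statement that metadata are not used at inference.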

R3:

A1: The ablation study in Table 2 clearly demonstrates that removing either the shared head or MWCL decreases performance, confirming each component’s contribution. Each loss serves a different purpose: Luni strengthens unimodal representations, Lsp defines a common feature space, and MWCL guides cross-modal interaction using metadata.

A2: This clarification refers to the limitations of recent multimodal alternating learning approaches. Methods like [31,9] reduce modality laziness by reformulating the joint training framework but sacrifice synchronous cross-modal interaction by not processing multiple modalities simultaneously. UniCross addresses this limitation by maintaining cross-modal interaction through the shared head and MWCL while solving modality laziness via separate learning pathways.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the paper presents an interesting application of cross-modal learning to AD and MCI diagnosis, major concerns remain unresolved. Specifically, the lack of a strong clinical justification for the chosen use case, limited and potentially unfair comparisons with state-of-the-art methods, absence of statistical analysis, and ambiguity in interpreting modality contributions weaken the overall contribution. Given these issues, I recommend rejection at this stage.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


