Abstract

Multimodal data holds significant value in the diagnosis of Alzheimer’s disease (AD). However, in real-world applications, factors such as privacy protection, acquisition costs, and sensor failures often lead to missing data, posing challenges for incomplete multimodal learning. Artificial intelligence-based diagnostic methods for AD that operate on incomplete multimodal data have recently gained increasing attention. However, existing approaches typically overlook modality distribution discrepancies and suffer from severe performance degradation under recovery paradigms lacking reconstruction experience. To address these challenges, we propose an Adaptive Graph Distribution Consistency Modal Recovery Network Based on Normalizing Flows (AGDiC) for incomplete multimodal learning in neuroimaging. We develop a novel framework integrating adaptive graph learning with normalizing flows and a modality regularization strategy. This framework focuses adaptive graph attention features on modality distributions while ensuring distribution consistency of recovered data, and employs masked cross-attention to facilitate multimodal fusion. Unlike conventional methods, our model can handle arbitrary modality missingness during both training and inference phases without relying on reconstruction experience. Extensive experiments are conducted using three neuroimaging modalities from the ADNI dataset: sMRI, fMRI and PET. Results demonstrate that our method achieves state-of-the-art performance and exhibits remarkable stability across various random missing rates.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0965_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiYaq_Alzheimer’s_MICCAI2025,
        author = { Li, Yaqin and Dong, Yihong and Wu, Yanan and Yan, Haihao and Gao, Linlin},
        title = { { Alzheimer’s Disease Recognition Based on Adaptive Graph Normalization Flow for Incomplete Multimodal Data Fusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {65 -- 74}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method for Alzheimer’s disease detection that relies on multimodal data but can account for any missing modalities during training and inference time. It points out that the 3 significant contributions are: 1. Addressing the recovery paradigm that lacks reconstruction experience, 2. Adding a modality regularizer based on normalizing flows, 3. Modality fusion through mask cross-attention. The results of this method are highlighted and explored on a public dataset called ADNI.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper is really interesting and tackles a highly relevant problem.
    • The method description mathematically is very sound – from my interpretation - I was able to follow this section in detail – well done.
    • The paper is nicely written and well formulated
    • The explanation of the results is good
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • In figure 1, perhaps I am just misunderstanding the visual, but it says that PET imaging is the missing modality, but it looks like there is a PET input. It should be clarified what is meant by missing in this context – is it missing at inference or training (by the sound of the introduction you can accommodate both), maybe cross out the PET imaging in the first column if it’s missing at training. Considering this is the overall figure for the model it makes it pretty difficult to understand what is going on.
    • Related to my previous point, the differentiation between missing modalities at training and inference time should be very clearly explained in the paper.
    • In the introduction there are two pretty unclear concepts. The first is why multimodal learning is used: is this the standard for Alzheimer’s? I think it would be important to even briefly touch on unimodal detection to explain why it’s not sufficient before specifically getting into the weaknesses of current multimodal approaches. For example, in the literature it seems you can pretty accurately detect it from MRI: https://www.sciencedirect.com/science/article/pii/S1110016822005191. The other confusing element is the concept of “reconstruction experience”: it is somewhat defined, but the introduction reads like this should be an obvious concept and I don’t feel that it is. What exactly do you mean by reconstruction experience? Does this mean having seen the input and deconstructed and reconstructed it, like an autoencoder? Overall there are a few places in the introduction that feel like I need to already know this domain intimately to understand.
    • In table 1, why exactly were these baseline methods used for comparison? The choices seem a bit unjustified. I’m wondering why the paper doesn’t compare to single-modality classification (related to my comment about the introduction).
    • Also in table 1, there are a lot of acronyms in the paper to begin with, but the columns that essentially compare binary classification of different stages of Alzheimer’s are confusing. I would suggest simplifying HCI to “Health”, “Intermediate”, “Late” or something like that. That’s more of a preference comment but may make it easier for readers to digest.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Related to my previous answer, the dataset is public so that is provided, there is no mention of the code
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    All of my previous comments are related to the presentation of the work and not the core contribution. I feel that this work is valuable to MICCAI and after addressing these comments, the communication of these core concepts could be strengthened.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces AGDiC, a framework for Alzheimer’s disease diagnosis that handles incomplete multimodal neuroimaging data. By integrating adaptive graph learning, normalizing flows, and masked cross-attention mechanisms, the method enables modality recovery without relying on complete data during training.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Focuses on a practical setting where modality data is incomplete during both training and testing, a case often overlooked by existing methods.
    • The normalizing flow is combined with adaptive graph learning and masked attention to enable flexible modality recovery and robust fusion.
    • Experiments are conducted under varying missing data rates, with performance reported across multiple tasks and baseline comparisons.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    • The title says this is about Alzheimer’s disease recognition, but the experiments don’t include AD patients, only HC, EMCI, and LMCI. This feels inconsistent, and as a reader it’s confusing. If the model isn’t tested on AD subjects, it’s unclear what “recognition” is referring to. The authors should either include AD in the evaluation or explain clearly why it was left out.
    • The paper lacks discussion on how the model makes decisions, especially in clinical contexts where interpretability is important. While the architecture is technically detailed, it’s unclear what features or regions drive classification, or whether the model’s decisions align with known AD biomarkers. Adding interpretability analysis or even qualitative results would improve the practical value of the method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting and shows good results, but the missing AD label and lack of interpretability limit its clarity and impact.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    AGDiC is capable of supporting network training and prediction under arbitrary modality missing conditions, with three main innovations: firstly, it employs Adaptive Multimodal Graph Learning for feature extraction, which integrates an inter-layer memory mechanism to capture multi-scale features; secondly, the Flow-based Distribution Transfer and Modality Regularizer ensures the recovery of features from all three modalities even when one or two modalities are missing, by enforcing distribution consistency and leveraging regularization constraints; thirdly, the Masked Cross-Attention Modal Fusion mechanism enhances the robustness of multi-modal fusion and prediction by selectively focusing on available modalities while dynamically masking missing ones, thereby reducing noise interference and improving classification accuracy.
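The masking idea in the third point can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `masked_cross_attention`, the single-query setup, and the one-token-per-modality layout are all assumptions made for illustration. It only shows how setting the attention scores of missing modalities to negative infinity gives them zero weight after the softmax.

```python
import numpy as np

def masked_cross_attention(query, keys, values, available):
    """Scaled dot-product attention over modality tokens, ignoring missing ones.

    query:     (d,)   query vector (hypothetical fused or class token)
    keys:      (M, d) one key per modality
    values:    (M, d) one value per modality (assumed finite, e.g. zero-filled)
    available: (M,)   True where the modality is present
    """
    available = np.asarray(available, dtype=bool)
    if not available.any():
        raise ValueError("at least one modality must be available")
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)             # (M,) similarity per modality
    scores = np.where(available, scores, -np.inf)  # mask out missing modalities
    scores = scores - scores[available].max()      # stabilise the softmax
    weights = np.exp(scores)                       # exp(-inf) == 0 for masked entries
    weights = weights / weights.sum()
    return weights @ values, weights

# Example: three modalities (say sMRI, fMRI, PET) with the third one missing.
rng = np.random.default_rng(0)
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 4))
query = rng.normal(size=4)
fused, weights = masked_cross_attention(query, keys, values, [True, True, False])
```

The missing modality receives exactly zero attention weight, so whatever placeholder is stored in its value row does not leak into the fused vector.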

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces AGDiC, a novel framework for AD diagnosis using incomplete multimodal neuroimaging data, addressing the critical limitation of existing methods that depend on complete training data for modality recovery. Central to its innovation is the Flow-based Distribution Transfer and Modality Regularizer, which redefines how missing modalities are reconstructed without requiring prior reconstruction experience. By leveraging normalizing flows, AGDiC explicitly models the latent distributions of neuroimaging modalities (sMRI, fMRI, PET) as class-conditioned Gaussian variables, ensuring distributional consistency between recovered and real data. Observed modalities are mapped to a shared latent space, where missing modalities are generated through latent representation averaging and inverse flow transformations, preserving discriminative features crucial for diagnosis. This approach enables robust recovery under arbitrary missing patterns. The modality regularizer further stabilizes the process by aligning recovered features with diagnostic labels via an auxiliary classification loss, effectively bridging generative recovery and task-specific optimization.
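The recovery step described above (map observed modalities to a shared latent space, average the latents, then apply the inverse flow of the missing modality) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's model: `AffineFlow` is a hypothetical elementwise affine map standing in for a real normalizing flow, and `recover_missing` is a name introduced here.

```python
import numpy as np

class AffineFlow:
    """Toy invertible map z = x * exp(log_scale) + shift (elementwise).

    Stands in for a learned normalizing flow; parameters are fixed here.
    """

    def __init__(self, log_scale, shift):
        self.log_scale = np.asarray(log_scale, dtype=float)
        self.shift = np.asarray(shift, dtype=float)

    def forward(self, x):
        z = x * np.exp(self.log_scale) + self.shift
        log_det = self.log_scale.sum()  # log|det dz/dx| of a diagonal map
        return z, log_det

    def inverse(self, z):
        return (z - self.shift) * np.exp(-self.log_scale)

def recover_missing(features, flows):
    """Fill in missing modalities from the mean latent of the observed ones.

    features: dict modality -> feature vector, or None if missing
    flows:    dict modality -> AffineFlow
    """
    observed = {m: x for m, x in features.items() if x is not None}
    if not observed:
        raise ValueError("at least one modality must be observed")
    latents = [flows[m].forward(x)[0] for m, x in observed.items()]
    z_mean = np.mean(latents, axis=0)   # shared latent estimate
    recovered = dict(observed)
    for m, x in features.items():
        if x is None:
            recovered[m] = flows[m].inverse(z_mean)  # decode with that modality's flow
    return recovered

# Example: PET is missing and is recovered from sMRI and fMRI features.
flows = {
    "smri": AffineFlow([0.1, -0.2, 0.0], [0.0, 1.0, -1.0]),
    "fmri": AffineFlow([0.3, 0.0, -0.1], [0.5, 0.0, 0.0]),
    "pet": AffineFlow([-0.2, 0.2, 0.1], [1.0, -1.0, 0.0]),
}
features = {"smri": np.ones(3), "fmri": np.zeros(3), "pet": None}
recovered = recover_missing(features, flows)
```

Because each flow is invertible, pushing the recovered PET features back through the PET flow returns the averaged latent, which is the sense in which the recovered data stays consistent with the shared latent distribution in this sketch.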

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The meanings of the symbols in Equation 4 are unclear; please use clearer notation. The second superscripted letter in Equation 3 should be “p” instead of “f.” These minor refinements would solidify the mathematical clarity without disrupting the overall coherence of the work. Furthermore, although not strictly necessary, it would be useful to demonstrate that normalizing flows do not compromise the representational capability of the feature vectors. This can be validated by designing a single-modality experiment: after feature extraction, compare the classification performance between directly using the original features X and using the latent representations Z generated by normalizing flows. If the results are not significantly different, it will empirically confirm that the flow-based transformation preserves the discriminative power of the features.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper shows strengths in its methodological design and in the effectiveness of its solution, and the problem it tackles aligns closely with practical challenges encountered in clinical AD diagnosis.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Response to Reviewer 1:

1. Regarding the explanation of Equation 4: the first term in brackets in Equation 4 represents the conditional probability density function, i.e., the density of the latent variable Z^(m) under the Gaussian distribution corresponding to class c. The second term is the log-determinant of the Jacobian matrix for transforming X^(m) to Z^(m). Since normalizing-flow models are invertible, this term must be included to satisfy the “change of variables” formula in probability transformations, while ensuring that the density computed via the invertible flow model is properly normalized and numerically stable.
2. Regarding Equation 3: the superscript character in the second term of Equation 3 should indeed be “p”.

Response to Reviewer 2:

1. Regarding the question on Fig. 1 and modality missingness: our model supports arbitrary modality missingness in both training and inference; that is, modalities may be missing at training time or at inference time, and the definition of missingness is the same in both cases. The missingness is random. As emphasized in the first paragraph of the Method section, Fig. 1 uses the assumption of PET-modality missingness purely for clarity and without loss of generality. In practice, provided that at least one modality is available, any modality may be missing during training or inference. We cannot remove the PET imaging part from Fig. 1, as it is essential to illustrating the cross-modal distribution conversion process integral to our methodology.
2. Regarding why the Introduction does not discuss single-modality learning: our paper focuses on “incomplete multimodal learning,” a concept that does not apply to single-modality tasks. Incomplete multimodal learning arises from the limitations of complete multimodal tasks, not from comparing single-modality weaknesses against multimodal approaches. Thus, we concentrate on “multimodal” and its “incompleteness,” rather than on single-modality learning, which is not directly related.
3. Regarding why single-modality learning methods are not included in the comparative experiments: first, for the reason given in our previous response; second, under different modality-missingness rates the number of samples available to single-modality methods would change, which would make the comparison unfair, because the sample size in our current comparisons remains constant. We focus our comparisons on two categories, reconstructive and non-reconstructive methods, which are the core of our work.
4. Regarding the explanation of “reconstruction experience”: your analogy to autoencoders is apt. Reconstruction experience is a concept under the reconstructive paradigm for incomplete multimodal learning. As noted in the second paragraph of the Introduction, current reconstructive paradigms assume complete and available modality data during training. Specifically, these traditional methods rely on real data (experience) during training to correct (reconstruct) generated data and only support modality absence at test time. They cannot adapt to a reconstructive paradigm lacking “reconstruction experience,” under which existing models’ performance degrades substantially. This core issue is precisely what we address.

Response to Reviewer 3:

1. Regarding the title and why AD was not included as a class in the experiments: strictly speaking, the title should emphasize the detection of cognitive impairment rather than AD specifically. We excluded AD in experiments for three reasons: (1) in ADNI2, AD patients have fewer samples across the three modalities compared to the other three classes, causing class imbalance; (2) Alzheimer’s disease is progressive: per follow-up records, some LMCI patients later converted to AD; (3) early-stage disease detection allows treatment opportunities, which is critical for AD diagnosis.
2. Analysis suggestions for clinical interpretability: this recommendation is valuable, and we will expand on it in future work.
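For readers without the paper at hand, the two terms the authors describe for Equation 4 match the standard change-of-variables identity for normalizing flows. Equation 4 itself is not reproduced on this page, so the form below is a generic sketch rather than the paper's exact expression; the symbols f_m (the flow for modality m), \mu_c and \Sigma_c (the mean and covariance of the class-c Gaussian) are notation assumed here.

```latex
% Generic change-of-variables log-likelihood for an invertible flow f_m,
% with Z^{(m)} = f_m(X^{(m)}) and a class-conditional Gaussian prior.
\log p\bigl(X^{(m)} \mid c\bigr)
  = \underbrace{\log \mathcal{N}\bigl(Z^{(m)};\, \mu_c, \Sigma_c\bigr)}_{\text{latent density under class } c}
  + \underbrace{\log \left| \det \frac{\partial Z^{(m)}}{\partial X^{(m)}} \right|}_{\text{log-determinant of the Jacobian}}
```

The Jacobian term accounts for how the flow stretches or compresses volume, which is what keeps the density over X^(m) normalized.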




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


