Abstract

The multimodal model has shown superior potential for accurate Alzheimer’s disease (AD) diagnosis; however, its reliance on complete modalities limits its use in a clinical setting. This study proposes a novel Anatomical Graph-based Multilevel Distillation (AGMD) framework that effectively transfers multimodal knowledge using layered modeling. Specifically, we develop a hierarchical distillation framework with three dedicated branches to explicitly capture the features of AD from multiple levels (local structural details, regional connectivity patterns, and global semantic information) to achieve complete knowledge transfer. Moreover, we introduce anatomical constraints to model the brain adjacent connection patterns to help better learn the relationships between key ROIs, particularly in disease-relevant regions, e.g., the hippocampus. The prediction entropy as regularization is introduced to refine instance-level knowledge, comprehensively alleviating the negative impact of the teacher’s noisy information. Extensive experiments on the ADNI dataset demonstrate that AGMD achieves the best classification accuracy, with an improvement of 3.7% over the state-of-the-art methods, while significantly reducing the performance gap between teacher and student models. The code is available at https://github.com/LiuFei-AHU/AGMD.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3438_paper.pdf

SharedIt Link: https://rdcu.be/eHwYc

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04984-1_8

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/LiuFei-AHU/AGMD

Link to the Dataset(s)

https://adni.loni.usc.edu/

BibTex

@InProceedings{LiuFei_Anatomical_MICCAI2025,
        author = { Liu, Fei AND Wang, Huabin AND Jaward, Mohamed Hisham AND Liang, Shiuan-Ni AND Ong, Huey Fang AND Cheng, Jiayuan},
        title = { { Anatomical Graph-based Multilevel Distillation for Robust Alzheimer’s Disease Diagnosis with Missing Modalities } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {74 -- 83}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors propose a novel framework termed Anatomical Graph-based Multilevel Distillation (AGMD) for the diagnosis of Alzheimer’s Disease (AD). The framework comprises three key components: (1) a Cross-Modal Attentive Fusion Transformer (CMT) for fusing shallow features from the teacher model, (2) an anatomical-guided graph learning module to model brain connectivity, and (3) an Uncertainty-Aware Gating (UAG) mechanism for dynamic knowledge refinement. The authors claim that AGMD outperforms existing knowledge distillation approaches in AD diagnosis tasks.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The incorporation of anatomical priors into graph construction is both insightful and empirically validated.
- The proposed methodology is coherent and well-motivated.
- The comparison with other knowledge distillation methods is sufficient and demonstrates performance advantages.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The manuscript requires significant improvements in writing clarity. For instance, the description of cosine similarity for selecting top-k neighbors in Semantic Graph Construction is ambiguous and should be more explicitly detailed.
- The sample size is relatively small—only 269 subjects—despite the availability of a larger cohort in the ADNI dataset. This limitation raises concerns about generalizability.
- The proposed method is not benchmarked against state-of-the-art (SOTA) approaches for AD diagnosis. The chosen baselines and Graph Transformer models do not represent the current best-performing methods in this field.
- I have one major concern about the motivation of this paper. The KD framework already leverages PET images, so why not train directly on MRI images? Distilling a model that uses only MRI seems unnecessary, given that the data with PET images are already in use.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the paper presents innovative ideas and contributes valuable perspectives to multimodal AD diagnosis, the aforementioned issues—particularly the limited experimental setup and unclear motivation for MRI-only distillation—must be addressed before the work can be considered for acceptance. Therefore, I think the author should clarify the weakness before the acceptance.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper describes a novel method for knowledge distillation aimed at improving Alzheimer’s disease (AD) classification. The proposed approach involves training a student network using only MRI data, while leveraging the guidance of a teacher network trained on both MRI and PET modalities. The proposed mechanism uses distillation at multiple levels of the encoder, and uses anatomy-inspired graphs for efficiently training the student network. It also uses teacher model’s uncertainty estimate to further prioritize reliable knowledge transfer.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This is a well-designed study addressing a clinically relevant problem in AD classification i.e. achieving good performance when PET modality is missing. The different novel aspects of the approach (as mentioned above) are well formulated and the ablation study shows the importance of each component of it. The validation experiment comparing the performance with other knowledge distillation approaches is comprehensive as well.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Weaknesses of the paper are minor as listed below:
1. The paper doesn’t specify which PET modality is being used in the experiments (could be Amyloid PET, Tau PET, or FDG PET). There is a brief mention of brain metabolism in the introduction which suggests that it is FDG PET. But clarifying this would be important.
2. References 7 and 8 are duplicates of each other.
3. There is mention of class-imbalance in Section 3 (experimental set up). But there is no mention of actual numbers while describing the dataset. A brief description of this imbalance would be justified.
Lastly, the training and test datasets are very small which makes me question the generalizability of the approach. While the MRI + PET data availability necessitates the training dataset to be small, additional validation experiments on the larger MRI-only dataset available in ADNI would help in convincing the readers about the generalizability of the method.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

While the description of the method is clear, it is generic with respect to the number of layers used, size at each layer etc. which makes reproducibility a challenge. An accompanying open-source code would help in ensuring reproducibility.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a well-designed study that addresses a relevant problem with novel method development. The performance of the method is benchmarked with other available approaches for knowledge distillation.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

My original concerns were only minor and the authors’ rebuttal addressed those concerns.

Review #3

Please describe the contribution of the paper

The authors propose a hierarchical pathology modeling framework that captures Alzheimer’s Disease (AD) pathology at three distinct levels: local, regional, and global. They introduce an approach that constructs anatomically guided graphs based on structural connectivity derived from AD-aware brain regions and their neighboring areas. This ensures that the graph edges represent biologically plausible pathways between disease-relevant regions of interest. Additionally, the framework incorporates a dynamic distillation process, where the entropy of the teacher model’s predictions is used to prioritize reliable predictions, thereby enhancing the overall reliability of the model’s output.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is well-written and clearly explained.
2. The AGMD framework proposes an innovative approach to cross-modal knowledge transfer, where a teacher model trained on both MRI and PET data guides a student model using only MRI.
3. The framework leverages anatomical-guided graph learning and uncertainty-aware gating to refine predictions and enhance model performance.
4. The technical explanations for each component of the framework are thorough and satisfactory.
5. The experiments and ablation study are satisfactory.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The integration of multi-level techniques, such as 3D CNN, GCN, and transformer models, suggests that the framework may be computationally intensive. It would be beneficial to address the computational efficiency and scalability of the AGMD framework, particularly in terms of resource requirements and potential bottlenecks.
2. In the experimental section, the line “AGMD (w/o KD) is the model using the same student architecture but without KD” could be clarified. The architecture of the teacher-student model setup is not clearly defined. It would help to provide additional context, such as the performance of AGMD using both MRI and PET but without knowledge distillation (KD), to make the results more transparent and understandable.
3. There needs to be a clear justification for why the Cross-Modal Attentive Fusion Transformer (CMT) is applied at every layer. The reasoning behind this choice should be explained to ensure its necessity and contribution to the framework.
4. In Figure 2, the acronym “EU” is mentioned but not defined. It would be helpful to clarify what “EU” stands for to avoid any confusion.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The AGMD framework offers several notable benefits for Alzheimer’s Disease diagnosis. By leveraging cross-modal knowledge transfer, it enables a student model to effectively learn from a teacher model trained on both MRI and PET data, even when only MRI data is available for the student model. This reduces the dependency on PET scans, which are often more expensive and less accessible. The use of anatomical-guided graph learning ensures that the model captures biologically plausible brain connectivity, enhancing its interpretability and accuracy in identifying disease-relevant regions. Additionally, the uncertainty-aware gating mechanism prioritizes more reliable predictions, improving the robustness and performance of the model. However, the paper has some weaknesses. The computational efficiency and scalability of the method, especially with its use of complex techniques like 3D CNN, GCN, and transformers, are not addressed, which raises concerns about resource requirements. Furthermore, there is a lack of clarity in the experimental section regarding the teacher-student model architecture, particularly when knowledge distillation is excluded.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

I am satisfied with the authors’ responses to all of my comments.

Author Feedback

We sincerely thank the reviewers for their constructive feedback and affirmation of our work. The concerns are addressed below.

[R1,3&R2,2] Dataset size and generalizability: We appreciate the reviewers for highlighting this critical aspect. While we fully acknowledge the importance of larger datasets, the current study focuses on proof-of-concept validation of methodological innovation. To ensure data consistency and avoid biases, we retained only baseline scans. Despite the relatively limited sample size, the proposed AGMD demonstrates significantly superior performance compared to all baselines, underscoring its potential. Future work will involve expanding validation to larger MRI-only cohorts to comprehensively evaluate scalability and generalizability.

[R2,3] Benchmark comparisons: We concur that comparing with more state-of-the-art diagnosis approaches would further strengthen our study. However, we clarify the rationale behind our current experimental design with two key points: (1) Our study emphasizes how knowledge distillation (KD) can substantially enhance single-modality networks for brain disease analysis, which are more clinically practical than multi-modal networks. We conducted extensive comparisons with mainstream KD methods (as shown in Table 1), demonstrating consistent improvements over these methods; (2) To ensure the representativeness of the teacher model, we validated it against both traditional and graph-based methods (as presented in Table 1 and Figure 2). The superior performance justifies its use as a credible source for distillation, particularly in the context of the proposed CMT and anatomical graph-based learning.

[R2,4] Motivation: PET’s frequent unavailability demands distilling MRI+PET teacher knowledge into a practical MRI-only student model, addressing real-world diagnostic gaps. Direct MRI-only training yields suboptimal results, while our distillation framework bridges this gap by transferring hierarchical multimodal knowledge (Table 1). Unlike unimodal distillation, our method uniquely integrates cross-modal fusion (CMT), anatomically guided graphs, and uncertainty-aware gating to encode disease-specific inter-modal knowledge from MRI+PET, ensuring clinically robust performance.

[R2,1] Ambiguous cosine similarity description: The cosine similarity of the nodes’ features is computed in the same way as in existing studies [3]. The top-k most similar nodes (those with the highest cosine values) are selected, and the edges among them are created by setting the corresponding entries in the adjacency matrix to 1, while all other entries are set to 0, yielding a sparse graph structure.

[R4,3] Justification for applying CMT at each layer: We apply CMT at every layer to capture and facilitate cross-learning of multi-scale intermediate features, which contain early pathological patterns (supported by existing studies) and are more robust to noise.

[R4,2] Ambiguity of AGMD setup: We clarify that AGMD (without KD) refers to training the student model using MRI alone, without transferring knowledge from the teacher. We compare it with the student model under KD to emphasize the necessity of distillation.

[R4,1] Computational efficiency: We acknowledge that our AGMD may raise computational complexity concerns. However, AGMD employs efficient architectural choices (lightweight CNNs and sparse graph structures) to mitigate this issue. Furthermore, shared encoders can be applied to further reduce resource requirements.

[R1,4] Reproducibility: To ensure reproducibility, our code will be publicly released.

[R1,2&R4,4] Editing typos: The acronym “EU” was a typographical error and will be corrected to “UAG”. The duplicate reference [8] will be removed in the final version.

[R1,1,3] PET type and class details: We confirm that 18F-FDG PET was used, as stated in Section 3. A brief description of class information will also be added for clarity.
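
For readers following the cosine-similarity discussion, the top-k graph construction described in the rebuttal can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-node selection of neighbors, the exclusion of self-loops, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topk_cosine_adjacency(features, k):
    """Build a sparse 0/1 adjacency matrix from node features.

    Assumption: for each node, the k most similar other nodes are kept
    (no self-loops, no symmetrization). The rebuttal does not state
    these details, so they are illustrative choices only.
    """
    n = len(features)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        # Similarity of node i to every other node, paired with the node index.
        sims = [
            (cosine_similarity(features[i], features[j]), j)
            for j in range(n)
            if j != i
        ]
        # Highest cosine values first.
        sims.sort(key=lambda pair: pair[0], reverse=True)
        for _, j in sims[:k]:
            adjacency[i][j] = 1
    return adjacency


# Hypothetical toy features: nodes 0 and 1 point one way, nodes 2 and 3 another.
features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
adjacency = topk_cosine_adjacency(features, k=1)
for row in adjacency:
    print(row)
```

With k=1, each node is linked only to its single most similar neighbor, so every row of the matrix contains exactly one nonzero entry. A larger k trades sparsity for denser connectivity.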

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper presents a novel and well-structured approach to address the critical challenge of missing modality data in Alzheimer’s disease (AD) diagnosis, a scenario frequently encountered in clinical practice. The proposed Anatomical Graph-based Multilevel Distillation (AGMD) framework integrates hierarchical pathology modeling with anatomically-guided graph learning and an uncertainty-aware knowledge distillation strategy.

Strengths and Contributions:

Innovative Multilevel Distillation Architecture: The design of AGMD explicitly models the multilevel nature of AD pathology—local atrophy, regional connectivity, and global semantics—leveraging 3D CNNs, GCNs, and transformers. This layered approach aligns well with known neuropathological progressions and clearly outperforms traditional single-level distillation strategies.

Anatomical Graph Constraints: Unlike existing graph-based distillation methods that use naive graph construction, this paper introduces biologically plausible graph building based on anatomical priors, especially emphasizing disease-relevant regions (e.g., hippocampus). This greatly enhances the interpretability and clinical relevance of the model.

Uncertainty-Aware Gating (UAG): The use of prediction entropy to dynamically modulate distillation weights based on the reliability of teacher predictions is both theoretically sound and practically impactful, resulting in better generalization for the student model trained without PET.
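
To make the idea of entropy-based gating concrete, the sketch below shows one common way to down-weight a distillation loss when the teacher is uncertain. It is only an illustration under stated assumptions: the weight `1 - H(p) / log(C)`, the temperature-softened KL term, and all function names are hypothetical and are not taken from the paper or its code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_weight(teacher_logits):
    """Assumed gating rule: 1 - H(p) / log(C), clamped to [0, 1].

    A uniform teacher prediction gives a weight near 0; a one-hot
    prediction gives a weight near 1.
    """
    probs = softmax(teacher_logits)
    max_entropy = math.log(len(probs))
    weight = 1.0 - entropy(probs) / max_entropy
    return min(1.0, max(0.0, weight))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon for numerical safety."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def gated_kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Per-instance distillation loss scaled by the teacher's confidence."""
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    return confidence_weight(teacher_logits) * kl_divergence(teacher_soft, student_soft)


# Hypothetical 4-class logits (e.g. AD / pMCI / sMCI / NC).
confident_teacher = [10.0, 0.0, 0.0, 0.0]
uncertain_teacher = [0.0, 0.0, 0.0, 0.0]
student = [0.0, 10.0, 0.0, 0.0]

print(confidence_weight(confident_teacher))
print(confidence_weight(uncertain_teacher))
print(gated_kd_loss(confident_teacher, student))
print(gated_kd_loss(uncertain_teacher, student))
```

Under this assumed rule, an instance on which the teacher spreads its probability evenly contributes almost nothing to the distillation term, while a confidently predicted instance contributes its full KL divergence.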

Comprehensive Experiments: The paper demonstrates state-of-the-art performance on a challenging 4-class AD/pMCI/sMCI/NC classification task using the ADNI dataset. Ablation studies (Figure 2) rigorously support the contribution of each component, and comparisons with multiple baselines show clear and consistent improvements across all evaluation metrics.

Overall, this paper makes a significant technical contribution and provides a clinically meaningful solution to an important problem in medical imaging and neurodegenerative disease diagnosis.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Anatomical Graph-based Multilevel Distillation for Robust Alzheimer’s Disease Diagnosis with Missing Modalities

Author(s):