Abstract

Alzheimer’s disease (AD) diagnosis faces the challenge of capturing complex patterns of subtle structural and functional changes in neuroimaging and the underutilization of clinical prior knowledge. Current deep learning methods primarily focus on structural magnetic resonance imaging (sMRI) analysis, often overlooking the critical disease concepts that clinicians rely on. To address this limitation, we propose a Prior-guided Prototype Aggregation Learning (PPAL) framework. This framework leverages structured prompts to large language models (LLMs) to extract disease-related anatomical descriptions as clinical prior knowledge and progressively aggregates the visual features of AD and cognitively normal (CN) individuals, bridging the semantic gap between sMRI features and LLM-derived clinical concepts to construct category prototype representations. Meanwhile, we design a slice selection and compression module that adaptively learns the importance of different slices, prioritizing those most critical for AD diagnosis. Ultimately, AD diagnosis is achieved by computing the semantic similarity between MRI slice features and the category prototypes. Experimental results demonstrate that, compared to state-of-the-art 2D slice-based methods, incorporating clinical prior knowledge not only enhances the identification of pathological regions but also shows significant advantages in the zero-shot mild cognitive impairment (MCI) conversion task.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0951_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/diaoyq121/PPAL

Link to the Dataset(s)

https://adni.loni.usc.edu/

BibTex

@InProceedings{DiaYue_Priorguided_MICCAI2025,
        author = { Diao, Yueqin and Fang, Huihui and Yu, Hanyi and Wang, Yuning and Tao, Yaling and Huang, Ziyan and Yeo, Si Yong and Xu, Yanwu},
        title = { { Prior-guided Prototype Aggregation Learning for Alzheimer’s Disease Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {488 -- 497}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper employs a transformer-based attention mechanism to extract anatomical features from AD (Alzheimer’s disease) and CN (cognitively normal) participants, as well as textual features generated by GPT-4. These features are then integrated to construct a diagnostic model based on multimodal embeddings from both visual and textual data.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Employs an LLM and attention to extract multimodal information, resulting in superior diagnostic performance.
    2. Slice attention is employed to extract the most significant features from the 2D slices.
    3. The text-aggregated attention combines anatomical and textual information to create text embeddings.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Slice Attention Mechanism: The slice attention is based on the transformer architecture and uses tanh, a sign function, and a mean operation to select important weights in the slice embedding space. However, how can one ensure that larger weights truly correspond to more important slices? The operation seems to be a simple combination of activation functions.
    2. Potential Information Leakage: The diagnostic model relies on the similarity between visual and text features. However, in the text-aggregated attention, the visual features (f_ad and f_cn) are used to extract text embeddings for classification. Could this lead to visual information leaking into the text branch?
    3. LLM Reliability: The paper uses GPT and CLIP to generate AD and CN embeddings, but how can we ensure the LLM outputs correct text representations without sufficient fine-tuning of the LLM itself?
    4. MCI Conversion Discrepancy: The results include an MCI conversion analysis, but the method section only describes using AD and CN embeddings. Since pMCI and sMCI embeddings are not used, how is this analysis actually performed?
    5. Figure Consistency Issues: The presentation of the innovations in the figures has scaling issues: in Figure 1, “SSCM” appears too large, while in Figures 2 and 3 all variants are displayed too small, affecting readability.
    6. Methodological Comparison: The number of comparison methods included in the study appears insufficient for a comprehensive evaluation.
    7. Novelty Concern: All key contributions (slice attention and text-aggregated attention) seem fundamentally based on standard self-attention and transformer architectures, which may limit the perceived novelty of the approach.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While using LLMs to generate AD (Alzheimer’s disease) and CN (cognitively normal) text inputs is innovative, the reliability of these synthetic texts remains questionable without rigorous validation. Furthermore, all proposed contributions (e.g., slice attention, text aggregation) are fundamentally extensions of standard self-attention and transformer architectures, which raises concerns about methodological novelty. Additionally, the figures poorly present key innovations—some elements are excessively small (e.g., model variants in Figures 2–3), severely compromising readability.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper can be accepted, since its quality is obviously stronger than that of the other submissions.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel method for diagnosing Alzheimer’s disease, called PPAL, which innovatively combines clinical prior knowledge with deep learning. The authors use a large language model to extract disease features from clinical descriptions, design a slice selection module to identify key slices in MRI, and bridge the semantic gap between image features and clinical concepts through a text-aggregated visual network. Experiments show that this method achieves an accuracy of 85.38% on the AD diagnosis task, surpassing existing techniques, and can accurately focus on key brain regions such as the hippocampus, enhancing the interpretability of the model. It also performs well in zero-shot MCI conversion prediction.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of this study lies in its innovative PPAL framework, which integrates clinical prior knowledge with a deep learning architecture, achieving a paradigm shift in medical AI. Particularly commendable is how the method extracts structured disease knowledge from clinical literature through large language models, and addresses noise issues and the visual-semantic gap in MRI data through the slice selection and compression module and the text-aggregated visual representation network. The results presented in the paper demonstrate that, compared to existing methods, this research achieves higher accuracy on AD diagnostic tasks. More importantly, the attention heatmap visualization and the mapping between clinical semantics and imaging features enhance the model’s clinical interpretability, aligning the AI’s attention mechanism closely with known pathological regions of Alzheimer’s disease (such as the hippocampus). This alleviates the long-standing “accurate but not interpretable” dilemma in the medical AI field, providing a more transparent diagnostic tool for clinical practice.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The PPAL framework integrates clinical prior knowledge with deep learning, but conceptually resembles existing approaches such as the knowledge-guided attention network by Cui et al. (2022) and the medical knowledge integration framework by Li et al. (2021), suggesting limited novelty.
    2. The direct transfer from Alzheimer’s disease (AD) diagnosis to mild cognitive impairment (MCI) conversion prediction lacks sufficient biological justification. AD and MCI exhibit distinct imaging characteristics and pathological progression patterns, making such a simplified transfer potentially problematic.
    3. The study lacks statistical significance analysis and fails to provide hyperparameter settings and random seed information, making it difficult to assess result stability and reproducibility. The selection of comparison methods appears biased, with some baselines (e.g., AwareNet) showing unusually poor performance on the sMCI vs. pMCI task (MCC of only 3.90), suggesting potential errors or an unfair comparison.
    4. Validation is confined to a single dataset (ADNI) without testing on independent datasets (such as AIBL or OASIS). The study lacks cross-center and cross-population generalizability validation.
    5. The paper fails to address the class imbalance in the ADNI dataset, particularly the typically smaller number of pMCI samples, which could significantly impact results. Critical MRI preprocessing details (spatial normalization, brain tissue segmentation, etc.) are also missing.
    6. While heatmaps are provided, there is no quantitative consistency comparison between the attention heatmaps and expert neuropathological annotations. The study lacks an explanatory analysis of false positive and false negative cases.
    7. The MCI conversion prediction accuracy (67.19%), though improved, remains inadequate for clinical application. There is no discussion of sensitivity versus specificity trade-offs for different clinical scenarios (screening vs. diagnosis), and the study lacks a comprehensive comparison with clinical cognitive assessment scales (e.g., MMSE, ADAS-Cog).
    8. It is also questionable whether directly using the medical priors provided by an LLM is reasonable. LLMs have a non-negligible hallucination rate, and in risk-sensitive medical scenarios, uncertain answers could lead to catastrophic results.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although progress has been made in some aspects of this study, there are still significant shortcomings in terms of methodological innovation, comprehensive validation, data processing transparency, and clinical practicality, which limit its scientific value and clinical translational potential.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces the Prior-guided Prototype Aggregation Learning (PPAL) framework for Alzheimer’s disease (AD) diagnosis using structural magnetic resonance imaging (sMRI). It aims to address the challenge of incorporating clinical prior knowledge into deep learning models. The framework leverages large language models (LLMs) to extract disease-related anatomical descriptions, which are then used to guide the aggregation of visual features from AD and cognitively normal (CN) individuals. A slice selection module prioritizes relevant MRI slices for AD diagnosis, while the model calculates the semantic similarity between MRI slice features and category prototypes for diagnosis. Experimental results demonstrate that incorporating clinical prior knowledge enhances pathological region identification and improves the zero-shot mild cognitive impairment (MCI) conversion task, outperforming state-of-the-art 2D slice-based methods. The code is partially available.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Innovative Approach: The integration of LLM-derived clinical prior knowledge with deep learning methods for sMRI is a novel and promising approach for AD diagnosis.

    Clinical Relevance: The paper successfully incorporates clinically relevant anatomical descriptions, bridging the gap between data-driven deep learning models and clinician expertise.

    Slice Selection and Compression: The slice selection and compression module effectively highlights the most relevant slices for AD diagnosis, improving the model’s focus on critical information.

    Experimental Validation: The framework outperforms state-of-the-art 2D slice-based methods and provides significant improvements in the zero-shot MCI conversion task.

    Code Availability: The authors propose to make the code publicly available, which is a valuable contribution to the research community.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    How were the QA pairs used to train the LLMs obtained and constructed? How can we ensure that these QA pairs adequately reflect expert knowledge? Furthermore, how can we verify that the LLM has successfully learned the expert knowledge from these pairs?

    The loss functions used in the proposed method, L_f and L_c, are directly added with a 1:1 weight ratio. Why are the two losses weighted equally? In terms of importance, L_f seems to be more crucial. Have the authors investigated adjusting the weight ratio between these two losses?

    Why were 3D methods not considered in the comparison? The paper only compares 2D slice-based methods. I understand that 3D methods require more computational resources, but wouldn’t 3D methods significantly outperform 2D methods in terms of performance?

    There are some minor writing and formatting issues in the paper. For example, the SSC in the text and the SSCM module in Figure 1 should use the same abbreviation. Additionally, Figure 2, referenced in Section 2.1, is located too far from the relevant text, making it inconvenient to read. These layout issues could be improved.
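For concreteness, the loss combination questioned above amounts to L = λ_f·L_f + λ_c·L_c with λ_f = λ_c = 1 in the paper. A minimal sketch of the weighted alternative follows; the λ parameter names are illustrative assumptions, not taken from the paper.

```python
def combined_loss(l_f, l_c, lambda_f=1.0, lambda_c=1.0):
    """Weighted sum of the two training losses.

    The paper uses a 1:1 ratio (lambda_f = lambda_c = 1); tuning the
    ratio is the alternative the reviewer asks about. The lambda names
    are illustrative, not from the original method description.
    """
    return lambda_f * l_f + lambda_c * l_c
```

In practice such weights would be swept on a validation split; the authors report in their rebuttal that reweighting gave only slight, statistically insignificant gains.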

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although this paper still has some minor flaws, it generally meets the requirements of MICCAI.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their overall support of our paper, highlighting its “novel approach” (R1,2&3), “model’s interpretability” (R1), and “strong performance” (R2&3). Below, we address 4 general concerns, followed by specific responses to each reviewer.

G1 Novelty (R1&3) Our proposed method is based on textual concept knowledge and introduces learnable token vectors, which fundamentally differ from existing knowledge-injection approaches. Unlike prior methods that rely on scarce and costly-to-obtain image-text pairs for supervision, our approach is more flexible and data-efficient. Another key innovation lies in the designed SSC module, which models inter-slice correlations and enables dynamic fusion. Unlike conventional attention mechanisms, our Slice Attention uses a structured design to separately capture slice intensity (via mean) and directional contribution (via tanh and sign). This leads to more interpretable and effective importance estimation. The effectiveness of this design is demonstrated both qualitatively and quantitatively in Fig. 4 and Table 3.
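As a rough illustration of the structured design described here (slice intensity via a mean, directional contribution via tanh and sign), one possible reading is sketched below. The exact PPAL formulation is not given on this page, so the way the two cues are combined and normalized is an assumption for illustration only.

```python
import numpy as np

def slice_importance(slice_embeddings):
    """Illustrative sketch of a Slice Attention-style importance score.

    Combines a per-slice "intensity" term (mean activation) with a
    "directional contribution" term (sign times tanh of the magnitude),
    loosely following the rebuttal's description. Note that since tanh
    is odd, sign(x) * tanh(|x|) equals tanh(x); the split is kept here
    only to mirror the wording. The real PPAL module may differ.

    slice_embeddings: array of shape (num_slices, embed_dim)
    returns: importance weights of shape (num_slices,), summing to 1
    """
    mean_act = slice_embeddings.mean(axis=1)              # slice intensity
    directional = np.sign(mean_act) * np.tanh(np.abs(mean_act))
    scores = mean_act + directional                       # combine both cues
    exp = np.exp(scores - scores.max())                   # softmax-normalize
    return exp / exp.sum()
```

Under this reading, slices with larger (signed) mean activation receive larger weights, which is precisely the property R1 asks the authors to justify.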

G2 LLM Reliability (R1,2&3) We adopt a structured QA approach to generate general disease concepts, which avoids hallucination by using clear, reasoning-light prompts. All generated content is clinically validated by expert review to ensure medical correctness. Also, the learnable prompt vectors we propose allow the model to dynamically refine textual representations based on visual input. This enhances both the reliability and discriminative power of the learned prototypes. The effectiveness and interpretability can be confirmed in Fig. 4b, Fig. 5, and Table 2.

G3 MCI Conversion (R1&3) The MCI experiments aim to evaluate the generalization and zero-shot capability of our method under the vision-language matching paradigm. We modify the text prompts with MCI-related concepts and use learned prototypes as anchors to assess conversion risk. Results show strong generalization to unseen categories. While not yet clinically applicable, we plan to enhance its biological justification and performance in future work.
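The vision-language matching paradigm invoked here — scoring a scan against text-derived category prototypes — follows the usual CLIP-style recipe, which might be sketched as follows. The class names, the dict interface, and the temperature value are illustrative assumptions, not details from the paper.

```python
import numpy as np

def zero_shot_predict(image_feature, prototypes, temperature=0.07):
    """Sketch of CLIP-style zero-shot classification via prototypes.

    image_feature: (d,) visual feature for a scan
    prototypes: dict mapping class name -> (d,) text-derived prototype
    Returns (predicted_class, softmax probabilities). Swapping in
    MCI-related prototypes, as the rebuttal describes, requires no
    retraining -- only new text prompts.
    """
    names = list(prototypes)
    img = image_feature / np.linalg.norm(image_feature)
    sims = np.array([
        img @ (p / np.linalg.norm(p)) for p in (prototypes[n] for n in names)
    ])                                   # cosine similarity per class
    logits = sims / temperature          # temperature-scaled logits
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    return names[int(probs.argmax())], probs
```

This is what makes the zero-shot MCI evaluation possible in principle: the visual encoder is untouched, and only the prototype side changes.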

G4 Experimental (R1,2&3) Regarding training and comparisons, we strictly follow the settings and results reported in [10]. All experiments were conducted using the same backbone, fixed random seeds, and five-fold cross-validation, ensuring stable and reproducible outcomes. Due to computational limitations and to better leverage pretrained weights for transfer learning, we adopt a 2D slice-based strategy to integrate 3D information, which suits our framework well. The compared methods include slice-level and CLIP-based models, covering the necessary baselines for fair validation. In future work, we plan to explore additional baselines (e.g., 3D models), expand dataset coverage, conduct multi-center validation, and perform biological and statistical analyses.

R1Q5 Class Imbalance: Our dataset contains 586 CN, 474 AD, 162 pMCI, and 154 sMCI cases, indicating no significant class imbalance. We thank the reviewer for pointing out the lack of some details. We will clarify these important points in the final version.

R1Q6 Heat map analysis: We will improve the visualization results and incorporate expert annotations to enhance the interpretability of the model.

R2Q2 Weights: Although we experimented with adjusting the loss function weights and observed slight performance improvements, the changes were not statistically significant. Therefore, we did not include the results in the main text.

R3Q2 Information Leakage: Our method leverages textual guidance to progressively aggregate relevant visual features without direct pairwise fusion, thus avoiding information leakage. This reduces modality gaps and improves cross-modal alignment and disease recognition. VadCLIP [19] also fuses visual features for similarity computation.

And in the final version, we will unify terminology, refine figure layouts, and adjust details to ensure clarity and consistency throughout the paper.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


