Abstract

Neurological conditions, such as Alzheimer’s Disease, are challenging to diagnose, particularly in the early stages when symptoms closely resemble those of healthy controls. Existing brain network analysis methods primarily focus on graph-based models that rely solely on imaging data, which may overlook important non-imaging factors and limit the model’s predictive power and interpretability. In this paper, we present BrainPrompt, an innovative framework that enhances Graph Neural Networks (GNNs) by integrating Large Language Models (LLMs) with knowledge-driven prompts, enabling more effective capture of complex, non-imaging information and external knowledge for neurological disease identification. BrainPrompt integrates three types of knowledge-driven prompts: (1) ROI-level prompts that encode the identity and function of each brain region, (2) subject-level prompts that incorporate demographic information, and (3) disease-level prompts that capture the temporal progression of disease. By leveraging these multi-level prompts, BrainPrompt effectively harnesses knowledge-enhanced multi-modal information from LLMs, improving the model’s ability to predict neurological disease stages while offering more interpretable results. We evaluate BrainPrompt on two resting-state functional Magnetic Resonance Imaging (fMRI) datasets of neurological disorders, showing its superiority over state-of-the-art methods. Additionally, a biomarker study demonstrates the framework’s ability to extract valuable and interpretable information aligned with domain knowledge in neuroscience. The code is available at https://anonymous.4open.science/r/BrainPrompt.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2260_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/AngusMonroe/BrainPrompt

Link to the Dataset(s)

https://doi.org/10.17608/k6.auckland.21397377

BibTex

@InProceedings{XuJia_BrainPrompt_MICCAI2025,
        author = { Xu, Jiaxing and He, Kai and Tang, Yue and Li, Wei and Lan, Mengcheng and Dong, Xia and Ke, Yiping and Feng, Mengling},
        title = { { BrainPrompt: Multi-Level Brain Prompt Enhancement for Neurological Condition Identification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {173--183}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work presents BrainPrompt, an innovative framework that enhances Graph Neural Networks (GNNs) by integrating Large Language Models (LLMs) with knowledge-driven prompts, enabling more effective capture of complex, non-imaging information and external knowledge for neurological disease identification.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Sufficient experiments and interpretability visualization are presented.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The need for an LLM to project demographic information and the disease-level prompt into a high-dimensional condition is unclear. Why not use a categorical condition or an MLP-based condition?
    2. The influence of the hyperparameter λ has not been validated.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Sufficient experiments and meaningful interpretability validate the effectiveness of this paper, but the significance of the subject-level and disease-level prompts remains questionable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces BrainPrompt, a novel brain network classification framework that integrates Large Language Models (LLMs) into Graph Neural Networks (GNNs) through multi-level prompt enhancements. Specifically, it introduces three types of natural language-based prompts: (1) ROI-level prompts that describe brain region functionality, (2) Subject-level prompts that encode demographic metadata (e.g., age, sex, site), (3) Disease-level prompts representing disease stage descriptions.

    These prompts are processed using a frozen LLM, and their representations are fused into the graph learning process to improve prediction of neurological stages. The paper evaluates BrainPrompt on the ADNI (Alzheimer’s) and ABIDE (Autism) fMRI datasets, reporting strong improvements over baseline models and highlighting its interpretability via biomarker analysis.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The approach is the first to integrate natural language-based knowledge prompts at multiple semantic levels (ROI, subject, disease) into fMRI-based brain network analysis.
    2. BrainPrompt-G achieves 70.41% accuracy on ADNI and 65.82% on ABIDE, outperforming all prior GNN and CNN methods like BrainNetCNN, LG-GNN, and fTSPL.
    3. The authors provide a compelling breakdown of the contributions of each prompt. The combined use leads to up to +7.54% gain over the base GCN model.
    4. The paper uses integrated gradients to highlight important brain regions, correlating with known disease markers. This strengthens the interpretability claims and offers neuroscientific relevance.
    5. The paper emphasizes that LLMs offer semantic fusion of imaging and non-imaging modalities, which conventional models often ignore. The use of natural language for prompts is both elegant and flexible.
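    The integrated-gradients attribution mentioned in point 4 can be illustrated with a minimal one-dimensional sketch. This is not the paper's implementation (which attributes over GNN inputs); the scalar function, analytic gradient, and midpoint-rule step count below are illustrative assumptions:

    ```python
    def integrated_gradients(grad_f, x, baseline=0.0, steps=100):
        """Approximate integrated gradients for a scalar function by a
        midpoint Riemann sum along the straight path from baseline to x."""
        total = 0.0
        for k in range(1, steps + 1):
            alpha = (k - 0.5) / steps              # midpoint of the k-th segment
            total += grad_f(baseline + alpha * (x - baseline))
        avg_grad = total / steps
        return (x - baseline) * avg_grad

    # Completeness check for f(x) = x^2: the attribution should equal
    # f(x) - f(baseline), i.e. 9.0 for x = 3, baseline = 0.
    ig = integrated_gradients(lambda x: 2 * x, 3.0)
    ```

    The completeness property (attributions sum to the output difference) is what makes the highlighted ROIs interpretable as contributions to the prediction.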
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The ROI-level prompts are generated using ChatGPT, but there is no quality control or inter-rater reliability described. No error analysis is provided in case ChatGPT mislabels or ambiguously describes an ROI.
    2. The text encoder (LLM) is frozen during training, potentially restricting deeper interactions between graph and language modalities. An ablation where the encoder is fine-tuned would have added insight into this decision.
    3. In Equation 3, the thresholding for adjacency construction in the population graph (based on demographic and node similarity) is set manually, without empirical justification or sensitivity analysis.
    4. The model only uses “informative tokens” from the LLM-encoded prompts (Fig. 1), but the selection mechanism is vaguely described.
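    For context, the thresholded adjacency construction questioned in point 3 can be sketched as follows. This is an illustrative reconstruction, not the paper's actual Equation 3: the precomputed combined similarity matrix and the threshold name `th` (taken from the author feedback) are assumptions.

    ```python
    def population_adjacency(similarity, th=0.5):
        """Build a binary population-graph adjacency by thresholding pairwise
        subject similarity. `similarity` is assumed to already combine
        demographic and node-feature similarity, as described in the review."""
        n = len(similarity)
        return [[1 if i != j and similarity[i][j] >= th else 0
                 for j in range(n)]
                for i in range(n)]
    ```

    The reviewer's point is that `th` is set manually; a sensitivity sweep over candidate thresholds would show how graph density affects downstream accuracy.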
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to the strengths and weaknesses section. I think the strengths slightly outweigh the weaknesses.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors introduce BrainPrompt, a method that uses prompts on Large Language Models (LLMs) combined with Graph Neural Networks (GNNs) to predict the neurological condition of patients from their neuroimaging modalities. The embeddings resulting from the multi-level prompts act as prior knowledge that helps to identify relevant tokens for the predictive task.

    The method is evaluated on the ADNI (Alzheimer’s) and ABIDE (Autism) datasets. BrainPrompt outperformed pure GNN methods and identified biologically relevant ROIs as potential biomarkers.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The multi-level prompt is an original and effective utilization of LLMs for integrating prior knowledge. In particular, the framework is adaptable to existing methods such as GCN or BrainNetCNN.

    • The superior performance of BrainPrompt is demonstrated through a strong and comprehensive evaluation with respect to state-of-the-art models (e.g., fTSPL) and with an ablation study.

    • The enhanced interpretability offered by the biomarker study is aligned with neuroscience knowledge.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The authors did not challenge the reliability of the LLMs, the prompts, and the embeddings on the downstream task. Although the results suggest that LLMs have a positive impact on the prediction, the impact of choosing an LLM other than ChatGPT, of different prompting strategies, and of an embedding other than the Llama encoder should be studied.

    • The authors used the AAL atlas for the brain parcellation. It would have been nice to study the impact of this choice by using other atlases. Since this work handles fMRI data, a functional atlas might be relevant as well.

    • The authors used the patient diagnoses available in the ADNI dataset. Nevertheless, EMCI and LMCI are not commonly used in clinical routine. Instead, the reviewer suggests using progressive MCI and stable MCI, which are often used in the literature and have a more impactful application in practice.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a novel approach for integrating LLMs, GNNs, fMRI time series data, and clinical assessments. Their evaluation is comprehensive, including benchmarking against current methods and an ablation study performed on two separate datasets addressing distinct brain diseases. Nevertheless, some aspects deserve to be more deeply studied (see weaknesses).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank all reviewers for their constructive feedback and insightful suggestions, which have greatly helped improve the quality of our work. We appreciate the recognition of our key contributions: the innovative integration of natural language-based knowledge prompts at multiple semantic levels (R1, R2), the elegance and flexibility of our framework (R2), the comprehensive evaluation and strong empirical results (R1, R2, R3), and the enhanced interpretability aligned with neuroscience knowledge (R1, R2, R3). We respond to the comments of each reviewer as follows.

[R1W1: Why use prompts instead of categorical or MLP-based conditions?] We agree that encoding demographic and disease information using categorical or MLP-based methods has been explored in prior works such as LG-GNN. However, these methods often fail to capture the continuity and semantic richness inherent in such clinical data. In contrast, our use of natural language prompts allows for a more expressive and flexible representation that better reflects nuanced relationships and expert knowledge. This contributes to the superior performance of BrainPrompt, as demonstrated in our experiments.

[R1W2, R2W3: Influence of the hyperparameters] Due to space constraints, we omitted a detailed sensitivity analysis of the hyperparameters λ and th. We acknowledge its importance and will include a thorough analysis in the next revision to study the effect of these hyperparameters on performance.

[R2W1: Quality control of ROI-level prompts generated by ChatGPT] While the ROI-level prompts were initially generated using ChatGPT, we conducted a manual review in collaboration with a domain expert from the School of Public Health to ensure their clinical relevance and correctness.

[R2W2: Why freeze the LLM encoder?] We chose to freeze the LLM encoder to prevent overfitting, given the limited size of fMRI datasets. Prior studies suggest that fine-tuning large LLMs on small datasets often yields marginal gains at the cost of increased computational complexity and overfitting risk. Nonetheless, we agree that exploring fine-tuning strategies is an interesting direction for future work.

[R2W4: Clarification of informative tokens] Thank you for the observation. The informative tokens refer to the content-specific segments in the prompt (e.g., age, sex, site), which are filled into a predefined template. We will clarify this in the revised manuscript for better readability.
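A minimal sketch of this template-filling step, assuming a hypothetical template wording (the paper's exact template and token-selection mechanism are not given):

```python
def fill_subject_prompt(age, sex, site,
                        template=("This subject is a {age}-year-old {sex}, "
                                  "scanned at site {site}.")):
    """Fill demographic fields into a predefined prompt template and record
    the character spans of the informative (content-specific) segments.
    The template text here is a placeholder, not the paper's actual prompt."""
    values = {"age": str(age), "sex": sex, "site": site}
    prompt = template.format(**values)
    spans = {}
    for key, val in values.items():
        start = prompt.find(val)
        spans[key] = (start, start + len(val))
    return prompt, spans
```

Under this reading, only the token positions covered by `spans` (the filled-in values) would be kept from the LLM encoding, while the fixed template words are discarded.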

[R3W1, R3W2: Generalizability to other LLMs and brain atlases] Thank you for this valuable suggestion. In this work, our primary goal was to demonstrate the feasibility and effectiveness of integrating multi-level non-imaging knowledge through prompt learning. We chose the AAL atlas due to its wide use and its ability to provide both anatomical and functional context. However, BrainPrompt is designed to be modular and adaptable to various LLMs and parcellation schemes, including functionally derived atlases. We plan to investigate these alternatives in future work.

[R3W3: Labeling of MCI subtypes in ADNI] We appreciate the reviewer’s suggestion. While we agree that using progressive and stable MCI categories would be clinically impactful, the ADNI dataset used in this study only provides EMCI and LMCI labels. Future studies could incorporate conversion-based MCI definitions to enhance clinical relevance.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


