Abstract

Multi-modal brain disease diagnosis provides a more robust and comprehensive prediction of diverse diseases by integrating medical data from different modalities. However, recent methods generally fail to account for the modality-specific discriminant regions in semantic information, which causes models to focus on non-lesion areas while neglecting the actual lesion regions. To address this issue, we propose Semantic Prompt-guided Graph Learning (SPromptGL), a novel approach for multi-modal disease prediction that captures the discriminative regions of different modalities while enhancing their interaction and fusion. First, to explore the relationships between subjects across modalities, we construct an interactive multi-relation graph for multi-modal data, which is learned dynamically through dedicated graph learning loss terms. A multi-layer graph convolutional network is then used to learn context-enriched representations for each subject. Next, to better capture the significant region representations of different modalities, we propose a semantic prompt-guided learning network to excavate the modality-specific lesion regions of related diseases. Specifically, a set of semantic prompts for the related brain diseases first guides the network to capture fine-grained local details and enhance patch representations. We then couple this with a relation-aware embedding strategy to refine discriminative features. Compared with state-of-the-art methods, our approach achieves superior performance on different benchmark datasets. Code is available at https://github.com/wanxixi11/SPromptGL.
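
As a rough illustration of the prompt-guided idea described in the abstract, the sketch below weights image patches by how well they match a set of prompt embeddings and pools them into a single summary vector. It is not the authors' implementation (the linked repository is the reference for that). The function names, the max-over-prompts relevance score, the scaled dot product, and the toy vectors are all assumptions made here for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def prompt_guided_pooling(patches, prompts):
    """Weight patches by their best match to any prompt embedding.

    patches: list of d-dimensional patch feature vectors (hypothetical).
    prompts: list of d-dimensional prompt embeddings (hypothetical).
    Returns (weights, pooled); weights sum to 1 over the patches.
    """
    dim = len(patches[0])
    scale = math.sqrt(dim)
    # Assumed relevance score: highest scaled dot product with any prompt.
    relevance = [max(dot(p, t) for t in prompts) / scale for p in patches]
    weights = softmax(relevance)
    # Weighted sum of patches, so prompt-relevant patches dominate the summary.
    pooled = [sum(w * p[k] for w, p in zip(weights, patches)) for k in range(dim)]
    return weights, pooled


# Toy data: three 2-d "patches" and one "prompt" aligned with the first axis.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
prompts = [[2.0, 0.0]]
weights, pooled = prompt_guided_pooling(patches, prompts)
```

In this toy setup the first patch, which is aligned with the prompt, receives the largest weight, and the pooled vector leans toward the first axis. A real model would learn the patch and prompt embeddings rather than fix them by hand.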

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/5129_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/wanxixi11/SPromptGL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanXix_SPromptGL_MICCAI2025,
        author = { Wan, Xixi and Jiang, Bo and Li, Shihao and Zheng, Aihua},
        title = { { SPromptGL: Semantic Prompt Guided Graph Learning for Multi-modal Brain Disease } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes SPromptGL, a framework for multi-modal brain disease prediction that integrates semantic prompts and graph learning. Main contributions include: (1) Semantic Prompt Guidance Scheme (SPGS): Uses GPT-4-generated prompts to weight lesion-relevant patches via cross-modal attention. (2) Relation-aware Embedding Strategy (RES): Propagates fine-grained local features into global contexts for robust representation. (3) Superior Performance: Achieves state-of-the-art results on TADPOLE (AD prediction) and ABIDE (ASD prediction).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Combining LLM-generated semantic prompts with graph learning for brain disease diagnosis, enhancing interpretability and lesion focus. (2) Focus on discriminative regions aligns with medical priors, improving model trustworthiness. (3) Evaluated on two large-scale datasets with multi-modal data.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) The use of GPT-4 is very superficial: it only produces category descriptions, and there is no quantitative evaluation of the output text quality or its impact. (2) The Prompt-guided Learning Network (PEN and PLN are confused in the paper) is not well designed, and its effect has not been well verified. (3) The multi-relation graph construction is not fundamentally different from existing methods such as MGDR, and the cross-modal attention is a standard cross-attention mechanism.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    SPromptGL enhances multi-modal brain disease diagnosis by integrating semantic prompts to focus on lesion-specific regions, while its graph-based fusion captures cross-modal dependencies. However, the work lacks innovation, and the use of the LLM is superficial.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes SPromptGL, a graph-based framework for multi-modal brain disease prediction. It combines a multi-relation graph structure with a semantic prompt-guided embedding module to focus on modality-specific discriminative regions. The method is evaluated on TADPOLE and ABIDE, achieving improved performance over prior methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The framework is clearly structured, with an integration of multi-relation graph construction and semantic prompt guidance. 2) Experiments on two public datasets demonstrate consistent performance improvements, and the ablation study helps clarify the contribution of each module.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) The overall novelty is moderate: most components are adapted from existing techniques with limited methodological innovation. 2) Figure 1 lacks sufficient detail, and the main blocks should be labeled and explained more clearly in the caption. 3) The PEN module is densely presented with limited intuition or visual explanation, making it difficult to understand how semantic prompts effectively guide the feature refinement process.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a clear framework and shows solid empirical results. However, the methodological novelty is limited, and the central prompt-guided module is not sufficiently explained or analyzed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel Semantic Prompt-Guided Graph Learning (SPromptGL) model for multi-modal brain disease prediction. The method employs a graph-based representation to capture relationships among different modalities and designs a semantic prompt-guided graph learning network to extract discriminative features. Experiments on multiple public datasets demonstrate the superiority of the proposed method compared to existing approaches.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method tries to integrate semantic information from the LLM, thereby improving the disease classification results. Such an idea has caught my attention.

    2. The ablation study has also shown the effectiveness of the proposed prompt-guided learning, which demonstrates the importance of semantic information.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. As one of the main contributions of this paper, the proposed GFR does not appear to be significantly different from previous methods. The authors should explain the difference between GFR and previous multi-graph based methods to highlight the contribution of this paper.

    2. Also for the GFR, the features of all modalities are concatenated as F^{(r, 0)} and then input to the DGCN with A^{(r)}. I wonder whether certain modalities introduce noise, since A^{(r)} is measured for a particular modality.

    3. In my view, SPGS and RES have similar purposes. Why don't the authors use S^{(m)} to perform the fusion of \hat{V}_p^{(m)} directly? Eq. 6 looks more like aggregating V_P, which is similar to V_G, into V_G based on the similarity between V_P and V_G. Isn’t this somewhat contrary to the original intent of V_P?

    4. The introduction of semantic prompts seems to be important to the whole framework, as shown in the experiments. However, how can the correctness of the LLM output be guaranteed? Or how can LLM hallucination be avoided?

    5. Some details are missing. How are the token embeddings obtained? What is the exact formulation of L_s(A)? There are also some typos, such as missing transpose symbols in Eq. 6 and CMA in the ablation study.

    6. The code is not open source.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the introduction of prompts for disease prediction in this paper is novel enough and the experimental results are convincing, but some issues still need to be clarified and the code is not open source.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed part of my concerns, but I still think that factual hallucinations produced by the LLM may affect the results.




Author Feedback

We thank all reviewers for their valuable comments and suggestions for improving our work.

To Meta Reviewer: Thank you for recognizing the contributions of our work. In our rebuttal, we provide clarifications on the methodological details and ablation studies to address the concerns raised.

To Reviewer #1

Q1: The use of GPT-4 is limited to category descriptions, with no quantitative evaluation of its output quality or impact.
A1: GPT-4 generates meaningful textual concepts for each category, as similarly done in previous work [23]. These concepts encode prior knowledge related to specific diseases, including associated symptoms and biomarkers, and serve as semantic guidance in place of costly expert annotations. In the experiments, we conduct an ablation study (Fig. 2(d)) to evaluate the impact, which shows consistent performance gains, demonstrating its effectiveness.

Q2: The paper confuses PEN and PLN, and does not verify the effectiveness of the proposed network well.
A2: In our work, PLN denotes the overall network, while PEN is a sub-module of PLN responsible for implementing modality-specific prompt-guided learning. The effect of PEN (composed of RES and SPGS) is validated in Fig. 2. Fig. 3 shows its impact on each modality, confirming its effectiveness for multi-modal learning.

Q3: The multi-relation graph method is similar to MGDR.
A3: Our method differs from MGDR in two respects. First, we incorporate label information to guide graph structure learning, which makes it robust to noise. Second, we design a new message passing model for multi-graph learning that enables multi-modal information propagation to enhance multi-modal fusion. As shown in Fig. 2(b), our approach outperforms MGDR on most evaluation metrics.

To Reviewer #2

Q1-2: Is GFR really different from existing multi-graph methods? Using modality-specific graphs on fused features might add noise. Is this considered?
A1-2: GFR has two main differences from existing multi-graph methods. First, we enforce label-guided structure regularization in multi-relation graphs, improving robustness. Second, our message passing supports multi-modal propagation, promoting deeper interaction. We are likewise concerned about noise interference. We therefore introduce a label-based constraint on the structured graph to enhance consistency, reducing the impact of modality noise.

Q3: Why not use S^{(m)} to perform the fusion directly?
A3: We start by using S^{(m)} to learn semantic relations within a single modality. But since samples can differ a lot, we also bring in global cues to help the model focus on the right sample-related features and improve accuracy.

Q4: How do you ensure the accuracy of the LLM output?
A4: LLMs have been used in medical imaging [2,5,30], which verifies their strong capacity. Empirically, we adapt the LLM output, which boosts feature learning and interaction, improving brain disease prediction.

Q5: Lack of some details.
A5: Thank you for your valuable comments. More details will be added in the final version.

Q6: The code is not open source.
A6: Our code will be made public upon acceptance.

To Reviewer #3

Q1: The overall novelty is moderate.
A1: The key contributions are twofold. (1) Our work is the first to introduce an LLM to the multi-modal (imaging and non-imaging modalities) brain disease prediction task. (2) We introduce label cues to constrain the multi-relation graph learning, avoiding interference from modality noise to obtain robust prediction results.

Q2-3: Fig. 1 needs clearer module labeling and captions. How do semantic prompts effectively guide the feature refinement process?
A2-3: We will update Fig. 1. A set of semantic prompts associated with brain diseases is used to guide the network to focus on discriminative lesion regions at the level of fine-grained local details. These enhanced discriminative features are then integrated into the global context to refine the semantic consistency.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The paper presents a novel and promising approach for multi-modal brain disease diagnosis by integrating semantic prompt-guided learning with graph-based modeling. The reviewers generally acknowledge the novelty and solid empirical results. However, there are valid concerns regarding clarity of methodological details and completeness of ablation studies. These are reasonable to address in a rebuttal. Given the potential contribution and the constructive nature of the critiques, I recommend Invite for Rebuttal.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although the idea of using semantic prompts from LLMs to guide multi-modal brain disease prediction is interesting, the current use of GPT-4 remains very superficial: there is no clear validation of the quality or reliability of the generated prompts, which is a key part of the proposed approach. Moreover, the main technical components (multi-relation graphs, cross-modal attention) are fairly standard and not sufficiently distinguished from prior work. Several methodological issues (such as unclear design choices in the PEN/PLN modules, limited explanation of graph construction, and risks of noise from concatenated modality features) remain unresolved. The rebuttal clarified some points but didn’t provide enough new insight to address these core concerns.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


