Abstract

Due to the high stakes in medical decision-making, there is a compelling demand for interpretable deep learning methods in medical image analysis. Concept-based interpretable models, which predict human-understandable concepts (e.g., plaque or telangiectasia in skin images) prior to making the final prediction (e.g., skin disease type), provide valuable insights into the decision-making processes of the model. However, existing concept-based models often overlook the intricate relationships between image sub-regions and treat concepts in isolation, leading to unreliable diagnostic decisions. To overcome these limitations, we propose a Concept-induced Graph Perception (CGP) Model for interpretable diagnosis. CGP probes concept-specific visual features from various image sub-regions and learns the interdependencies between these concepts through neighborhood structural learning and global contextual reasoning, ultimately generating diagnostic predictions based on the weighted importance of different concepts. Experimental results on three public medical datasets demonstrate that CGP mitigates the trade-off between task accuracy and interpretability, while maintaining robustness to real-world concept distortions.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0253_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhaLei_Conceptinduced_MICCAI2025,
        author = { Zhao, Lei and Chen, Changjian and Pu, Bin and Qi, Xiaoming and Peng, Fengfeng and Wang, Chunlian and Li, Kenli and Tan, Guanghua},
        title = { { Concept-induced Graph Perception Model for Interpretable Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {216--226}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes the CGP model, which enhances interpretability in medical image diagnosis by modeling relationships between concepts and image regions.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose to model the concept relations with a graph neural network, which is an interesting idea. The effectiveness of the method is well demonstrated by the conducted experiments.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The relationships between concepts were neither presented nor explained, making it unclear whether the method can model such relationships.
    2. The paper is not well-written. Some details are not well explained, and without code, the work is difficult to reproduce. For example, “adds self-loops” and “L_{ASL}” on page 5; “True-weight-decay” and “RandAugment” on page 6; how the uncertain concept labels are generated on page 7.
    3. The paper does not explain how the concept activation maps are generated. If they are produced using Grad-CAM, it cannot demonstrate that the model truly focuses on the image regions most relevant to each concept.
    4. How exactly is the intervention performed? The surprisingly high results after intervention are inconsistent with findings from other papers [1], especially for the Fitzpatrick17k dataset.

    [1] A Closer Look at the Intervention Procedure of Concept Bottleneck Models, ICML 2023.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is not well-written. Some of the results are not consistent with the existing papers. The key contribution is not faithfully proved in the experiments.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Regarding Q1: If the authors claim to have modeled the intrinsic relationships between concepts, they should demonstrate these relationships through visualization and examine whether they align with medical knowledge [1], rather than solely relying on improved diagnostic accuracy as evidence. An increase in diagnostic accuracy alone does not necessarily indicate that the inter-concept relationships have been effectively captured.

    Regarding Q2: The reviewer’s concerns focused on the lack of clarity and insufficient details in the method description. For example, it is unclear what types of random augmentations were used, why asymmetric loss was chosen over the commonly used cross-entropy loss, and how the uncertain concept labels were generated. The absence of these crucial details reduces the readability of the paper and diminishes its overall contribution.

    Regarding Q4: First of all, after reading the rebuttal, the reviewer is still unclear whether the inconsistency in results is due to natural variation or different intervention settings. In [2], the accuracy on the Skincon dataset decreases as the level of intervention increases. Since the trend in results is different, the reviewer finds it hard to believe that the inconsistency is simply caused by natural variation. The reviewer also reviewed the intervention setting in [3] and believes it is inappropriate. The purpose of intervention is to correct concept predictions in order to achieve better task-level predictions. However, the setting in [3] merely sets concept predictions below a certain threshold to zero, which does not align with the common practice in the CBM community [4–7]. Therefore, the reviewer does not believe the intervention results obtained under this setting are meaningful.

    The authors’ rebuttal fails to address the weaknesses of the paper and the reviewer’s concerns about it. Therefore, the reviewer recommends rejecting the paper.

    [1] Graph Concept Bottleneck Models. https://openreview.net/pdf?id=qPH7lAyQgV
    [2] A Closer Look at the Intervention Procedure of Concept Bottleneck Models. ICML 2023.
    [3] Concept Complement Bottleneck Model for Interpretable Medical Image Diagnosis.
    [4] Concept Bottleneck Models. ICML 2020.
    [5] Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models. ECCV 2024.
    [6] Learning to Receive Help: Intervention-Aware Concept Embedding Models. NeurIPS 2023.
    [7] Learning to Intervene on Concept Bottlenecks. ICML 2024.



Review #2

  • Please describe the contribution of the paper

    The authors propose Concept-induced Graph Perception (CGP), which consists of Concept-induced Adaptive Perception (CAP) and Dual-scale Concept Graph Bottleneck (DCGB). The problem addressed is essentially hierarchical classification implemented by predicting components of a class together. The approach provides predicted categories and corresponding values for sub-image regions as evidence for the final medical image classification. This evidence from sub-image regions offers an interpretable diagnosis.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper effectively leverages background information (sub-image region analyses and their relations) to improve final classification, thereby enhancing the interpretability of the model’s diagnostic decisions. The experimental section presents a thorough and faithful evaluation of the proposed method.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The enhanced interpretability of the proposed model relies more heavily on extensive labeling efforts compared to simpler models.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a straightforward approach to enhancing the explainability of AI models for medical diagnosis. Although the method relies on extensive labeling efforts and the form of explainability presented is relatively basic, it appears sufficient to add diagnostic value.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a Concept-induced Graph Perception (CGP) model for interpretable medical image diagnosis. Inspired by the expert diagnostic process, the authors combine concept-based learning with graph reasoning. The model consists of two main components: CAP (Concept-induced Adaptive Perception) and DCGB (Dual-scale Concept Graph Bottleneck). CAP guides the extraction of sub-region visual features using clinically meaningful concepts, while DCGB performs both local and global reasoning over concept nodes to generate interpretable predictions. The authors conduct experiments on three benchmark datasets Derm7pt, BrEaST, Fitzpatrick17k.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The integration of graph-based concept reasoning is a meaningful contribution that goes beyond existing concept bottleneck models. 2) The dual-scale reasoning mimics how human experts synthesize clinical information. 3) The model outputs concept activation maps and textual concept-weight summaries, providing both visual and semantic interpretability. 4) Simulates human diagnostic behavior by reasoning over concept relationships and enabling intervention-friendly explanations, which could support clinical decision-making.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The comparison could be broadened, if feasible.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The authors said that the entire code to reproduce the experiments will be available.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel and well-motivated approach to interpretable diagnosis through the Concept-induced Graph Perception (CGP) model. The proposed framework effectively combines region-level concept modeling (CAP) with dual-scale graph reasoning (DCGB), offering both high diagnostic accuracy and strong interpretability. The experimental evaluation is thorough, covering three public medical datasets, and demonstrates consistent performance gains over multiple state-of-the-art baselines. Additionally, the test-time intervention analysis strengthens the claim that the model’s decisions are faithful to the learned concepts. Overall, the paper makes a meaningful methodological contribution to interpretable medical image analysis and is well-suited for MICCAI.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have provided a thoughtful rebuttal, acknowledging the suggestion to broaden the comparison and expressing their intent to address it in future work. Given that the current evaluation already covers three diverse and representative datasets with strong performance, this limitation does not significantly affect the overall strength of the paper. I am satisfied with the response and maintain my positive assessment and recommendation for acceptance.




Author Feedback

To Area Chair and Reviewers: We thank all the reviewers for their recognition of our work and constructive suggestions. We especially appreciate the encouraging comments such as “effectively leverages sub-image region analyses and their relations to enhance interpretability”, “thorough and faithful evaluation” (R1), “meaningful integration of graph-based concept reasoning beyond existing CBMs”, “dual-scale reasoning mimics clinical thinking”, “supports clinical decision-making through intervention-friendly explanations” (R2), and “meaningful idea of modeling concept relations with GCN, with strong experimental support” (R3). All code will be available.

Response to R1: Labeling Efficiency: As noted in the last paragraph of page 7, we use a dermatosis foundation model to generate uncertain pseudo-labels for robustness verification, avoiding the need for manual annotation. We appreciate your suggestion and see it as a valuable direction for future work.

Response to R2: Broaden Evaluation: Thank you for the valuable suggestion. We evaluated our method on three diverse and representative datasets to ensure robust validation. We agree that expanding comparisons further is beneficial and plan to address this in future work.

Response to R3: Q1. Concept Relationships: We believe there may be a misunderstanding. Our method explicitly models inter-concept relationships through a concept GCN, where each concept is represented as a node and edges encode latent dependencies and interactions. This design mimics how clinicians reason about concept dependencies during diagnosis, capturing both structural and global context to enable relational inference beyond independent concept recognition (Section 2.2). The construction of the concept graph and adjacency matrix reflects this relational modeling. Experimental results (Section 3.2) demonstrate that incorporating these relationships enhances diagnostic robustness and accuracy, even when some concepts are distorted, validating the practical effectiveness of our relational learning approach.

Q2. Clarity Issues: Most of the points you raised have already been addressed in the paper. ① “Adds self-loops” is a standard operation in GCNs, intended to incorporate each concept’s own features during relational modeling. ② L_{ASL} refers to the asymmetric loss, as clearly stated in the last sentence of section 2.2 on page 5, with citation [11]. ③ “True-weight-decay” and “RandAugment” are commonly used training configurations in existing literature. We will release the code with the final version to clarify training details. ④ The uncertain concept labels are generated using the dermatosis foundation model described in the last paragraph on page 7.
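For readers unfamiliar with the two terms discussed above, a minimal numpy sketch may help. The rebuttal does not give the paper's actual implementation, so the function names, dimensions, and hyperparameters below are illustrative assumptions: `gcn_layer` shows the standard Kipf-and-Welling-style propagation where self-loops (adding the identity to the adjacency matrix) let each concept retain its own features, and `asymmetric_loss` sketches the asymmetric loss of the cited work [11], which down-weights easy negatives in multi-label concept prediction.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN propagation step with self-loops (hypothetical sketch).

    H: (n_concepts, d) node features; A: (n, n) concept adjacency;
    W: (d, d_out) learnable weights. Adding the identity to A ("adds
    self-loops") keeps each concept's own features in the aggregation.
    """
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt       # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation

def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """Asymmetric multi-label loss (sketch; hyperparameters are assumptions).

    Positives use focusing factor (1 - p)^gamma_pos; negatives use a shifted
    probability p_m = max(p - clip, 0) raised to gamma_neg, so that many
    easy negative concepts contribute little gradient.
    """
    p = 1.0 / (1.0 + np.exp(-logits))              # sigmoid probabilities
    p_neg = np.clip(p - clip, 0.0, 1.0)            # probability shifting
    loss_pos = targets * ((1 - p) ** gamma_pos) * np.log(np.clip(p, 1e-8, 1.0))
    loss_neg = (1 - targets) * (p_neg ** gamma_neg) * np.log(np.clip(1 - p_neg, 1e-8, 1.0))
    return -(loss_pos + loss_neg).mean()
```

This is only a sketch of the standard formulations the rebuttal references, not the released code.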

Q3. Activation Maps: Instead of Grad-CAM, we directly use attention weights from the Transformer’s key-query interactions between image tokens (representing sub-regions) and concept tokens (which are learned embeddings). These weights naturally reflect each image region’s contribution to a concept, enabling direct visualization of concept-specific attention maps without relying on post-hoc gradients. This provides a more faithful interpretation of the model’s focus. Thank you for the suggestion—we will clarify this in the final version.
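The query-key mechanism described above can be sketched as follows. This is a hypothetical illustration of the general idea, not the paper's code: the grid size, dimensions, and single-head form are assumptions. Each learned concept token acts as a query over the image patch tokens, and the resulting softmax row is reshaped into a spatial map, with no post-hoc gradients involved.

```python
import numpy as np

def concept_attention_maps(concept_tokens, image_tokens, grid=(14, 14)):
    """Per-concept activation maps from query-key attention (sketch).

    concept_tokens: (n_concepts, d) learned concept embeddings (queries).
    image_tokens:   (n_patches, d) patch features (keys), n_patches = h * w.
    Returns (n_concepts, h, w): each map is a softmax over image patches,
    read directly from the attention weights.
    """
    d = concept_tokens.shape[-1]
    scores = concept_tokens @ image_tokens.T / np.sqrt(d)  # scaled dot-product
    scores -= scores.max(axis=-1, keepdims=True)           # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)               # softmax over patches
    return attn.reshape(-1, *grid)
```

Each row sums to one, so the map can be overlaid on the image directly as a concept-specific attention heatmap.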

Q4. Result Consistency: We believe there may have been a misunderstanding. The observed differences are a natural result of variation in intervention strategy configurations. Rather than contradicting prior work, our results complement and extend it by demonstrating the benefits of a more precise, targeted intervention approach. While the referenced paper uses random combinations or strategies based on uncertainty or expectation, our method focuses on misprediction-driven intervention (page 7, lines 3-8), following the experimental setting of the paper titled “Concept Complement Bottleneck Model for Interpretable Medical Image Diagnosis.” This strategy is widely studied and yields results consistent with ours.
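A misprediction-driven intervention of the kind described above can be sketched as follows. This is a minimal hypothetical version under assumed details (a linear concept-to-diagnosis head, oracle concept labels, intervening on the k worst concepts); the paper's actual procedure may differ.

```python
import numpy as np

def intervene_on_mispredictions(concept_probs, concept_labels, head_w, k=3):
    """Misprediction-driven test-time intervention (hypothetical sketch).

    Replace the k concept predictions that disagree most with the
    ground-truth labels by those labels, then recompute the task logits
    from the corrected concept vector. `head_w` stands in for the
    concept-to-diagnosis layer.
    """
    corrected = concept_probs.copy()
    errors = np.abs(concept_probs - concept_labels)   # per-concept error
    worst = np.argsort(errors)[::-1][:k]              # k most mispredicted
    corrected[worst] = concept_labels[worst]          # oracle correction
    return corrected @ head_w                         # re-run the linear head
```

Because the correction targets exactly the concepts the model got wrong, accuracy after intervention can rise sharply, which differs from random or uncertainty-based selection schemes.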




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces a concept-induced graph perception model to improve interpretability in medical diagnosis. Reviewers #1 and #2 both recommended acceptance, citing the paper’s clear motivation, solid empirical performance on three diverse datasets, and the added diagnostic value of its interpretable outputs. Reviewer #2 explicitly noted that while broader comparison could be improved, the rebuttal was thoughtful and the core contributions remained strong. Reviewer #3, however, raised some concerns including lack of empirical validation for concept relationships, ambiguity in methodological details, and limitations in the intervention design. Despite these criticisms, the majority of reviewers found the work promising and sufficiently impactful, particularly in terms of interpretability and practical relevance. Given the overall positive reception and clear utility of the method, the final decision is Accept.


