Abstract

Due to the high stakes in medical decision-making, there is a compelling demand for interpretable deep learning methods in medical image analysis. Concept Bottleneck Models (CBMs) have emerged as an active interpretable framework that incorporates human-interpretable concepts into decision-making. However, their concept predictions may lack reliability when applied to clinical diagnosis, which degrades the quality of concept explanations. To address this, we propose an evidential concept embedding model (evi-CEM), which employs evidential learning to model the concept uncertainty. Additionally, we propose to leverage the concept uncertainty to rectify concept misalignments that arise when training CBMs using vision-language models without complete concept supervision. With the proposed methods, we can enhance the reliability of concept explanations in both supervised and label-efficient settings. Furthermore, we introduce concept uncertainty for effective test-time intervention. Our evaluation demonstrates that evi-CEM achieves superior performance in terms of concept prediction, and the proposed concept rectification effectively mitigates concept misalignments for label-efficient training. Our code is available at https://github.com/obiyoag/evi-CEM.
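To make the idea of concept uncertainty concrete, here is a minimal sketch of how evidential learning could attach an uncertainty score to a single binary concept, and how those scores could order test-time interventions. It is an illustration only, not the authors' implementation: it assumes the common subjective-logic formulation of evidential deep learning, in which non-negative evidence for "present" and "absent" parameterizes a Beta distribution. The function names (`softplus`, `concept_opinion`, `rank_for_intervention`), the softplus evidence mapping, and the "most uncertain first" ordering are assumptions made for this example; the exact parameterization used by evi-CEM should be taken from the paper and the code repository.

```python
import math
from typing import List, Tuple


def softplus(x: float) -> float:
    """Numerically stable softplus; maps a raw logit to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def concept_opinion(pos_logit: float, neg_logit: float) -> Tuple[float, float]:
    """Return (probability, uncertainty) for one binary concept.

    Assumed formulation (hypothetical, for illustration):
      evidence e = softplus(logit), alpha = e_pos + 1, beta = e_neg + 1,
      expected probability = alpha / (alpha + beta),
      uncertainty = 2 / (alpha + beta), i.e. classes / total Beta strength.
    """
    alpha = softplus(pos_logit) + 1.0
    beta = softplus(neg_logit) + 1.0
    strength = alpha + beta
    return alpha / strength, 2.0 / strength


def rank_for_intervention(opinions: List[Tuple[float, float]]) -> List[int]:
    """Order concept indices so that the most uncertain concept comes first."""
    return sorted(range(len(opinions)), key=lambda i: opinions[i][1], reverse=True)


if __name__ == "__main__":
    # Made-up logits for three concepts: confident, uninformative, fairly confident.
    opinions = [concept_opinion(10.0, -10.0),
                concept_opinion(0.0, 0.0),
                concept_opinion(-3.0, 4.0)]
    for idx in rank_for_intervention(opinions):
        prob, unc = opinions[idx]
        print(f"concept {idx}: p={prob:.3f}, uncertainty={unc:.3f}")
```

Under these assumptions, a concept with little evidence on either side gets an uncertainty close to 1, while strong evidence for one side drives it toward 0, so a clinician with a limited intervention budget would be asked about the least certain concepts first.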

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2142_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2142_supp.pdf

Link to the Code Repository

https://github.com/obiyoag/evi-CEM

Link to the Dataset(s)

https://github.com/mattgroh/fitzpatrick17k

https://skincon-dataset.github.io

BibTex

@InProceedings{Gao_Evidential_MICCAI2024,
        author = { Gao, Yibo and Gao, Zheyao and Gao, Xin and Liu, Yuanye and Wang, Bomin and Zhuang, Xiahai},
        title = { { Evidential Concept Embedding Models: Towards Reliable Concept Explanations for Skin Disease Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper addresses the problem of improving concept prediction for interpretable skin disease diagnosis. Specifically, the pipeline builds upon Concept Bottleneck Models (CBMs), an interpretable framework that incorporates human-interpretable concepts into decision-making but may produce unreliable concept predictions, and proposes evidential-CEM (evi-CEM) based on evidential deep learning. Evi-CEM calibrates concept predictions, quantifies uncertainty, rectifies concept misalignments, and introduces uncertainty-aware intervention. The authors provide experimental results on a skin disease dataset to evaluate the capability of their pipeline in providing concept explanations for diagnosis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Given the evaluation results presented in the paper, the proposed pipeline does improve the concept prediction reliability over existing methods, providing a feasible solution to the proposed problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The writing of the paper can be improved. Overall, the paper is hard to read, and it is difficult to grasp the key points and claims the authors would like to convey.
    2. Technical Novelty. The proposed method seems to rely heavily on existing work (CBMs), with the concept rectification based on VLM pretraining, which has been widely studied and adopted in the computer vision and medical image analysis community. The technical contributions thus seem to be limited.
    3. Evaluations. In Table 1, the authors compare their proposed evi-CEM with other CBM variants, which are not pretrained with VLMs (I assume). However, the proposed evi-CEM adopts VLMs that have been pretrained on large-scale domain-relevant/-irrelevant data, which naturally introduces additional prior knowledge, resulting in an unfair comparison with existing methods that do not rely on VLM pretraining.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    I don’t see any concerns here.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to Section 5 and 6 for further details.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My major concern with this paper is its clarity and organization; overall, it is hard to read and follow. In addition, the technical novelty and the fairness of the experimental evaluations and comparisons to prior art are not very convincing.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes an evidential concept embedding model (evi-CEM) to enhance the reliability of concept predictions in Concept Bottleneck Models (CBMs) for medical image analysis. The model employs evidential learning to model concept uncertainty and rectifies misalignments that arise when training CBMs without complete concept supervision. The results demonstrate that evi-CEM and rectified evi-CEM achieve superior performance in concept prediction and effectively mitigate concept misalignments for label-efficient training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The paper is well-structured, with a clear articulation of its motivation. The exploration of concept bottleneck models and concept realignment within the medical domain is both timely and significant.
    • The introduction of uncertainty-aware intervention by the authors is particularly logical and the results demonstrate the effectiveness of the model in addressing concept misalignments and improving concept prediction performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the paper presents a solid foundation and promising results, it focuses on a single dataset and does not explore the scalability of the model to other medical imaging datasets. In later versions of the paper, I suggest the authors assess their model’s performance in more diverse clinical contexts to ensure its robustness and generalizability.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In later versions of the paper, I suggest the authors assess their model’s performance in more diverse clinical contexts to ensure its robustness and generalizability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper explores the use of Concept Bottleneck Models (CBMs) in the medical imaging domain, which is a significant advancement beyond their previous applications in natural images. The authors highlight the importance of enhancing the interpretability of deep learning models and the ability to effectively intervene in these networks to solve complex problems, a common scenario in the medical field. The paper delves into a comprehensive investigation of how these interventions affect the overall performance of the models.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I had no major queries. I acknowledge that the authors would like to continue working on this problem and extend it in the future, and I look forward to it. I would like to keep my rating at Accept.



Review #3

  • Please describe the contribution of the paper

    The authors propose a concept-based model, named evi-CEM, which relies on evidential deep learning theories to mitigate the overconfidence of predictions. The presented architecture develops an innovative approach to quantify uncertainty in concept embedding models for each concept independently. They evaluate their approach on a skin disease dataset and effectively demonstrate encouraging results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper is very well-written and clearly explains the ideas and concepts behind their approach. The idea seems very promising and original, addressing an important problem in concept-based models. The results are also well-presented and compared to other state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I don’t see many weaknesses.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper already looks very interesting and is developed in a very good manner.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Again, I suggest accepting the paper due to its novelty and effective presentation of their work.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    My opinion on the paper remains unchanged, and I found the authors’ responses convincing. It’s still an ‘accept’ for me.




Author Feedback

Dear Reviewers and Area Chairs, We would like to express our gratitude to the three reviewers (R1, R3, R4) for their careful reviews and constructive feedback. We are pleased that the reviewers acknowledged the paper’s novelty (R4), writing (R3, R4), and effectiveness (R1). Below, we have summarized the main concerns raised and provided our corresponding responses:

  • Regarding the novelty and contribution: Q: R1 stated that the paper heavily relies on existing works, resulting in limited contribution. R: We would like to emphasize that our paper focuses on rectifying the concept misalignments originating from VLMs, which have been ignored in previous works [1, 2, 3]. While it is true that training CBMs with VLMs has been explored, the reliability of the explanations is hindered by the concept misalignments due to the training paradigm of VLMs [4, 5]. We propose to utilize evidential learning to model concept uncertainty, which helps rectify concept misalignments for label-efficient training. Furthermore, we introduce uncertainty-aware intervention, which could enhance the diagnosis performance with fewer interventions. In summary, our method improves the reliability of concept explanations, whether trained with concept supervision or VLMs. This contribution is particularly significant in the medical domain.

  • Regarding the experiments: Q: R1 pointed out that the paper compares evi-CEM with other CBM variants that are not pretrained with VLMs, resulting in an unfair comparison. R: We clarify that in Table 1, evi-CEM does not involve VLMs and the comparison is fair, which seems to be misunderstood by R1. The results presented in Table 1 compare the models under complete concept supervision, indicating that all the models are trained solely with concept labels and do not incorporate VLMs. To prevent any further misunderstandings, we will explicitly emphasize this point in the revised version. Q: R3 suggested evaluating the model’s performance in more diverse clinical contexts. R: We appreciate the constructive suggestion provided by R3. Due to the space limitations and the rebuttal policy, we are unable to include additional results in this version of the paper. However, we acknowledge the importance of assessing our model in diverse clinical contexts, and we will consider incorporating more datasets and contexts in our future work.

  • Regarding the writing: Q: R1 mentioned that the paper is challenging to follow and that its key points are hard to grasp. R: We appreciate R1’s comment, and we will address this concern by reorganizing the introduction section of our paper. Additionally, we will enhance clarity by emphasizing the contributions with bullet points.

References:
[1] Yang et al., Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification, CVPR 2023.
[2] Oikarinen et al., Label-Free Concept Bottleneck Models, ICLR 2023.
[3] Yan et al., Learning Concise and Descriptive Attributes for Visual Recognition, ICCV 2023.
[4] Yuksekgonul et al., Post-hoc Concept Bottleneck Models, ICLR 2023.
[5] Yun et al., Do Vision-Language Pretrained Models Learn Composable Primitive Concepts, TMLR 2023.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have successfully addressed the concerns of reviewers. Overall, the paper is well-written and well-motivated. The introduction of uncertainty-aware intervention looks well-grounded. Experiments demonstrate the effectiveness of the methods by addressing concept misalignments. The paper has clear merits.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes the model evi-CEM, based on evidential learning, which prevents overconfident predictions. The paper received (reject -> no reassessment, accept -> accept, accept -> accept) scores (before -> after rebuttal). Reviews suggest that the paper is well-written and addresses a relevant problem. The main issues identified are missing experiments on other medical imaging datasets and limited technical contributions. The paper has pros and cons, but the major remaining issue regarding the limited novelty has been addressed in the rebuttal. I am leaning towards acceptance.



