Abstract

The integration of vision-language models such as CLIP and Concept Bottleneck Models (CBMs) offers a promising approach to explaining deep neural network (DNN) decisions using concepts understandable by humans, addressing the black-box concern of DNNs. While CLIP provides both explainability and zero-shot classification capability, its pre-training on generic image and text data may limit its classification accuracy and applicability to medical image diagnostic tasks, creating a transfer learning problem. To maintain explainability while addressing transfer learning needs, CBM methods commonly design post-processing modules after the bottleneck module. However, this approach has proven ineffective. This paper takes an unconventional route by re-examining the CBM framework through the lens of its geometrical representation as a simple linear classification system. The analysis uncovers that post-CBM fine-tuning modules merely rescale and shift the classification outcome of the system, failing to fully leverage the system’s learning potential. We introduce an adaptive module strategically positioned between CLIP and CBM to bridge the gap between source and downstream domains. This simple yet effective approach enhances classification performance while preserving the explainability afforded by the framework. Our work offers a comprehensive solution that encompasses the entire process, from concept discovery to model training, providing a holistic recipe for leveraging the strengths of GPT, CLIP, and CBM.
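
To make the proposed architecture concrete, the following is a minimal PyTorch-style sketch of where the adaptive module sits relative to CLIP and the CBM. The class name, dimensions, and single-linear-layer adapter are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaCBMSketch(nn.Module):
    # Sketch only: the adapter sits BETWEEN the frozen CLIP image encoder
    # and the CBM's linear head, rather than after the bottleneck.
    def __init__(self, clip_dim, concept_embeddings, num_classes):
        super().__init__()
        # Frozen CLIP text embeddings t_1..t_K of the K discovered concepts.
        self.register_buffer("concepts", F.normalize(concept_embeddings, dim=-1))
        # Adaptive module: adapts generic CLIP image features to the
        # downstream (medical) domain before the bottleneck.
        self.adapter = nn.Linear(clip_dim, clip_dim)
        # CBM: a plain linear classifier over concept scores, so every
        # class logit is an interpretable weighted sum of concepts.
        self.head = nn.Linear(concept_embeddings.shape[0], num_classes)

    def forward(self, clip_image_features):
        x = F.normalize(self.adapter(clip_image_features), dim=-1)
        concept_scores = x @ self.concepts.t()  # cosine similarity per concept
        return self.head(concept_scores)        # class logits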

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3895_paper.pdf

SharedIt Link: https://rdcu.be/dV53I

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72117-5_4

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3895_supp.pdf

Link to the Code Repository

https://github.com/AIML-MED/AdaCBM

Link to the Dataset(s)

HAM10000; BCCD; Diabetic retinopathy

BibTex

@InProceedings{Cho_AdaCBM_MICCAI2024,
        author = { Chowdhury, Townim F. and Phan, Vu Minh Hieu and Liao, Kewen and To, Minh-Son and Xie, Yutong and van den Hengel, Anton and Verjans, Johan W. and Liao, Zhibin},
        title = { { AdaCBM: An Adaptive Concept Bottleneck Model for Explainable and Accurate Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        pages = {35 -- 45}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents an adaptive concept bottleneck model aiming to perform explainable diagnosis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper is easy to read. 2) The topic of this paper is interesting with an impact on the scientific community. 3) Authors utilize publicly available datasets for their experiments.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Lack of novelty. 2) Limited number of experiments. 3) Lack of comparisons: comparisons with an adequate number of SOTA approaches, as well as more complex architectures, have not been included in the paper. 4) The paper is proposed for explainable and accurate diagnosis. However, it is not clearly stated what the advantages/contributions of the presented method are compared to existing explainable methods. It is not clear to me how this approach is explainable, and the results from the proposed method are not easily understandable.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No code is available for the paper. The authors have not provided any pseudocode or an adequate number of equations. The paper is not reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) Please run grammar and syntax checks on the text. 2) The authors need to follow the instructions given in the conference template regarding the format of their paper. 3) There is a lack of novelty. Although the topic of this paper is interesting, it is not clear to me what the novelty of the proposed method is compared to existing ones. The authors need to clarify further. 4) Limited number of comparisons. The authors need to evaluate the proposed method against more SOTA approaches, including both interpretable and non-interpretable ones. 5) Have the authors considered other types of similarity besides cosine similarity? 6) In Figure 2, what does confidence refer to? Please specify. 7) In Fig. 3 (b), AdaCBM does not seem very stable; specifically, before epoch 300 it does not appear to have stabilized. What is this due to, and how could it be improved? It would be useful to state in detail how many iterations are needed for the model to converge and why this happens. 8) The numbers in the images are very small. It would be helpful to the reader if the lettering in the images were larger. 9) How can the reader assess the explainability of the proposed model? What advantages does the proposed model have in terms of explainability, relative to existing methodologies? The field of explainable/interpretable ML has developed considerably, and it is not clear to me what more the specific methodology offers over other methods, e.g., fuzzy logic, where results are better and more understandable.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) Lack of novelty 2) Lack of clarity-More information regarding the explainability of the proposed method is needed 3) Limited number of comparisons

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    After the authors’ feedback, the novelty and contribution of this research became clearer to me. The authors state that the research code will be published on GitHub, making the paper reproducible. Furthermore, the clarifications marked to be added, as well as the reviewers’ suggestions, should be taken into account by the authors to further improve their research.



Review #2

  • Please describe the contribution of the paper

    This work proposes an adaptive model for explainable and accurate diagnosis using i) a module between CLIP and CBM to bridge the gap between source and downstream domains, ii) a prompt-engineering-based concept generation strategy utilizing GPT-4, and iii) concept selection criteria that maximize concept utility for the downstream medical diagnostic task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Important research question to bridge the gap between image and text embeddings.
    • Examining the geometrical representation of CBM and treating it as a linear classification system.
    • Thorough experimental design and ablations using publicly available datasets with different diseases and imaging modalities.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I know that the space is limited, but I feel like some parts are very rushed while other parts are repetitive. I think another editing pass would improve the paper.
    • For me one part is not clear, and it would be good to have a clarification: x is the image representation, i.e., the embedding one gets from a pretrained network. Vector t is the text embedding, which is essentially the concept vector. The goal is to make these representations closer using the adaptive module, more specifically bringing the image representation x closer to the concept vector t. If this is correct, it would be better to state it more explicitly.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors use publicly available datasets, and it would be great for reproducibility if they could also make their code available. No comment on this from the authors so far.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • I like Fig. 1 and also the subsection where the authors explain how CBMs are trained. However, having the adaptive module in the figure while CBM itself is being explained makes it hard to follow. Is there a way to present the adaptive module and the green arrows differently to make this clearer and more digestible?
    • p.6: Can you give a summary of the doctor-labelled concepts for each dataset? Do they overlap with the GPT-generated concepts?
    • I find ablation study Table 2-2 quite interesting: it shows that changing the text embeddings is not ideal and leads to lower classification performance. Do the authors have an interpretation for this? It would be nice to add a sentence about it.
    • Since the experiments are run with 5 seeds, it would be good to add the standard deviation for each experiment to show the variability.

    
Minor:

    • Move Fig. 2 to right after it is first mentioned; otherwise it makes the text hard to follow.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Well written paper with well studied experiments. The authors tackle an important problem for CBMs and propose an adaptive idea to bring image embeddings and concept prediction closer.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents AdaCBM, an adaptive concept bottleneck model designed to enhance the interpretability and accuracy of diagnosis in medical imaging tasks. It integrates a learnable adapter module between the CLIP and CBM frameworks to address domain-specific challenges and enhance the model’s classification performance. The proposed model maintains the explainability of CBMs while improving their adaptability through geometrical representation and strategic module placement, offering a comprehensive approach from concept discovery to model training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper combines CLIP’s pre-trained capabilities with CBM’s interpretability, addressing the challenge of applying general-domain models to specific medical tasks through an innovative adapter module.
    2. AdaCBM preserves the interpretability of concept bottleneck models while improving classification accuracy, which is a notable achievement in explainable AI, where the two usually trade off against each other.
    3. The model is evaluated across multiple medical imaging datasets, demonstrating its effectiveness and robustness. The use of a geometrical representation to explain the model’s behavior adds clarity and depth to the evaluation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The geometrical representation part utilizes the concepts to form a simple linear classification system. Have you compared against a simple MLP design, since it can also represent the provided formulation if trained properly? An easier design might achieve similar results.
    2. An investigation on out-of-domain data would more strongly demonstrate the efficiency of the proposed method, since the three datasets might have been seen during GPT training. Also, would you try some domain-specific LVLMs like Med-LLaVA? These models might improve the performance further.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weakness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents an interesting application in medical imaging tasks, using an adaptive concept bottleneck model to enhance the interpretability and accuracy of diagnosis, although the rationale for the design still needs more support.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their insightful comments. We are encouraged by the overall positive remarks, e.g., an important/interesting problem for the medical AI community (R1/3/4), well written/excellent clarity (R1/4), and thorough design/experiments/ablations (R1/4). There are also some conflicting reviews from R3, on which we will focus our clarifications. Our code will be published on GitHub.

R1 [adaptive module’s function] R1 is correct. The adaptive module brings an image representation x closer to a concept vector t if the image contains the concept; otherwise x is pushed toward -t. We will highlight this.
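
As a toy illustration of this pull/push behaviour (a hedged sketch, not the paper's actual training objective; the tensor shapes and the MSE form are our assumptions), one could write:

import torch
import torch.nn.functional as F

def concept_alignment_loss(x_adapted, concepts, present):
    # x_adapted: (D,) adapted image embedding
    # concepts:  (K, D) CLIP text embeddings of the K concepts
    # present:   (K,) bool, True if the image contains concept k
    x = F.normalize(x_adapted, dim=-1)
    t = F.normalize(concepts, dim=-1)
    cos = t @ x                           # (K,) cosine similarity to each concept
    target = present.float() * 2.0 - 1.0  # +1 pulls x toward t, -1 pushes toward -t
    return F.mse_loss(cos, target)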

[doctor labelled concepts] Doctor and GPT (k=10) concepts are worded differently but have close semantic meanings; we found mean cosine similarities of 0.66 to 0.72 for the HAM, BCCD, and DR datasets.
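
One plausible way to compute such a score (a sketch assuming both concept lists are embedded with CLIP's text encoder and compared all-pairs; the exact pairing used in the rebuttal may differ):

import torch.nn.functional as F

def mean_pairwise_cosine(doctor_emb, gpt_emb):
    # doctor_emb: (Nd, D) and gpt_emb: (Ng, D) CLIP text embeddings
    a = F.normalize(doctor_emb, dim=-1)
    b = F.normalize(gpt_emb, dim=-1)
    return (a @ b.t()).mean()  # mean cosine similarity over all pairs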

[text embedding ablation insight] The datasets do not contain text with which to meaningfully train the text embedding module, so it is not sensible to train the text embeddings to adapt to the image characteristics. This is validated in Table 2-2.

R3 [novelty clarification] Our novelty lies in re-examining CBM as a linear classification system (LCS). This LCS characteristic is neglected in the CBM literature, which has instead focused on post-CBM modification using cosine similarity. Our work is therefore novel and important in demonstrating that pre-CBM adaptation is the key to improving classification performance while keeping explainability.
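
The LCS observation can be checked numerically. The sketch below (dimensions arbitrary, frozen embeddings assumed) shows that the two-step CBM collapses into a single linear map, and that a post-CBM rescale-and-shift is absorbed into that same map, adding no capacity:

import torch

D, K, C = 512, 40, 7     # feature dim, #concepts, #classes (illustrative)
T = torch.randn(K, D)    # frozen concept text embeddings
W = torch.randn(C, K)    # CBM linear head
x = torch.randn(D)       # CLIP image embedding

# The two-step CBM is one linear map: W @ (T @ x) == (W @ T) @ x.
assert torch.allclose(W @ (T @ x), (W @ T) @ x, atol=1e-3)

# A post-CBM rescale-and-shift of the concept scores is absorbed into
# the same linear system:
a, b = 2.0, torch.randn(K)
post = W @ (a * (T @ x) + b)      # post-bottleneck fine-tuning module
same = (a * (W @ T)) @ x + W @ b  # equivalent single linear map
assert torch.allclose(post, same, atol=1e-3)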

[lack of experiments/comparison, compare to fuzzy logic] We extensively compared with 6 non-interpretable/interpretable methods on 3 datasets and presented 6 different ablations. Please see the fine-grained comparison to [14, 23] (SOTA, 2023) on the number of concepts per class, concept selection, and the CBM module in Suppl. Table 1 (right). We compare to complex and SOTA Transformer (ViT, BioMedCLIP, etc.) and ResNet architectures in Suppl. Tables 2-3. Fuzzy logic systems implement rule-based logic handcrafted by experts to encode human reasoning, whereas CBMs are learning-based systems that endow neural network decision logic with semantic meaning. The different objectives make them incomparable.

[advantage to existing explainable methods clarification] AdaCBM’s advantage is its superior performance, which closes the performance gap between CBMs and non-explainable classifiers. For CBMs, lower classification performance is a significant issue, as it inevitably harms user trust even if the model is explainable.

[AdaCBM interpretation and usage clarification] AdaCBM’s explainability is achieved through the quantifiable (and hence rankable) concept contributions toward the model outcome (see Fig. 2). Users can judge and intervene in model decisions by removing inappropriate text concepts.
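
A hedged sketch of what such ranking and intervention look like in code (function and variable names are illustrative, not from the paper):

import torch

def top_concept_contributions(head_weight, concept_scores, class_idx, k=5):
    # With a linear head, the class logit decomposes exactly into additive
    # per-concept terms w[class, j] * score[j], which can be ranked.
    contrib = head_weight[class_idx] * concept_scores  # (K,) additive terms
    vals, idx = contrib.topk(k)
    return list(zip(idx.tolist(), vals.tolist()))      # top-k (concept, contribution)

def intervene(head_weight, concept_scores, bad_concepts):
    # Test-time intervention: zero out rejected concepts and re-score.
    scores = concept_scores.clone()
    scores[bad_concepts] = 0.0
    return head_weight @ scores                        # updated class logits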

[other similarities] CBM explainability relies on CLIP’s vision-and-language pretraining. Using another similarity would require retraining CLIP, but CLIP’s training datasets are not publicly available, and this is outside the scope of our study.

[Fig.3-b AdaCBM not stable] R3 might have mistaken LaBo (blue lines) for AdaCBM (orange). The apparent instability of LaBo is a validation-frequency artifact (400 probes over 10K epochs vs. AdaCBM’s 12 probes over 300 epochs).

[other issues] Paper format: we strictly used the MICCAI TeX template without any changes. Confidence = class probability.

R4 [replace linear classification system (LCS) with MLP] An MLP extending above the LCS would make the decision logic nonlinear and hence not interpretable. One may instead extend MLP layers below the LCS, which is equivalent to our number-of-layers ablation in Table 2-3.
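
To make the placement argument concrete, here is a hypothetical contrast of the two options (layer sizes are arbitrary and not from the paper):

import torch.nn as nn

num_concepts, hidden, num_classes, clip_dim = 40, 128, 7, 512  # illustrative

# (1) MLP ABOVE the LCS: class logits become a nonlinear function of the
#     concept scores and no longer decompose into per-concept contributions.
head_not_interpretable = nn.Sequential(
    nn.Linear(num_concepts, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))

# (2) MLP BELOW the LCS (extra adapter layers before the bottleneck):
#     concept scores remain cosine similarities and the head stays linear,
#     so per-concept contributions are preserved.
deeper_adapter = nn.Sequential(
    nn.Linear(clip_dim, clip_dim), nn.ReLU(), nn.Linear(clip_dim, clip_dim))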

[out-of-domain evaluation] We only fed the class labels to GPT-3/4, and CLIP’s training data do not contain the evaluated datasets; hence our results constitute an out-of-domain evaluation.

[domain specific LVLMs (Med-LLaVA)] We are unable to report on Med-LLaVA due to the rebuttal policy but will include it in future work. Please see Suppl. Tables 2-3: BioMedCLIP is Med-LLaVA’s image backbone, yet it performs worse than CLIP.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Three reviewers agree that the paper has value for MICCAI. I would also suggest the acceptance of the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


