Abstract

Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks, which cause them to classify clean images accurately but fail when specific triggers are introduced. However, traditional backdoor attacks necessitate a considerable amount of additional data to maliciously pre-train a model. This requirement is often impractical in medical imaging applications due to the usual scarcity of data. Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase. By incorporating learnable prompts within the text encoder and introducing an imperceptible learnable noise trigger into the input images, we exploit the full capabilities of the medical foundation models (Med-FMs). Our method requires only a minimal subset of data to adjust the text prompts for downstream tasks, enabling the creation of an effective backdoor attack. Through extensive experiments with four medical foundation models, each pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. Code is available at https://github.com/asif-hanif/baple
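
To make the setup concrete, the following is a minimal PyTorch-style sketch of a BAPLe-like objective; it is not the authors' released implementation. It assumes a frozen CLIP-like Med-FM exposing encode_image and encode_text, images scaled to [0,1], and illustrative names (apply_trigger, baple_step, ctx, noise, patch). Only the text-prompt context and the noise trigger would be optimized; the backbone stays frozen.

    import torch
    import torch.nn.functional as F

    def apply_trigger(images, noise, patch, eps=8 / 255):
        # Additive l_inf-bounded noise keeps the trigger imperceptible; a small fixed
        # patch (e.g. a logo) is pasted in the top-left corner (assumed location).
        delta = torch.clamp(noise, -eps, eps)
        poisoned = torch.clamp(images + delta, 0.0, 1.0)
        ph, pw = patch.shape[-2:]
        canvas, mask = torch.zeros_like(images), torch.zeros_like(images)
        canvas[:, :, :ph, :pw] = patch
        mask[:, :, :ph, :pw] = 1.0
        return poisoned * (1.0 - mask) + canvas * mask

    def baple_step(model, ctx, noise, patch, images, labels, target_class, lam=1.0):
        # ctx: learnable context (prompt) embeddings; noise: learnable trigger; model is frozen.
        txt = F.normalize(model.encode_text(ctx), dim=-1)                    # (num_classes, d)
        img_clean = F.normalize(model.encode_image(images), dim=-1)          # (batch, d)
        img_bd = F.normalize(model.encode_image(apply_trigger(images, noise, patch)), dim=-1)
        logits_clean = 100.0 * img_clean @ txt.t()
        logits_bd = 100.0 * img_bd @ txt.t()
        target = torch.full_like(labels, target_class)
        # Preserve clean behaviour while pushing triggered images to the attacker's target class.
        return F.cross_entropy(logits_clean, labels) + lam * F.cross_entropy(logits_bd, target)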

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3117_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3117_supp.pdf

Link to the Code Repository

https://github.com/asif-hanif/baple

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Han_BAPLe_MICCAI2024,
        author = { Hanif, Asif and Shamshad, Fahad and Awais, Muhammad and Naseer, Muzammal and Shahbaz Khan, Fahad and Nandakumar, Karthik and Khan, Salman and Anwer, Rao Muhammad},
        title = { { BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a method for backdoor attacks on medical foundation models during the prompt learning phase. The method, termed BAPLe, involves adding an imperceptible learnable noise trigger to the input images and incorporating learnable prompts within the text encoder. Extensive experiments demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written and easy to follow.
    2. The paper explores the backdoor attack for medical foundation models in the prompt learning phase, which is a practical setting since the training of the foundation model is expensive. The paper also considers the limited labeled samples setting which is common in medical imaging.
    3. The paper performs extensive experiments on various datasets and foundation models, which demonstrates promising results.
    4. The paper also conducts extensive ablation studies that confirm the design choices of the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. My main concern about this paper is what makes the attack so successful compared to the baselines. The proposed method is somewhat simple; it has two parts: a patch + noise trigger on the image input and a learnable text prompt in the text input. It seems the main difference compared to the baselines is the text prompt, but there is no experiment showing the effectiveness of this component.
    2. The paper should also compare to more recent state-of-the-art poisoning attacks, such as sleeper agent (https://arxiv.org/abs/2106.08970).
    3. What are the patch triggers used in this paper?

    Grammar error: Page 6 second to the last line “due the large network over-fitted to”

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weakness section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While I do see the merits of this paper, I have some concerns about it, listed in the weakness section.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces BAPLe, a method to embed backdoors into medical foundational models during prompt learning. This is achieved by adding learnable prompts and imperceptible noise triggers to the input, requiring minimal data. It highlights the susceptibility of medical AI to adversarial attacks, particularly backdoor attacks, which can manipulate model behavior with specific triggers while maintaining accuracy on clean data. BAPLe demonstrates high efficiency and efficacy in embedding backdoors, as shown through experiments across various medical models and datasets, achieving over 90% success rate with minimal parameter modification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Interesting topic
    • Clear description of the method
    • Well-structured and easy-to-follow
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • lack of detailed explanation
    • more experiments
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper introduces BAPLe, a method to embed backdoors into medical foundational models during prompt learning. This is achieved by adding learnable prompts and imperceptible noise triggers to the input, requiring minimal data. It highlights the susceptibility of medical AI to adversarial attacks, particularly backdoor attacks, which can manipulate model behavior with specific triggers while maintaining accuracy on clean data. BAPLe demonstrates high efficiency and efficacy in embedding backdoors, as shown through experiments across various medical models and datasets, achieving over 90% success rate with minimal parameter modification.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I appreciate that the authors provide a threat model section, which clearly presents the attack scenario and the attacker’s goal and abilities. Besides, the authors also conduct extensive experiments to study various factors.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal addresses some of my concerns. After reading the other reviews, I decided to keep my score.



Review #3

  • Please describe the contribution of the paper

    The authors have developed a backdoor attack method for medical foundation models using a prompt learning technique. The method involves corrupting a small set of data with trigger noise and wrong target labels, and then fine-tuning the prompts and trigger noise so that the resulting model correctly classifies clean data while images with the added trigger noise are misclassified into the attacker-specified label. The authors have tested their method across six datasets and four foundation models to show the effectiveness of the attack.
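
    For intuition, a small sketch of the data-poisoning step described above might look as follows; the function name poison_few_shot_set, the poison_ratio parameter, and the (index, trigger_flag, label) layout are illustrative assumptions, not the paper's actual pipeline.

        import random

        def poison_few_shot_set(num_samples, labels, target_class, poison_ratio=0.1, seed=0):
            # Flag a small fraction of the few-shot set to receive the trigger at training
            # time and relabel those samples to the attacker-specified target class.
            rng = random.Random(seed)
            poisoned = set(rng.sample(range(num_samples), int(poison_ratio * num_samples)))
            plan = []
            for i in range(num_samples):
                if i in poisoned:
                    plan.append((i, True, target_class))   # trigger added on the fly, label flipped
                else:
                    plan.append((i, False, labels[i]))     # clean sample, original label
            return plan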

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method demonstrates a novel finding: the prompt-tuning strategy in medical VLMs is vulnerable to backdoor attacks on image classification tasks.

    This approach can be trained with few-shot examples and does not necessitate updating the entire foundation model’s parameters, making it efficient in terms of both training time and computational resources.

    The approach is evaluated on six different datasets and four VLMs, showing improved performance in both clean accuracy and backdoor accuracy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One potential weakness of the paper is the challenge of integrating the trigger noise, patch, and learned prompts into downstream tasks without arousing suspicion. The integration of learnable parameters into the text needs to be handled in a way that does not appear conspicuous, ensuring that the alterations remain discreet while still being effective in demonstrating the vulnerability to backdoor attacks.

    The authors have presented a scenario of a potential backdoor attack, but the clinical scenario is not well justified.

    While the paper successfully demonstrates the vulnerability of medical foundational models to backdoor attacks, it does not address whether the attack can bypass available backdoor defense systems.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) I suggest also using the evaluation metric Attack Success Rate (ASR) [1], which measures the percentage of misclassified poisoned test samples.
    2) In the experiments, two approaches have been demonstrated for the backdoor attack: fine-tuning and prompt-tuning. Ideally, fine-tuning the whole model is not suitable for a few-shot learning approach. Why did the authors opt for this strategy? An analysis of the computational resources would bolster this paper.
    3) How many prompt tokens were used? How did you set this hyperparameter?
    4) It is not well justified why both a patch logo and noise are used for the backdoor attack.
    5) The number of training epochs used for few-shot learning is not mentioned.
    6) I suggest implementing the backdoor attacks not only for image classification downstream tasks but also for tasks with image-text pairs as input.

    References:

    [1] Wang et al., 2019

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper includes a novel application of learnable prompts to implement backdoor attacks in medical foundational models. This approach exposes vulnerabilities in foundational models in the medical domain. Such insights are crucial for the medical community as they encourage proactive measures to address and mitigate these vulnerabilities.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thank you for justifying some concerns. I am keeping my score the same.




Author Feedback

We thank all reviewers (R1, R3, R4) for their positive feedback: well-structured paper and extensive experiments (R1, R3), practical setting (R3), and novel and efficient approach (R4). Our code will be publicly released.

R1 Detailed Explanation (Q1): We thank R1 for the suggestion. We will further improve the clarity of our BAPLe algorithm in the Suppl. material.
More Experiments (Q2): We thank the reviewer for acknowledging the extensive experiments in the recommendation section. There are a few additional experimental visualizations in the Suppl. material that the reviewer may find insightful.

R3 On Effectiveness of BAPLe over Baselines (Q1): Our threat model, as discussed in the paper, assumes that the attacker has access to only a few-shot dataset. Baselines that fine-tune the entire model lead to overfitting and suboptimal performance with limited downstream training data. In contrast, BAPLe precisely tailors both the vision and text input spaces, enabling the frozen medical VLM to leverage its rich knowledge and achieve high attack success rates without compromising clean performance. While the text prompt is the main difference between BAPLe and the fine-tuning-based baselines, our primary objective is to demonstrate that the widely adopted prompt-tuning stage, where only the text prompt is updated, is vulnerable to backdoor attacks even with low data requirements (3rd para introduction). We believe that BAPLe’s simplicity in achieving high attack success rates emphasizes the vulnerability of the prompt-tuning stage. Sleeper Agent Comparison (Q2): Sleeper Agent focuses on backdooring networks trained from scratch, which may not be feasible for large Med-VLMs. On the other hand, our goal is to backdoor the prompt-learning stage of pretrained Med-VLMs in a few-shot setting. Patch Trigger (Q3): BAPLe uses natural-looking medical-related patch triggers (logo/text) commonly found in medical images (Fig. 1) instead of random patches.

R4 On Learnable Text (Q1): The learnable text prompts are optimized in the embedding space during prompt tuning and are not provided by the end-user during inference after model deployment. The user interacts with the infected model in the same way as with the original clean model. Clinical Scenario (Q2): In the case of a real-world deployment of an open-sourced vision-language medical model, malicious actors can release an infected model version that operates seamlessly on clean images but generates predetermined responses when presented with poisoned images. Our threat model, discussed in Sec. 3.1, highlights such attacks and strives to promote the safe adoption of such Med-VLMs before their deployment. Defense (Q3): To our knowledge, we are the first to explore the vulnerability of medical vision-language foundation models to backdoor attacks during the prompt-learning stage. We agree with R4 that defense against such attacks is an interesting future direction. ASR (Q4): We apologize for the confusion. We have used the term “Backdoor Accuracy (BA)”, which is an alternative term for ASR in the context of backdoor attacks. Finetuning (Q5): Recent backdoor attacks generally fine-tune the full model weights. We show that this setup is not effective in our few-shot scenario (see Tab. 1), acknowledging the reviewer’s point. Importantly, we also compare BAPLe with a prompt-tuning-based baseline by applying baseline methods to the Med-VLM’s image space. BAPLe outperforms this baseline (Tab. 1), highlighting its effectiveness in few-shot scenarios. The computational resource analysis is in the introduction’s last line. Prompt Tokens (Q6): We used 16 prompt tokens, which we empirically found to balance performance and computational load. Patch and Noise (Q7): The combination of patch and noise synergistically enhances the attack’s performance, as demonstrated in Tab. 3(c). Epochs (Q8): We set the number of epochs to 50. Image-Text Pairs (Q9): Thank you for the suggestion. We will explore this in future work.
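
For clarity on the reported metrics, here is a small evaluation sketch under the same assumptions as the earlier snippet (a frozen CLIP-like Med-FM and the hypothetical apply_trigger helper): Clean Accuracy (CA) is measured on unmodified test images, while Backdoor Accuracy (BA), i.e. ASR, is the fraction of triggered images classified as the attacker's target class.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def evaluate(model, loader, ctx, noise, patch, target_class):
        # CA: accuracy on clean images; BA (ASR): fraction of triggered images predicted
        # as the target class. A stricter ASR variant would exclude samples whose true
        # label already equals the target class.
        txt = F.normalize(model.encode_text(ctx), dim=-1)
        clean_correct, backdoor_hits, total = 0, 0, 0
        for images, labels in loader:
            img = F.normalize(model.encode_image(images), dim=-1)
            clean_correct += (img @ txt.t()).argmax(-1).eq(labels).sum().item()
            bd = F.normalize(model.encode_image(apply_trigger(images, noise, patch)), dim=-1)
            backdoor_hits += (bd @ txt.t()).argmax(-1).eq(target_class).sum().item()
            total += labels.numel()
        return clean_correct / total, backdoor_hits / total   # (CA, BA)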




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I recommend rejection of this paper on the basis of its low potential impact. Adversarial attacks are no doubt of critical importance in certain fields, but they are not a common concern in medical imaging. This work does not propose a method for defending against attacks but a new way of devising such attacks. The examples that the authors provide (Fig. 1) are, I think, good examples to convince one that this is not a pressing need for our field. I do not imagine a common situation where devising such attacks is needed or practically useful.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I recommend rejection of this paper on the basis of its low potential impact. Adversarial attacks are no doubt of critical importance in certain fields, but they are not a common concern in medical imaging. This work does not propose a method for defending against attacks but a new way of devising such attacks. The examples that the authors provide (Fig. 1) are, I think, good examples to convince one that this is not a pressing need for our field. I do not imagine a common situation where devising such attacks is needed or practically useful.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors present a new backdoor attack technique for Medical foundation models, leveraging prompt learning. This technique involves selectively corrupting a subset of data with trigger noise and incorrect target labels. Subsequent fine-tuning of prompts and trigger noise ensures that while the model accurately classifies unaltered data, it misclassifies images containing the trigger noise into categories specified by the attacker. The effectiveness of this method has been demonstrated across six datasets and four foundation models, illustrating its robustness in inducing targeted misclassifications. The paper has received marginal scores but the reviewer acknowledges that the authors addressed their concerns in the rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors present a new backdoor attack technique for Medical foundation models, leveraging prompt learning. This technique involves selectively corrupting a subset of data with trigger noise and incorrect target labels. Subsequent fine-tuning of prompts and trigger noise ensures that while the model accurately classifies unaltered data, it misclassifies images containing the trigger noise into categories specified by the attacker. The effectiveness of this method has been demonstrated across six datasets and four foundation models, illustrating its robustness in inducing targeted misclassifications. The paper has received marginal scores but the reviewer acknowledges that the authors addressed their concerns in the rebuttal.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


