Abstract

In recent years, various large foundation models have been proposed for image segmentation. These models are often trained on large amounts of data corresponding to general computer vision tasks. Hence, these models do not perform well on medical data. There have been some attempts in the literature to perform parameter-efficient finetuning of such foundation models for medical image segmentation. However, these approaches assume that all the parameters of the model are available for adaptation. But, in many cases, these models are released as APIs or Black-Boxes, with no or limited access to the model parameters and data. In addition, finetuning methods also require a significant amount of compute, which may not be available for the downstream task. At the same time, medical data can’t be shared with third-party agents for finetuning due to privacy reasons. To tackle these challenges, we pioneer a Black-Box adaptation technique for prompted medical image segmentation, called BAPS. BAPS has two components - (i) An Image-Prompt decoder (IP decoder) module that generates visual prompts given an image and a prompt, and (ii) A Zero Order Optimization (ZOO) Method, called SPSA-GC that is used to update the IP decoder without the need for backpropagating through the foundation model. Thus, our method does not require any knowledge about the foundation model’s weights or gradients. We test BAPS on four different modalities and show that our method can improve the original model’s performance by around 4%. The code is available at https://github.com/JayParanjape/Blackbox.
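
The zero-order idea in the abstract can be illustrated with a minimal sketch of plain two-sided SPSA. This is not the authors' SPSA-GC and not their code: the toy quadratic `black_box_loss`, the Rademacher perturbation, and all hyperparameters are assumptions chosen for illustration. The point it shows is that the parameters are updated using only loss evaluations of a black box, never its gradients.

```python
import numpy as np

def spsa_gradient(loss_fn, params, c, rng):
    """Two-sided SPSA estimate of the gradient of loss_fn at params.

    loss_fn is treated as a black box: it is only evaluated, never differentiated.
    """
    # Rademacher (+1/-1) perturbation: the textbook SPSA choice, assumed here.
    delta = rng.choice([-1.0, 1.0], size=params.shape)
    loss_plus = loss_fn(params + c * delta)
    loss_minus = loss_fn(params - c * delta)
    # One scalar loss difference is shared by every coordinate; divide elementwise by delta.
    return (loss_plus - loss_minus) / (2.0 * c * delta)

def black_box_loss(params):
    """Hypothetical stand-in for 'query the frozen model and score its output'."""
    target = np.array([1.0, -2.0, 0.5])
    return float(np.sum((params - target) ** 2))

def run_spsa(steps=500, lr=0.05, c=0.01, seed=0):
    """Plain gradient descent driven by SPSA estimates (two loss queries per step)."""
    rng = np.random.default_rng(seed)
    params = np.zeros(3)
    for _ in range(steps):
        params = params - lr * spsa_gradient(black_box_loss, params, c, rng)
    return params

if __name__ == "__main__":
    print(run_spsa())
```

In the setting the abstract describes, the optimized parameters would be those of the IP decoder and each loss evaluation would involve a call to the foundation model; the sketch replaces both with toy stand-ins.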

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0668_paper.pdf

SharedIt Link: https://rdcu.be/dY6gc

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_43

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0668_supp.pdf

Link to the Code Repository

https://github.com/JayParanjape/Blackbox

Link to the Dataset(s)

https://github.com/DebeshJha/Kvasir-SEG

https://www.endovis.org/

https://challenge.isic-archive.com/landing/2018/

https://paperswithcode.com/dataset/refuge-challenge

BibTex

@InProceedings{Par_BlackBox_MICCAI2024,
        author = { Paranjape, Jay N. and Sikder, Shameema and Vedula, S. Swaroop and Patel, Vishal M.},
        title = { { Black-Box Adaptation for Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {454--464}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper proposes a system to be able to improve the segmentation performance of a blackbox foundational model (either SAM or MedicalSAM) using a zero-order gradient optimization technique. The learned model leads to improvements in some cases.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The zero-order optimization technique seems valuable and can be effective to improve a blackbox model; it can be useful for other applications.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The method assumes that the foundation model's performance can be improved by adding a learned pattern. It is not clear that this is true. In addition, the method essentially assumes that a noise vector parametrized by an image encoder, a prompt encoder, and the IP decoder would lead to the needed improvement; while such components are used in the SAM or MedicalSAM model, there is no underlying reason that the added noise vector should follow the same pattern. Equation (1) can be simplified to one parameter rather than two. Since there is still a big gap between the proposed method and white-box LoRA, it would be informative to analyze the differences in predictions.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

Equation (1) can be simplified; it would be clearer to combine c and \Delta_i into one variable.
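
To illustrate this suggestion, suppose Equation (1) has the standard two-sided SPSA form. The paper's exact equation is not reproduced on this page, so the notation below (\phi for the parameters, \delta_i for the combined variable) is an assumption:

```latex
\hat{g}_i(\phi) = \frac{L(\phi + c\,\Delta_i) - L(\phi - c\,\Delta_i)}{2c}\,\Delta_i^{-1}
\quad\xrightarrow{\;\delta_i \,=\, c\,\Delta_i\;}\quad
\hat{g}_i(\phi) = \frac{L(\phi + \delta_i) - L(\phi - \delta_i)}{2}\,\delta_i^{-1}
```

The step uses the elementwise identity \Delta_i^{-1}/c = (c\,\Delta_i)^{-1}. Under this assumption the two forms are equivalent, although a separate c keeps the perturbation scale explicit if it is decayed over iterations.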

Justifications for the choice of generating the added modification image to the SAM/MedicalSAM would be helpful.

Detailed failure modes of the proposed method compared to the white box LoRA would be valuable.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The experimental results seem convincing. The proposed method seems sufficiently novel for medical image analysis and can have more broad applications.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper adapts blackbox adaptation methods to segmentation tasks of medical images, thereby enabling fine tuning of foundation models without access to the model. This improves the performance of closed source models on unseen datasets. Their method is evaluated on four different datasets, showcasing the improved performance.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is well written and easy to follow
2. I value the authors’ commitment to release the code after review
3. Very interesting approach as it enables the optimization of SAM and MedSAM without any access to the model itself.
4. Good ablations, particularly those showcasing the effect of visual prompts on the images.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Novelty is limited as it essentially adapts an already available method (such as BlackVIP [1]) to work for segmentation tasks rather than classification tasks.
[1] https://openaccess.thecvf.com/content/CVPR2023/papers/Oh_BlackVIP_Black-Box_Visual_Prompting_for_Robust_Transfer_Learning_CVPR_2023_paper.pdf
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
1. In your description of the SPSA-GC algorithm, you state that you add a mechanism to detect local minima and increase the learning rate. How much does this actually help? Do you have any ablations for this?
2. How are the p-values computed?
3. You state: “We use the MAE pre-trained ViT image encoder from Meta (ViT-MAE) [9] since it gives a stronger featurization of images over other encoders.”. Do you have a reference for this claim?
4. In Table 2 it is not clear which columns belong to which datasets. I assume it is the same as Table 1 but it would aid in understanding if you could provide this information explicitly as in Table 1.
5. Why is the DICE score for SAM (ZS) on the ISIC2018 dataset (0.66) different than that listed in Table 4 (0.67)?
6. Similarly to point 5, why is the DICE score when using the full SPSA-GC algorithm 0.74, while in Table 1 it is 0.77?
Minor points:
1. In the results section at line five, you state that you repeat the evaluation multiple times. Please be specific. How often?
2. In the line above equation (1) it says “a loss function L The estimated”. Note that the word “The” is unnecessarily capitalized.
3. In the sentence “Each element of ∆ is sampled uniformly from [−1, −0.5]U [0.5, 1].”, please use \cup if you meant to use the union symbol.
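
The sampling rule quoted in minor point 3 can be sketched as follows, reading it as a uniform draw over the union [−1, −0.5] ∪ [0.5, 1]. The function name `sample_perturbation` is hypothetical and this is not taken from the authors' repository. Because the two intervals have equal length, drawing a magnitude uniformly from [0.5, 1] and an independent random sign is equivalent to drawing uniformly from the union.

```python
import numpy as np

def sample_perturbation(shape, rng):
    """Draw each element uniformly from [-1, -0.5] U [0.5, 1].

    Magnitudes are bounded away from zero, so dividing by the perturbation
    (as SPSA-style estimators do) never blows up.
    """
    # NumPy's uniform() samples the half-open interval [0.5, 1.0).
    magnitude = rng.uniform(0.5, 1.0, size=shape)
    # Independent random sign per element.
    sign = rng.choice([-1.0, 1.0], size=shape)
    return sign * magnitude

if __name__ == "__main__":
    print(sample_perturbation((5,), np.random.default_rng(0)))
```
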
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a very interesting application of blackbox adaptation methods that could become quite important in the coming years. However, there are some inconsistencies in the results that require some explanation and the novelty of this approach is quite limited.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The paper addresses challenges in adapting large foundation models, typically trained on general computer vision tasks, to medical image segmentation. The parameters and training data of these models are often inaccessible, and finetuning methods require significant compute, which may not be available for medical applications. Additionally, privacy concerns prevent sharing medical data with third parties for finetuning. BAPS introduces two components:

Image-Prompt Decoder (IP decoder) module: This module generates visual prompts based on input images and prompts provided. It helps in guiding the segmentation process.

Zero Order Optimization (ZOO) Method - SPSA-GC: This method updates the IP decoder without needing access to the foundation model’s parameters or gradients. It eliminates the need for backpropagation through the foundation model.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors propose two components:

Image-Prompt Decoder (IP decoder) module: This module generates visual prompts based on input images and prompts provided. It helps in guiding the segmentation process.

Zero Order Optimization (ZOO) Method - SPSA-GC: This method updates the IP decoder without needing access to the foundation model’s parameters or gradients. It eliminates the need for backpropagation through the foundation model.

The authors evaluated their method on four widely used, publicly available datasets and compared it with other methods.

The authors also mention that the total number of parameters is significantly smaller than that of a fully fine-tuned model.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The authors need to explain exactly how the training was performed. Was the black-box segmentation model called through an API? If so, what is the total running time of the model?
2. The authors also need to compare against transfer learning. What happens when SAM is fine-tuned on the custom data?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

The paper is well written. The paper is also clinically important to the research community, as the method requires only a small number of parameters to be updated.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well written. The paper is also clinically important to the research community, as the method requires only a small number of parameters to be updated.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

N/A

Meta-Review

Meta-review not available, early accepted paper.

Black-Box Adaptation for Medical Image Segmentation

Author(s):