Abstract

Medical image segmentation has traditionally been approached by training or fine-tuning an entire model for each new modality or dataset. However, this approach often requires tuning a large number of parameters during training. With the introduction of the Segment Anything Model (SAM) for prompted segmentation of natural images, many efforts have been made towards adapting it efficiently for medical imaging, thus reducing the training time and resources. However, these methods still require expert annotations for every image in the form of point prompts or bounding box prompts during training and inference, making it tedious to employ them in practice. In this paper, we propose an adaptation technique, called S-SAM, that trains only a number of parameters equal to 0.4% of SAM’s parameters and at the same time uses simply the label names as prompts for producing precise masks. This not only makes tuning SAM more efficient than the existing adaptation methods but also removes the burden of providing expert prompts. We evaluate S-SAM on five different modalities: endoscopic images, X-ray, ultrasound, CT, and histology images. Our experiments show that S-SAM outperforms state-of-the-art methods as well as existing SAM adaptation methods while tuning significantly fewer parameters. We release the code for S-SAM at https://github.com/JayParanjape/SVDSAM.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0669_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0669_supp.pdf

Link to the Code Repository

https://github.com/JayParanjape/SVDSAM

Link to the Dataset(s)

ignaciorlando.github.io
https://paperswithcode.com/paper/cholecseg8k-a-semantic-segmentation-dataset
https://ieeexplore.ieee.org/document/9395510
https://www.kaggle.com/datasets/andrewmvd/lits-png
https://www.sciencedirect.com/science/article/abs/pii/S1361841516301542

BibTex

@InProceedings{Par_SSAM_MICCAI2024,
        author = { Paranjape, Jay N. and Sikder, Shameema and Vedula, S. Swaroop and Patel, Vishal M.},
        title = { { S-SAM: SVD-based Fine-Tuning of Segment Anything Model for Medical Image Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an SVD-based fine-tuning approach that tunes the singular values of the weight matrices and shows promising results when tuning SAM. The approach additionally doesn’t need any prompts other than the label, making it easier to use.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a simple idea of tuning the weight matrices by transforming the singular values. The approach adds minimal parameters compared to other baselines and shows good performance across different segmentation benchmarks. The ablations around tuning of different components are useful and highlight their relative importance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The SVD-based approach for tuning singular values was originally discussed in the paper https://arxiv.org/pdf/2206.06122. The only difference is that the current approach proposes an affine transform instead of directly tuning the singular values.
    • The text-transform layer and learning using label prompts are similar to AdaptiveSAM.
    • The comparison with LoRA is useful and appreciated; however, it’s unclear what rank r was chosen for the experiments, which impacts the results.
    • There could be more discussion around the results; it’s unclear why other methods with more flexible tunable parameters, like LoRA and AdaptiveSAM, perform poorly on datasets like GlaS and on some classes in CholecSeg8k.
    • More discussion around why the method works well without decoder tuning, and when decoder tuning may be needed, could be useful.
    • Any discussion around when the approach might be limiting would be useful.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is well written with a simple formulation which is well evaluated against existing baselines. The formulation however is already explored and the paper simply applies it to the SAM backbone. In such a case more discussion around when the method may or may not be effective is needed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach presented for fine-tuning SAM is useful, and the paper provides good comparisons with baselines regarding its efficiency. However, the approach is not novel; it is primarily an application of singular value fine-tuning (SVF) to the SAM backbone. In that regard, the paper needs more discussion around the method’s effectiveness and limitations.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a modification method named S-SAM. This technique utilizes the names of the labels as prompts to generate accurate masks and only requires training 0.4% of SAM’s parameters.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The authors’ work has significant practical value in clinical settings.
    2. The authors validated the effectiveness of the model using multiple datasets.
    3. Compared to other SAM-based models, the authors’ model is more efficient.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some details in the paper are not clearly explained, which could potentially confuse the readers.
    2. The writing of the paper needs improvement. For instance, the authors need to provide a more detailed introduction to the motivation behind the proposed method in the methodology section.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors have stated that they will release the code in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. In the experimental section, the authors did not provide a detailed description of the SAM-based methods. For example, what kind of prompts were used in “SAM with full fine-tuning”? If point prompts were used, how many points were there? Why were box prompts not used? From Figure 3, it appears that “SAM w/ point prompt” refers to the automatic segmentation mode applied by SAM, rather than manually providing point prompts. Could the authors please clarify this?

    2. Based on the previous comment, if the authors did not provide a comparison with SAM-based methods that manually provide point prompts in the experimental section, then the experiment is not comprehensive enough. The authors introduced the advantage of the proposed method at the beginning of the article, which is that it can complete segmentation without expert-level annotation. Therefore, the authors need to compare the proposed method with SAM models that have expert-level annotation.

    3. Currently, many tasks are employing the structure of an adapter to allow large models to quickly adapt to downstream tasks. In this paper, the author compares their method with the Medical SAM adapter. However, the adapter structure is not utilized in their network. Instead, the parameters of normalization are updated during the training phase to achieve model adaptation. What considerations led to this approach?

    If the authors can address my concerns satisfactorily, I would be willing to give a higher score.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the method, the writing of the paper, and the experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The novelty, writing, experiments and the feedback from authors.



Review #3

  • Please describe the contribution of the paper

    The authors propose a method S-SAM to adapt the SAM model to medical images. S-SAM expects an image and a class name prompt as inputs, and the output is a mask. The singular values of the weight matrices in the image encoder are tuned with a nonlinear operation (scale, shift & Relu). The SAM prompt encoder is adapted by using a trainable MLP (TAL). The SAM decoder is used without any modifications and tuning. SAM’s positional embeddings are replaced with learnable embeddings for training with smaller image sizes. Loss function used is sum of dice and focal losses. The number of trainable parameters in S-SAM is 2min(D, K), whereas in LoRA it is higher at r(D+K). The method is evaluated on 5 datasets of different modalities, and qualitative and quantitative (Avg dice score) analysis are provided. The authors show that their method outperforms or performs on par (with fewer parameters) compared to the other methods.
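The parameter counts quoted in this summary can be compared with a short calculation. The sketch below is only illustrative: the function names and the layer size (D = K = 768) are assumptions chosen for the example, not values taken from the paper.

```python
def ssam_trainable_params(d: int, k: int) -> int:
    """One scale and one shift per singular value: 2 * min(D, K)."""
    return 2 * min(d, k)


def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Two low-rank factors of shapes (D, r) and (r, K): r * (D + K)."""
    return r * (d + k)


if __name__ == "__main__":
    d, k = 768, 768  # hypothetical weight-matrix shape, for illustration only
    print("S-SAM:", ssam_trainable_params(d, k))
    for r in (2, 4, 8, 16):
        print(f"LoRA r={r}:", lora_trainable_params(d, k, r))
```

For a square layer, the LoRA count under these formulas matches the S-SAM count only at r = 1 and grows linearly with r.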

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is straightforward, well written and easy to understand. It evaluates the proposed method on datasets of various modalities and shows improved quantitative results while reducing the number of parameters that are trained. Qualitative analysis is also done. Ablation is done in Table 3 for the various components of S-SAM.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - The authors use just a single metric (DSC) for quantitative analysis.
    - The discussion on performance is done by looking at just the average of the DSC for individual labels.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The initialization of the different parts/parameters of the model are provided. More details of the experiments and the training procedure are provided in the supplementary. These should help reproduce the numbers achieved in the tables.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors make a set of simple modifications to SAM and show improvements over LoRA-based SAM fine-tuning. Quantitative results are provided on 5 datasets of different modalities. The biggest weakness is the use of only DSC for quantitative analysis and comparison of the methods. It would be good to show strong performance on other metrics (e.g., blood volumes) to show the usefulness of the method in a clinical setting. Another weakness is the use of an averaged DSC for the final comparison. There are columns in the tables where the proposed method performs poorly compared to other methods. A discussion on this would be helpful to get more insights. The min/max for each of the methods and the columns in the tables would also be revealing. A stronger validation would be needed for deployment of such a method in clinical settings.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper adapts existing segmentation models and builds on them for a clinically important application of image segmentation. This is a well-written paper where the validation of the method could be improved and made stronger.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their valuable comments. R1 Q1: We calculated the clinically relevant HD95 metric. S-SAM shows superior performance with lower HD95 values, e.g., 44.5 for Cholec compared to AdaptiveSAM’s 77.2, and 49.4 for GLAS compared to AdaptiveSAM’s 106.2.
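For readers unfamiliar with HD95, the following is a minimal sketch of one common way to compute the 95th-percentile Hausdorff distance between two sets of boundary points (lower is better). It is not the authors' evaluation code; the function name is hypothetical, and conventions vary between toolkits (for instance, some take the percentile over the pooled distances rather than the maximum of the two directed percentiles).

```python
import numpy as np


def hd95(a_pts: np.ndarray, b_pts: np.ndarray) -> float:
    """95th-percentile Hausdorff distance between point sets of shape (N, 2) and (M, 2)."""
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)
    a_to_b = dists.min(axis=1)  # nearest point in B for each point in A
    b_to_a = dists.min(axis=0)  # nearest point in A for each point in B
    # Assumed convention: maximum of the two directed 95th percentiles.
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


if __name__ == "__main__":
    pred = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    target = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 2.0]])
    print(hd95(pred, target))
```

Using a percentile instead of the maximum makes the metric less sensitive to a few outlier boundary points than the plain Hausdorff distance.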

R1 Q2, R3 Q6: While S-SAM performs better on average, some methods outperform it for specific classes, suggesting it might be affected by label distribution disparities. It may benefit from methods like re-weighting the loss across classes. To motivate this, we presented classwise scores in the tables. We will discuss this potential future improvement in the revised version.

R3 Q1: Directly tuning SVs is akin to changing the original weights. In contrast, we freeze the original SVs and learn the affine params that can learn different transforms for different tasks without harming original model performance. Unlike SVF, this independence of the affine params from the original params opens the possibility of learning them for simpler tasks and then combining them to build a model that can work well across all the tasks, like [1] does with LoRA. Also, unlike SVF, S-SAM doesn’t require a support image and mask to guide the segmentation process. [1] Mixture of LoRA Experts
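The idea of keeping the original singular values frozen and learning only an affine transform on top of them can be sketched as follows. This is an illustration based on the reviewers' description (scale, shift, and ReLU applied to the singular values), not the authors' implementation: the function name, the use of NumPy instead of a deep learning framework, and the identity initialization (scale = 1, shift = 0) are assumptions.

```python
import numpy as np


def svd_adapted_weight(w: np.ndarray, scale: np.ndarray, shift: np.ndarray) -> np.ndarray:
    """Rebuild a weight matrix as U diag(ReLU(scale * s + shift)) V^T.

    U, s and V^T come from the frozen original weight; only `scale` and
    `shift` (each of length min(D, K)) would be trained.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    s_new = np.maximum(scale * s + shift, 0.0)  # ReLU keeps singular values non-negative
    return (u * s_new) @ vt  # scales column i of U by s_new[i]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((6, 8))  # toy frozen weight
    m = min(w.shape)

    # Assumed initialization: identity transform reproduces the original weight.
    w_init = svd_adapted_weight(w, np.ones(m), np.zeros(m))
    print(np.allclose(w_init, w))

    # A different (scale, shift) pair gives a task-specific weight
    # while the stored original W is left untouched.
    w_task = svd_adapted_weight(w, np.full(m, 1.1), np.full(m, -0.05))
    print(w_task.shape)
```

Because the learned pair lives apart from the frozen decomposition, separate pairs could in principle be stored per task, which is the independence the rebuttal refers to.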

R3 Q2 Q3 Q4: S-SAM performs full-rank fine-tuning unlike LoRA (low-rank) and AdaptiveSAM (single-rank). At the same time, there are fewer tunable parameters in S-SAM. Hence, it is more robust to overfitting, leading to better results in datasets like GLAS, where there are fewer images and a single label of interest. We show that using TAL with S-SAM can improve performance over AdaptiveSAM. We ran LoRA with r = {2,4,8,16} and compared the performance on the validation set. r = 4 gave the best DSC with fewer tunable parameters. r = 8 gave minimal benefit over r = 4, but had many more tunable parameters. Hence, we chose r = 4 for our experiments.
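The full-rank versus low-rank distinction can be checked numerically on a toy matrix. The sketch below is an assumption-laden illustration, not the authors' experiment: the sizes, the random seed, and the generic non-zero change to every singular value are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 8, 2  # hypothetical toy sizes

w = rng.standard_normal((d, k))
u, s, vt = np.linalg.svd(w, full_matrices=False)

# Singular-value update: every singular value changes by a non-zero amount,
# so the weight update U diag(delta_s) V^T has rank min(d, k).
delta_s = rng.uniform(0.1, 1.0, size=s.shape)
delta_w_sv = (u * delta_s) @ vt

# LoRA-style update: product of a (d, r) and an (r, k) factor, rank at most r.
b = rng.standard_normal((d, r))
a = rng.standard_normal((r, k))
delta_w_lora = b @ a

rank_sv = int(np.linalg.matrix_rank(delta_w_sv))
rank_lora = int(np.linalg.matrix_rank(delta_w_lora))
print(rank_sv, rank_lora)
```

The singular-value update reaches full rank with only `min(d, k)` changed numbers, whereas the LoRA-style update is capped at rank `r` however its `r * (d + k)` entries are set.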

R3 Q5: The SAM decoder identifies “objects” using features from the image and prompt encoders. Keeping the decoder frozen assumes that if the encoders can be transformed to get the expected embeddings for the decoder, for a given task, then the decoder need not be trained, saving memory and time. Comparisons with AdaptiveSAM, which trains the decoder, support this. However, for large datasets with many labels of interest, decoder-training might be required.

R4 Q1 Q2: Full finetuning involves 1 fg point if the ground truth isn’t empty and 1 bg point otherwise. These points are sampled from the ground truth masks and hence can be considered as manual point prompts. This is similar to existing SAM-adaptation papers. Using 1 point prompt also ensures a somewhat fair comparison with traditional methods and S-SAM which do not provide such explicit location information about the mask. We also compared our method with MedSAM, which uses box prompts as input. However, using box prompts might not lead to a fair comparison. In Fig 3, the green dot represents the used point prompt. We will update the caption to reflect that the last column shows SAM with a manual point prompt, not in auto-mode.
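The point-prompt rule described in this response (one foreground point when the ground truth is non-empty, one background point otherwise) could be sketched as below. The function name, the uniform random choice among candidate pixels, and the (row, col) coordinate order are assumptions for illustration; the rebuttal does not specify how the point is drawn.

```python
import numpy as np


def sample_point_prompt(mask: np.ndarray, rng: np.random.Generator):
    """Return ((row, col), label) with label 1 for foreground and 0 for background."""
    fg = np.argwhere(mask > 0)
    if len(fg) > 0:
        # Ground truth is non-empty: pick one foreground pixel.
        row, col = fg[rng.integers(len(fg))]
        return (int(row), int(col)), 1
    # Ground truth is empty: every pixel is background, pick one of them.
    bg = np.argwhere(mask == 0)
    row, col = bg[rng.integers(len(bg))]
    return (int(row), int(col)), 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = np.zeros((4, 4), dtype=np.uint8)
    mask[1, 2] = 1
    print(sample_point_prompt(mask, rng))
    print(sample_point_prompt(np.zeros((4, 4), dtype=np.uint8), rng))
```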

R4 Q3: Existing adapters require significant compute for training, leading to low-rank methods like LoRA. Our goal was to achieve full-rank adaptation, but with fewer parameters. Hence, we decided to explore tuning singular values (SVs). However, that would have removed the independence between the original weights and the adapted modifications. Hence, we developed tunable affine transforms for changing the SVs. In addition, existing literature [2] has shown the significance of norm layers in learning domain information. Hence, to further aid the adaptation process, we tuned the norm layers that can help learn the domain shift. With these two added changes, we show S-SAM’s effectiveness without an adapter-like structure. [2] On-the-Fly Test-time Adaptation for Medical Image Segmentation




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors propose a method S-SAM to adapt the SAM model to medical images. The reviewers are generally in favor of the paper. The authors shall carefully polish the paper to address the concerns raised by the reviewers in their final version.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors propose a method S-SAM to adapt the SAM model to medical images. The reviewers are generally in favor of the paper. The authors shall carefully polish the paper to address the concerns raised by the reviewers in their final version.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


