Abstract

Deep learning-based methods often suffer from performance degradation caused by domain shift. In recent years, many sophisticated network structures have been designed to tackle this problem. However, the advent of large models trained on massive data, with their exceptional segmentation capabilities, introduces a new perspective for solving medical segmentation problems. In this paper, we propose a novel Domain-Adaptive Prompt framework for fine-tuning the Segment Anything Model (termed DAPSAM) to address single-source domain generalization (SDG) in medical image segmentation. DAPSAM not only utilizes a more generalization-friendly adapter to fine-tune the large model, but also introduces a self-learning, prototype-based prompt generator to enhance the model’s generalization ability. Specifically, we first merge the important low-level features into the intermediate features before feeding them to each adapter, followed by an attention filter that removes redundant information. This yields more robust image embeddings. Then, we propose using a learnable memory bank to construct domain-adaptive prototypes for prompt generation, helping to achieve generalizable medical image segmentation. Extensive experimental results demonstrate that DAPSAM achieves state-of-the-art performance on two SDG medical image segmentation tasks with different modalities. The code is available at https://github.com/wkklavis/DAPSAM.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0929_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0929_supp.pdf

Link to the Code Repository

https://github.com/wkklavis/DAPSAM

Link to the Dataset(s)

https://liuquande.github.io/SAML/
https://zenodo.org/records/6325549

BibTex

@InProceedings{Wei_Prompting_MICCAI2024,
        author = { Wei, Zhikai and Dong, Wenhui and Zhou, Peilin and Gu, Yuliang and Zhao, Zhou and Xu, Yongchao},
        title = { { Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose domain-adaptive prompt SAM (DAPSAM) to address single-source domain generalization in medical image segmentation. DAPSAM 1) uses an MLP-based adapter in the transformer encoder that fuses low-level features with intermediate features for improved segmentation, and 2) uses a learnable memory bank to construct domain-adaptive prototypes for automatic prompt generation. DAPSAM outperforms several U-Net-based and SAM-based models on two single-domain generalization segmentation tasks: a prostate segmentation task and an optic cup and optic disc segmentation task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have performed extensive experiments to compare with state-of-the-art domain adaptation techniques and ablation studies.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors claimed that low-level features are essential for generalized segmentation and cited the paper “Explicit Visual Prompting for Low-Level Structure Segmentations”. However, that paper does not discuss the importance of low-level features. The authors should justify why they fuse low-level features.
    2. Using a memory bank can help with automatic prompt generation, but how it helps domain adaptation is questionable.
    3. It is not clear how the memory bank is updated. The authors only describe the prompt-generation process given an input image in Section 2.2.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    1. There are no details about the MLP layers.
    2. The experimental setting is not clear. 1) How are the prompts generated for training? 2) What are the training, validation, and testing data used for each column in Table 1? 3) What type of GPU and batch size are used?
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please provide the detailed function used to expand the feature p_i to the spatial size h×w as e_i.
    2. In Table 2, the authors claim that they run the DAPSAM three times. Are the results in Table 1 and from the state-of-the-art methods obtained the same way?
    3. What is meant by “The rank of the adapter is set to 4 for both efficiency and performance optimization.”?
    4. Please show examples of automatically generated prompts.
    5. The experimental setting is not clear. 1) How are the prompts generated for training? 2) What are the training, validation, and testing data used for each column in Table 1? 3) What type of GPU and batch size are used?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The authors should justify why they fuse low-level features.
    2. Using a memory bank can help with automatic prompt generation, but how it helps domain adaptation is questionable.
    3. Lack of details about memory bank update during training.
  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have justified their memory bank for domain adaptation purposes. However, the low-level feature fusion uses a channel-wise attention map, which cannot provide selection along the spatial (x and y) dimensions.



Review #2

  • Please describe the contribution of the paper

    This paper addresses the single-domain generalization (SDG) problem, employing a prototype-based prompt generation module. This paper claims it has the capability to automatically generate prompts tailored to the current image segmentation task, albeit with weak domain specificity.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Alterations based on the excellent pre-trained model SAM to reduce training costs and increase generalizability. (2) The paper devises a domain-adaptive prompt generator using a prototype-based memory bank learned from source-domain images. (3) The paper uses adapters in each transformer block that integrate low-level features into intermediate features, followed by a channel-attention filter to improve the robustness of image embeddings.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The prostate screenshots in the supplemental material appear noisy. It could be beneficial to investigate whether there was an issue with converting them from the nii format correctly, as this may be a contributing factor to the noise. (2) The introduction of the memory bank seems somewhat unclear. It serves as the core of domain migration, but it would be beneficial to explain why it enables domain migration, how the memory bank is initialized, and whether it is updated throughout the training process. Clarifying these aspects would enhance understanding. (3) There seems to be a lack of benchmarks and clear references to the baseline. Including fewer benchmarks could improve clarity. (4) The specific parameters of the model are not indicated. It would be helpful to provide the number of parameters for both the current method and previous methods to prevent attributing better results solely to parameter expansion.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Additional benchmarks could be included. Additionally, the core aspect of the paper should be explored in greater depth, clarifying the module’s design rationale, its functionality, and its efficacy in facilitating domain migration.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary reasons for the rating are twofold: the significance of the problem addressed in this paper and the lack of clarity regarding the core functionality of the model, particularly concerning its capability for domain migration. Additionally, concerns are raised regarding the accuracy of the prostate screenshots in the supplemental material.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel framework called Domain-Adaptive Prompting Segment Anything Model (DAPSAM) for generalizable medical image segmentation. The key contributions include a domain-adaptive prompt generator using prototype-based memory banks, a generalized adapter structure for improved feature robustness, and extensive experimental validation demonstrating state-of-the-art performance on single-source domain generalization tasks in medical image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The introduction of DAPSAM, which integrates a domain-adaptive prompt generator and a generalized adapter, represents a novel approach to address single-source domain generalization in medical image segmentation. 2) The paper provides thorough experimental validation on two widely used benchmarks, demonstrating significant performance improvements over existing methods. 3) The paper is well-structured and clearly explains the proposed methodology, making it accessible to readers.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) There are instances of repetition and lack of clarity, particularly in the abstract and conclusion sections. Highlighting the main outcomes and results in the abstract and discussion sections and streamlining the presentation could improve readability and comprehension. 2) The paper does not provide a related work section. The introduction could give more specific background on the problem, and a dedicated related work section could provide a more comprehensive overview of existing methods and highlight the novelty of the proposed approach in comparison. 3) The paper lacks details about the dataset preprocessing and examples. 4) Limited Discussion of Limitations: The paper focuses primarily on single-source domain generalization in medical image segmentation. However, there is limited discussion on the generalizability of the proposed approach to other domains or modalities. It would be valuable to explore the transferability of DAPSAM to different medical imaging tasks or even non-medical domains to check its generalizability and robustness across various domains. 5) Potential Overfitting Concerns: The paper introduces a memory bank module to store learned information and generate domain-adaptive prototypes. However, there is a risk of overfitting to the source domain, especially if the memory bank size is too large. Providing further analysis or regularization techniques to avoid overfitting and ensure the generalizability of the model would strengthen the paper.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper provides sufficient details on the methodology and experimental setup, which should facilitate reproducibility. However, additional clarification on certain aspects, such as the implementation of specific components, could further enhance reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) I recommend improving the related work section by providing a more comprehensive comparison with existing methods. Additionally, clarity in explaining mathematical equations and providing visual aids to illustrate key concepts could enhance reader understanding. 2) Streamline the presentation in the abstract and conclusion sections to avoid repetition and enhance clarity. 3) Provide more details about the dataset preprocessing and examples. 4) Consider discussing potential limitations and future directions in more detail to guide further research in the field. 5) Providing code and the dataset would be a plus. 6) Provide an overfitting and generalization analysis.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel approach to a significant problem in medical image segmentation and provides thorough experimental validation. While there are areas for improvement, such as clarity and a more comprehensive discussion of related work, the overall contribution and presentation support acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I kept my original rating. The rebuttal did not add much regarding my concerns about overfitting; it only mentioned that the size of the memory bank affects overfitting, which is far from a detailed analysis.




Author Feedback

We thank the reviewers for their appreciation of the novelty and effectiveness: significant problem (R1); novel, well-written, significant performance improvements, thorough validation (R3); extensive experiments (R4). Detailed responses are given below. We will release the source code.

1, Why the memory bank helps domain migration [R1, R4] We use the knowledge stored in the memory bank, learned from the source domain, to generate prompts for target-domain features. We project all target-domain knowledge into the latent space and use the source-domain knowledge in the memory bank to represent it, as shown in the t-SNE visualization of prototypes in the supplementary material. This helps to align the source and target domains, thereby enhancing performance on the target domain. On the other hand, similar to few-shot learning, features in the memory bank serve as guiding support features when encountering target query features, helping the model generalize to the target domain.
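
For readers who want a concrete picture of such prototype-based prompting, the following PyTorch sketch illustrates the general idea under our own assumptions (the module name, dimensions, cosine-similarity addressing, and mean pooling are illustrative choices, not necessarily the authors' exact implementation): each spatial feature is re-expressed as a similarity-weighted combination of learnable memory items, and the result is pooled into a prompt embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypePromptGenerator(nn.Module):
    """Minimal sketch of prototype-based prompt generation from a learnable
    memory bank. Each spatial feature is re-expressed as a similarity-weighted
    combination of memory items (learned on the source domain), and the result
    is pooled into a prompt embedding."""

    def __init__(self, feat_dim=256, num_items=32, prompt_dim=256):
        super().__init__()
        # Randomly initialized, learnable memory bank (sizes are hypothetical).
        self.memory_bank = nn.Parameter(torch.randn(num_items, feat_dim))
        self.to_prompt = nn.Linear(feat_dim, prompt_dim)

    def forward(self, feats):                               # feats: (B, C, H, W)
        q = feats.flatten(2).transpose(1, 2)                # (B, HW, C) queries
        # Cosine-similarity addressing of the memory bank.
        attn = torch.softmax(
            F.normalize(q, dim=-1) @ F.normalize(self.memory_bank, dim=-1).t(),
            dim=-1)                                         # (B, HW, M)
        proto = attn @ self.memory_bank                     # features re-expressed
        # Pool into one domain-adaptive prototype per image, map it to a prompt.
        return self.to_prompt(proto.mean(dim=1))            # (B, prompt_dim)
```

In this sketch, a target-domain feature map is described entirely in terms of source-domain memory items at inference time, which is the sense in which the generated prompts are domain-adaptive.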

2, Why use low-level features? [R4] Low-level features contain information such as contours, which are crucial for medical segmentation and have proven useful in the skip connections of UNet. Additionally, thanks to the inherent generic segmentation capability of SAM, the low-level features of SAM, even without task- or domain-specific fine-tuning, contain general information for segmentation. Therefore, we make use of such general low-level features by introducing a selective attention mechanism to filter out features that are not conducive to generalization, retaining features that are robust across domains. We will make this clear in the revised paper.
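
As a concrete illustration of this kind of selective filtering, here is a minimal sketch assuming a squeeze-and-excitation style channel gate (the gating design, channel count, and reduction ratio are our assumptions, not the paper's exact module). Note that such a gate re-weights channels only, which is consistent with R1's post-rebuttal remark that it provides no selection along the spatial dimensions.

```python
import torch
import torch.nn as nn

class LowLevelFusionFilter(nn.Module):
    """Sketch: merge low-level features into intermediate features, then apply
    a squeeze-and-excitation style channel-attention filter to down-weight
    channels that do not transfer well across domains."""

    def __init__(self, channels=256, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                          # global context per channel
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, inter_feat, low_feat):                  # both (B, C, H, W)
        fused = inter_feat + low_feat                         # merge low-level information
        return fused * self.gate(fused)                       # channel-wise re-weighting only
```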

3, Details of the memory bank [R1, R4] As described in the abstract, the memory bank is learnable. We randomly initialize the memory bank. During training, the memory bank in the prompt-generation module is updated automatically through backpropagation. During inference, the memory bank is frozen. We will clarify this in the revised paper.
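
A minimal sketch of what "learnable, updated by backpropagation, frozen at inference" means in PyTorch terms (the toy module, sizes, and loss are placeholders, not the actual DAPSAM training code):

```python
import torch
import torch.nn as nn

# Hypothetical toy module: the only property that matters here is that the
# memory bank is an nn.Parameter, so it is randomly initialized and updated
# by ordinary backpropagation together with the other trainable weights.
class TinyPromptModule(nn.Module):
    def __init__(self, num_items=32, dim=256):
        super().__init__()
        self.memory_bank = nn.Parameter(torch.randn(num_items, dim))

    def forward(self, x):                                    # x: (B, dim)
        attn = (x @ self.memory_bank.t()).softmax(dim=-1)    # address the bank
        return attn @ self.memory_bank

model = TinyPromptModule()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)   # bank is included

# Training step: gradients flow into memory_bank like any other weight.
loss = model(torch.randn(8, 256)).pow(2).mean()              # placeholder loss
loss.backward()
optimizer.step()

# Inference: eval mode and no_grad -> the memory bank stays frozen.
model.eval()
with torch.no_grad():
    prompt = model(torch.randn(1, 256))
```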

4, Overfitting concerns [R3] As shown in Tab. 4 of the paper, using a large memory bank size indeed leads to overfitting; a relatively small memory bank size is beneficial.

5, Clarification on benchmarks and baseline [R1] As described at the end of Sec. 3.1 and at the beginning of Sec. 2, our baseline is built on SAM’s original encoder and decoder, using two adapters per layer following [30] and changing SAM’s original prediction to a semantic segmentation output.

6, Parameters of models [R1] The total number of parameters and the Dice performance in Tab. 1 for CCSDG [14], SAMed [33], the baseline, and our DAPSAM are 43.80M (67.58%), 90.85M (78.51%), 98.26M (78.87%), and 98.94M (81.31%), respectively. We achieve a 2.44% improvement over our baseline with only a slight increase in parameters.

7, Preprocessing, visualization of data and examples of prompts [R3, R1, R4]
We adopt the same pre-processing as CCSDG [14]. Indeed, the noisy visualization in the supplementary material is caused by an image conversion issue, which will be corrected. We have provided the t-SNE visualization of prototypes for prompts in the supplementary material.

8, Details of operations [R4] We use the “expand_as” function, which repeats the tensor p_i to obtain e_i. The adapter’s MLP has two linear layers, which reduce and then restore the channel dimension, respectively. The rank is defined as the ratio between the input and the reduced dimension, and is set to 4 to balance the number of parameters and tuning capability.
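
A hedged sketch of both operations (the residual connection, GELU activation, and tensor sizes are our assumptions; the rebuttal only specifies two linear layers with rank 4 and an expand_as-style repetition):

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Sketch of a rank-4 bottleneck adapter: two linear layers that first
    reduce the channel dimension by the rank factor and then restore it."""

    def __init__(self, dim=768, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, dim // rank)              # e.g. 768 -> 192
        self.act = nn.GELU()
        self.up = nn.Linear(dim // rank, dim)                # 192 -> 768

    def forward(self, x):                                    # x: (..., dim)
        return x + self.up(self.act(self.down(x)))           # residual adapter

# "expand_as"-style broadcasting of a prototype p_i (C,) to e_i (C, h, w):
p_i = torch.randn(256)                                       # one prototype vector
feat = torch.randn(256, 8, 8)                                # placeholder h x w feature map
e_i = p_i.view(-1, 1, 1).expand_as(feat)                     # p_i repeated spatially
```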

9, Experimental settings [R4] The results are either directly sourced from existing papers or reproduced in the same way as DAPSAM. Each column in Tab. 1 reports leave-one-out results for the model trained on the corresponding domain (in the first row) and tested on the other domains. We use a single NVIDIA RTX 3090 GPU with a batch size of 8.
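
Schematically, the leave-one-out single-source protocol described here looks as follows (domain names and helper calls are placeholders, not the actual pipeline):

```python
# Train on one domain, evaluate on all remaining domains.
domains = ["Domain_A", "Domain_B", "Domain_C", "Domain_D", "Domain_E", "Domain_F"]

for source in domains:
    targets = [d for d in domains if d != source]
    # model = train_model(source)                    # hypothetical helpers
    # scores = [evaluate(model, t) for t in targets]
    print(f"train on {source}, test on {targets}")   # one column of Tab. 1
```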

10, Paper structure [R3] Thanks for the suggestion. We will carefully revise the abstract and conclusion, and include a comparative discussion of related work and limitations.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    One reviewer increased the rating (WR to WA) after rebuttal, and the other two reviewers’ ratings remained the same as pre-rebuttal. The AC considered the paper, the rebuttal, and the post-rebuttal comments, and found that the authors’ rebuttal addressed some of the reviewers’ questions, such as the memory bank, parameters, and experimental settings, but some responses, related to overfitting, low-level features, open-sourcing, etc., remain unclear. The AC followed the reviewers’ recommendations and recommended acceptance of the paper.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper received three positive reviews. The methodology was clearly described, and the results are promising. I recommend accepting this paper for publication.



