Abstract

Polyp segmentation plays a pivotal role in colorectal cancer diagnosis. Recently, the emergence of the Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation, leveraging its powerful pre-training capability on large-scale datasets. However, due to the domain gap between natural and endoscopy images, SAM encounters two limitations in achieving effective performance in polyp segmentation. Firstly, its Transformer-based structure prioritizes global and low-frequency information, potentially overlooking local details, and introducing bias into the learned features. Secondly, when applied to endoscopy images, its poor out-of-distribution (OOD) performance results in substandard predictions and biased confidence output. To tackle these challenges, we introduce a novel approach named Augmented SAM for Polyp Segmentation (ASPS), equipped with two modules: Cross-branch Feature Augmentation (CFA) and Uncertainty-guided Prediction Regularization (UPR). CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge while enhancing local features and high-frequency details. Moreover, UPR ingeniously leverages SAM’s IoU score to mitigate uncertainty during the training procedure, thereby improving OOD performance and domain generalization. Extensive experimental results demonstrate the effectiveness and utility of the proposed method in improving SAM’s performance in polyp segmentation. Our code is available at https://github.com/HuiqianLi/ASPS.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4128_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/HuiqianLi/ASPS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Li_ASPS_MICCAI2024,
        author = { Li, Huiqian and Zhang, Dingwen and Yao, Jieru and Han, Longfei and Li, Zhongyu and Han, Junwei},
        title = { { ASPS: Augmented Segment Anything Model for Polyp Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a fine-tuning based domain adaptation approach called ASPS (Augmented Segment Anything Model for Polyp Segmentation). Through the designed Cross-branch Feature Augmentation Module (CFA), it utilizes a CNN to supplement the high-frequency components missing in ViT. Additionally, the Uncertainty-guided Prediction Regularization Module (UPR) leverages hints to reduce uncertainty during training, ultimately enhancing the SAM model’s adaptation to the domain of polyp segmentation and achieving excellent performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work proposes a simple method to adapt the SAM model for the domain of polyp segmentation, significantly enhancing its performance in polyp segmentation tasks.

    1. The considered problem is interesting.

    2. The performance of the two modules is convincing.

    3. The writing of the paper is concise and clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Not Novel Enough

    The method proposed lacks innovation as both major components, the Cross-branch Feature Augmentation Module (CFA) and the Uncertainty-guided Prediction Regularization Module (UPR), are not original ideas. The integration of CNN with ViT in the CFA module to enhance feature representation has been previously explored in works like “SAMUS: Adapting Segment Anything Model for Clinically-Friendly and Generalizable Ultrasound Image Segmentation.” Similarly, the use of uncertainty to guide training in the UPR module has been discussed in “Learning Confidence for Out-of-Distribution Detection in Neural Networks” by DeVries and Taylor, 2018.

    2. Generalization Concerns

    Although the ASPS model demonstrates strong performance on a single dataset after fine-tuning, the paper does not address its generalization capabilities to external datasets, which is crucial for clinical practice. Theoretically, models fine-tuned from robust foundation models like SAM should exhibit stable performance across diverse external datasets due to their inherent generalization capabilities. However, this aspect is not demonstrated or discussed in the paper.

    3. Limited Scope of Comparison

    As the study focuses on applying SAM to endoscopic polyp segmentation, I believe it should also compare with foundation models in this field, such as “Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train”, to justify the superiority of SAM. Otherwise, why not simply fine-tune a domain-specific foundation model?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Generalization and Clinical Translation: Your paper shows promising results on several individual datasets but lacks a demonstration of generalization across multiple external datasets, which is crucial for clinical application.

    Suggestions:

    • Test the model on additional external datasets to demonstrate its robustness and generalization capabilities.
    • Discuss potential challenges and solutions for applying the model in diverse clinical settings.
    2. Scope of Comparison:

    The comparison primarily with the SAM model might overlook the potential benefits of using domain-specific foundation models.

    Suggestions:

    • Compare your model against specialized foundation models used in endoscopic image analysis to justify SAM’s use.
    • Highlight any advantages of using SAM over domain-specific models.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I rated this paper as “Weak Reject” due to several key concerns:

    • Insufficient Novelty: The techniques used to adapt SAM for polyp segmentation lack significant innovation, closely resembling existing methods.

    • Generalization Not Tested: The paper doesn’t demonstrate how well the model generalizes across different datasets, which is crucial for clinical application.

    • Limited Comparative Analysis: The absence of comparisons with specialized models already used in polyp segmentation makes it difficult to assess the true value of using SAM in this context.

    Although ASPS provides a simple but effective method for adapting SAM to the domain of polyp segmentation, these issues collectively call into question the paper’s originality and practical applicability, prompting my recommendation of a weak rejection unless they are addressed in the rebuttal.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This work presents a novel approach to boost the foundation model SAM (Segment Anything Model) to perform better for polyp segmentation in endoscopy.

    The proposed SAM-based model, ASPS (Augmented SAM for Polyp Segmentation), incorporates two new modules:

    • Cross-branch feature Augmentation. It combines domain-specific features learned with an additional CNN encoder branch with the frozen ViT encoder from SAM.
    • Uncertainty guided Prediction Regularization. It uses SAM’s IoU score to reduce uncertainty during training and achieve a model more robust to out of distribution samples. The proposed strategy for domain adaptation of SAM does not require re-training the foundation model nor the use of prompts for training.
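As a rough illustration of the second bullet, the sketch below turns a predicted IoU score into an image-level uncertainty, uses binary entropy as a pixel-level uncertainty, and applies both as weights on a binary cross-entropy loss. This is not the UPR formulation from the paper: the function names, the entropy choice, and the weighting scheme are all assumptions made for illustration only.

```python
import math


def pixel_uncertainty(p, eps=1e-7):
    """Binary entropy (in bits) of a foreground probability p in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def image_uncertainty(pred_iou):
    """Assumed image-level uncertainty: one minus the predicted IoU score."""
    return 1.0 - min(max(pred_iou, 0.0), 1.0)


def weighted_bce(probs, labels, pred_iou, eps=1e-7):
    """Mean BCE over a flat list of pixels.

    Each pixel is weighted by (1 + its entropy), and the whole image is
    scaled by (1 + image-level uncertainty). Both weights are hypothetical.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += (1.0 + pixel_uncertainty(p)) * bce
    return (1.0 + image_uncertainty(pred_iou)) * total / len(probs)


if __name__ == "__main__":
    probs, labels = [0.9, 0.2, 0.6], [1, 0, 1]
    print(weighted_bce(probs, labels, pred_iou=0.95))
    print(weighted_bce(probs, labels, pred_iou=0.50))
```

With the same pixel predictions, a lower predicted IoU yields a larger loss, so images the model is less sure about contribute more to training under this assumed scheme.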

    The experimental validation shows how the proposed approach achieves more accurate results for the aimed polyp segmentation task than the original SAM in all benchmarks, and improves current state of the art approaches in polyp segmentation in several benchmarks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The motivation of the work is clearly and convincingly explained. Foundation models open new opportunities but there is a need for strategies to adapt or make the best use of them. 


    • The approach is well described, referring to the works that inspire the proposed modifications, and the graphical material is helpful to understand the proposed architecture.

    • The proposed strategy for domain adaptation of SAM does not require re-training the foundation model nor the use of prompts for training, which provides computational and practical advantages to perform the adaptation of such foundation model.

    • The experiments present promising results, where the proposed approach achieves more accurate segmentation than existing approaches in several public benchmarks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses I find are the following

    Regarding the related work discussion: I understand the space limitations, but I think the manuscript lacks a more detailed discussion of domain adaptation for semantic segmentation and of the main extensions proposed to adapt SAM. These would make the overview of the context of this work more complete and facilitate the evaluation of the significance of its contribution:

    • The experiments mention many related approaches proposing SAM-modifications (results in Table 1), but the works are barely discussed and their proposed contributions are not compared or put in context with respect to the current work.

    • The introduction or a specific related work paragraph or section/subsection could also discuss a bit more what are the most relevant recent works for domain adaptation in semantic segmentation (not just to adapt SAM, but in general), including for example “Hoyer, L., et al. MIC: Masked image consistency for context-enhanced domain adaptation. CVPR 2023” or “Yang, J. et al. Context-aware domain adaptation in semantic segmentation. IEEE/CVF WACV 2021”.

    The improvements presented (Table 1) are relatively small, and it is hard to identify if these differences are significant. It would be interesting, if possible, to show some more qualitative results than those in Fig. 4, including top performing approaches in the comparison, i.e. Polyp-PVT or SSFormer, and identify which examples or which parts of them seem to provide that advantage to the proposed approach. Supplementary materials could have been used for these additional images.
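One generic way to check whether small per-image score differences between two methods are significant is a paired sign-flip permutation test. The sketch below is an illustration only: the function name is hypothetical, and the scores in the usage example are synthetic placeholders, not results from the paper.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided p-value for the mean paired difference via random sign flips.

    scores_a and scores_b hold per-image scores (e.g. Dice) of two methods
    evaluated on the same images, in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Synthetic placeholder scores, for demonstration only.
    method_a = [0.90 + 0.001 * i for i in range(12)]
    method_b = [s - 0.05 for s in method_a]
    print(paired_permutation_test(method_a, method_b))
```

A small p-value suggests the mean gap is unlikely to arise from random per-image fluctuations alone; the test makes no assumption that the differences are normally distributed.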

    I also miss a bit more explicit analysis or discussion on the computational advantages that the adaptation of SAM with the presented approach can have with respect to the current state of the art architectures trained “completely”. (I’m referring to trade-offs between the cost of training vs the cost at inference or related cost analysis)

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The work presents sufficient details regarding experimental settings, there is code available and experiments are run on public well known benchmarks

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    A few minor issues that could be fixed or clarified

    • Other related works present the average results across all polyp segmentation benchmarks, it would be interesting to add the average to all the benchmarks used in Table 1.

    • This claim from the paper is a bit confusing: “It’s important to note that methods like MedSAM[14] and SAMUS[13] still incorporated prompts like SAM, but we removed them in the comparative experiment.” It is not clear what this refers to, because both are included in the main results in Table 1.

    • I miss a reference to Table 1 in the “Results and Analysis” text in the manuscript.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work presents a novel strategy to boost the foundation model SAM for a relevant medical segmentation task, polyp segmentation. The results look promising and the model presents interesting ideas, and overall, the work is well presented. There are some issues with the thoroughness of the related work discussion and the results analysis. If both issues could be addressed, the contribution of the presented work would be more sound and convincing.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I appreciate the authors’ effort to address all the raised concerns. After reading the rebuttal and the other reviews, I find many of my concerns addressed with feasible clarification actions (text and more examples) that the authors can incorporate in a final version. The only unclear part of the rebuttal is the discussion of the advantages in terms of computational complexity, since the comments seem to indicate that the presented adaptation of SAM does not bring benefits in this respect over training a model from scratch.



Review #3

  • Please describe the contribution of the paper

    The paper introduces an innovative method, ASPS, which significantly enhances the performance of the SAM for the specific task of polyp segmentation in endoscopy images. The key contributions include the development of the Cross-branch Feature Augmentation (CFA) module to integrate multi-scale and multi-level features, and the Uncertainty-guided Prediction Regularization (UPR) module to improve out-of-distribution performance and domain generalization. These enhancements lead to superior segmentation results, outperforming several state-of-the-art methods and demonstrating the potential for more accurate colorectal cancer diagnosis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel Backbone Usage: The article combines the powerful pre-training results of SAM, improves the SAM structure for polyp image segmentation, and introduces local information through CNN to finally achieve high accuracy.
    • Strong Empirical Results: The extensive experiments show clear improvements over baseline models and other methods, validating the effectiveness of the proposed approach. In addition, the dimensions of ablation experiments are relatively sufficient.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • SAM is an interactive model with generalizability. This article adopts a fully automatic method and limits the model’s capabilities to polyp segmentation, with limited application scenarios.
    • Ablation studies on UPR show that the results without UPR are worse than many SOTA methods. The effectiveness of the network design of SAM plus CNN and CFA is questionable.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Tab 1 is not cross-referenced in the body of the text.
    • Fig. 3 shows the results of Fourier analysis. You could add a reference to show that this approach is reasonable and to help readers understand it more easily. In addition, the specific network layers used to obtain the features need to be explained.
    • Tab 4 shows the ablation studies on UPR, with poor performance without UPR. Please check that the ablation experiments are adequately trained.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Clear expression, complete experimental results, and certain novelty

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ valuable feedback and are encouraged by their recognition of our method’s clear motivation and promising results. To address their concerns, we offer the following clarifications:

A1: Novelty (R#3) Our method tackles two critical issues in polyp segmentation: (a) poor lesion-background differentiation and edge blurring caused by missing high-frequency details; (b) SAM’s limited generalization due to the lack of training on endoscopic images. We introduce two innovations:

  • We integrate CNN with ViT to bolster high-frequency features. Our method differs from SAMUS[13] in motivation and implementation. SAMUS[13] targets small images; we address high-frequency deficiency. We utilize simple decoder cross-attention for fusion, contrasting SAMUS’s multi-layer CNN-ViT interaction.
  • We guide model training with uncertainty and generalization. [4] proposed a training method based on uncertainty in OOD classification. Inspired by this, we extend it to segmentation and further innovatively propose pixel-level and image-level uncertainties to enhance SAM’s generalization.
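The fusion mentioned in the first bullet can be pictured as a single cross-attention step between the two feature streams. The sketch below is a minimal NumPy illustration, not the ASPS implementation: the choice of ViT tokens as queries over CNN tokens, the token shapes, the random projection weights, and all names are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(vit_tokens, cnn_tokens, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection.

    vit_tokens: (N, D) tokens used as queries (assumed: frozen ViT branch).
    cnn_tokens: (M, D) tokens used as keys/values (assumed: CNN branch).
    w_q, w_k, w_v: (D, D) projection matrices.
    Returns the fused tokens (N, D) and the attention weights (N, M).
    """
    q = vit_tokens @ w_q
    k = cnn_tokens @ w_k
    v = cnn_tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return vit_tokens + attn @ v, attn


rng = np.random.default_rng(0)
D = 8                                  # hypothetical channel width
vit = rng.standard_normal((16, D))     # e.g. a 4x4 grid of ViT tokens
cnn = rng.standard_normal((64, D))     # e.g. an 8x8 grid of CNN tokens
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
fused, attn = cross_attention(vit, cnn, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Because the two streams only meet in this one step, either encoder can be swapped or frozen independently, which is the practical appeal of a late fusion over layer-by-layer interaction.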

A2: Extension to other datasets (R#3) ASPS was trained on the Kvasir-SEG and ClinicDB datasets and tested on five datasets, demonstrating its generalization to external datasets. We are also confident in its applicability to other tasks, such as skin lesion segmentation, which faces analogous issues of edge blurring and scarce high-frequency information.

A3: Comparison with specific foundation models (R#3) We appreciate Endo-FM and will introduce it following R#1 in A4. We think fine-tuning Endo-FM differs from our method. Endo-FM is a video-level pre-trained model with rich endoscopic knowledge, while SAM excels in image-level segmentation tasks. They differ in input, generalization, and task-specific ability. Despite this, ASPS outperforms Endo-FM in polyp segmentation with less data, demonstrating SAM’s generalization. We will integrate these works in future research.

A4: Absence of related work (R#1) We will add the following related work to the Introduction section:

  • Studies like Polyp-PVT[5], SSFormer[21] used Pyramid Vision Transformer for polyp segmentation; CFANet[32] integrated boundaries with a Cross-level Feature Aggregation Network; Endo-FM captured spatial-temporal dependencies to build a foundation model.
  • Diverse methods addressed unsupervised domain adaptation in semantic segmentation. MIC proposed a Masked Image Consistency module for target domain context learning; Context-Aware Domain Adaptation improved context transfer via cross-attention. Yet, domain-specific information integration and uncertainty reduction are still unexplored.

A5: Limitations without prompts (R#4) We agree that removing prompts limits SAM’s segmentation capabilities, but interactive polyp segmentation carries costs, such as the need for experienced doctors, whereas automated segmentation in clinical practice can significantly aid both doctors and patients.

A6: Experimental results without UPR (R#4) All models have been adequately trained, and CFA’s effectiveness can be seen by comparing the first two rows of Table 2. The first row of Table 4 is identical to the second row of Table 2, demonstrating that CFA enhanced performance.

A7: Computational complexity (R#1) Most ASPS parameters are in the CNN; the Mask Decoder has a smaller parameter count. Polyp-PVT[5] was trained in 3 hours on a Tesla P100 at batch size 16 and reaches ~60 FPS. Our Efficient-SAM model, at batch size 4 on an RTX 3090, also takes 3 hours to train and achieves ~20 FPS.

A8: Others We will add more qualitative results and a citation of Table 1 in Results and Analysis (R#1 & R#4). “MedSAM[14] and SAMUS[13] still incorporated prompts like SAM, but we removed them” refers to the prompts in SAM’s encoder (R#1). Fig. 3 demonstrates the CNN’s richer high-frequency information, in line with ViT-Adapter’s study, and uses the features CNN_2 and ViT_2, as indicated to the left of the arrows (R#4).
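A frequency comparison of the kind mentioned for Fig. 3 can be approximated by measuring how much of a feature map's spectral energy lies outside a low-frequency disc. The sketch below is a generic NumPy illustration rather than the authors' analysis; the function name, the cutoff value, and the two synthetic test patterns are assumptions.

```python
import numpy as np


def high_freq_ratio(feature_map, cutoff=0.25):
    """Fraction of spectral energy above a radial frequency cutoff.

    feature_map: 2D array (one channel of a feature map).
    cutoff: threshold as a fraction of the Nyquist frequency
            (0.5 cycles per pixel), so 0.25 means 0.125 cycles per pixel.
    """
    feature_map = np.asarray(feature_map, dtype=float)
    power = np.abs(np.fft.fftshift(np.fft.fft2(feature_map))) ** 2
    h, w = feature_map.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]  # cycles per pixel
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff * 0.5].sum() / total)


# Two synthetic patterns: a slow horizontal wave and a pixel-level checkerboard.
idx = np.arange(32)
smooth = np.sin(2 * np.pi * idx / 32)[None, :] * np.ones((32, 1))
checker = np.where((idx[:, None] + idx[None, :]) % 2 == 0, 1.0, -1.0)
print(high_freq_ratio(smooth), high_freq_ratio(checker))
```

Applied to feature maps from two encoders at the same resolution, a consistently higher ratio for one of them would indicate that it retains more fine detail, which is the kind of evidence the Fourier analysis is meant to provide.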




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper’s approach is novel, and the results were well evaluated in comparison with many previous methods. I recommend accepting this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers found the work to address an interesting problem, with good motivation and a clear, easy-to-follow approach. The experiments present promising results. The rebuttal addresses most of the concerns, except for the discussion of the advantages in terms of computational complexity. I found the work addresses an interesting problem by effectively adapting the foundation model SAM in polyp segmentation. I am inclined to accept this paper and give the MICCAI community the opportunity to discuss it further.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



