Abstract
Leveraging large vision models (LVMs), such as the Segment Anything Model (SAM), in medical image analysis presents significant potential to enhance diagnostic efficiency. Existing SAM-based medical segmentation methods inadequately address two critical challenges: rapidly adapting LVMs to medical tasks through few-shot fine-tuning, and the inherent difficulty of distinguishing lesions from anatomically similar background regions in medical images. To overcome these limitations, we propose CD-PolypNet, a novel framework integrating a Semantic Supervision via Feature Distillation (SSFD) module and an Edge-Guided Feature Branch (EFB).
The SSFD module leverages feature distillation to transfer knowledge from SAM’s strongly supervised features into early-stage feature learning, enabling efficient domain adaptation of large vision models under data scarcity. Concurrently, EFB enhances boundary discrimination in the lightweight decoder through a hybrid strategy combining the Canny operator and Edge-Frequency Gated Convolution (EFGConv), thereby prioritizing edge-aware feature extraction. Extensive experiments across five challenging medical imaging datasets demonstrate that our method not only surpasses state-of-the-art approaches in accuracy and robustness but also establishes a new paradigm for cross-domain adaptation of large vision models in specialized medical applications. The code is available at https://github.com/ChangpengYue/CD-PolypNet.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2291_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/ChangpengYue/CD-PolypNet
Link to the Dataset(s)
Kvasir-SEG: https://datasets.simula.no/kvasir-seg/
CVC-ClinicDB: https://polyp.grand-challenge.org/CVCClinicDB/
CVC-ColonDB: http://vi.cvc.uab.es/colon-qa/cvccolondb/
ETIS: https://polyp.grand-challenge.org/EtisLarib/
EndoScene: http://adas.cvc.uab.es/endoscene
BibTex
@InProceedings{YueCha_CDPolypNet_MICCAI2025,
author = { Yue, Changpeng and Zhao, Jianxiang and Wang, Chao and Zhao, Xinglun and Mao, Axiu and Hou, Jia and Yan, Chenggang and Zhao, Kai and Wang, Shuai},
title = { { CD-PolypNet: Cross-Domain Polyp Segmentation Network with Internal Feature Distillation and Dual-Stream Boundary Focus via Large Vision Model } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15969},
month = {September},
pages = {97 -- 107}
}
Reviews
Review #1
- Please describe the contribution of the paper
This manuscript proposes CD-PolypNet, a cross-domain polyp segmentation framework that adapts SAM to medical image analysis through two key components: (1) a Semantic Supervision via Feature Distillation (SSFD) module that enables cross-layer semantic guidance to enhance early-stage feature representations under limited annotation settings, and (2) an Edge-Guided Feature Branch (EFB) that integrates Canny edge maps with frequency-aware convolution and attentional fusion to improve boundary discrimination.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper presents a well-structured framework that attempts to address two key limitations in adapting SAM to the domain of medical image segmentation. The proposed CD-PolypNet incorporates two modules—SSFD and EFB—that are the core strengths of this work.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) The workflow in Figure 1 is not clearly presented. The interaction order between modules and the direction of feature flow are difficult to understand, which limits the interpretation of the module’s effectiveness and practical usability.
(2) The Cross-layer Feature Alignment proposed in the Feature Distillation for Semantic Supervision module is conceptually similar to standard cross-layer distillation practices. What are the fundamental improvements of the proposed method compared to these existing approaches?
(3) In Equation (2), the authors sample a subset of feature channels from the encoder and decoder to compute the distillation loss. However, this sampling is performed randomly without replacement. Could such randomness lead to training instability or sensitivity? The authors should further discuss its potential impact.
(4) In the Edge-Guided Feature Branch, using Canny edge maps for edge enhancement seems to be a relatively common approach. What is the core innovation in this part, especially in terms of theoretical contribution? The current design appears more like a recombination of existing modules.
(5) In Equation (7), encoder features are convolved with a Gaussian kernel to obtain a low-frequency signal, from which a high-frequency residual is derived. What evidence supports that this residual carries meaningful boundary details?
(6) The authors propose some new concepts, but the ablation studies are not sufficient. For example, although the paper claims enhanced boundary sensitivity, there is a lack of specific evaluation metrics to validate this claim.
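The question in point (5) can be made concrete with a toy example. The sketch below is not the paper's Eq. (7); it is a minimal 1-D Python illustration, assuming a plain Gaussian low-pass filter with replicated borders, of how subtracting a smoothed signal from the original leaves a residual concentrated at a step. The function names and the `sigma` and `radius` values are hypothetical choices made for this illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def low_pass(signal, kernel):
    """Smooth a 1-D signal with the kernel, replicating the border values."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

def high_freq_residual(signal, sigma=1.5, radius=4):
    """Residual = original signal minus its Gaussian-smoothed version."""
    low = low_pass(signal, gaussian_kernel(sigma, radius))
    return [x - l for x, l in zip(signal, low)]

# Toy "feature row" (assumed data): flat background followed by flat foreground.
row = [0.0] * 10 + [1.0] * 10
residual = high_freq_residual(row)

# Index where the residual magnitude is largest; it sits next to the step.
peak = max(range(len(row)), key=lambda i: abs(residual[i]))
print(peak, [round(r, 3) for r in residual])
```

In this toy case the residual is zero in the flat regions and non-zero only within `radius` samples of the step. The same residual would also respond to any other sharp intensity change, such as texture or specular highlights, so the example illustrates the mechanism rather than showing that the residual isolates lesion boundaries specifically.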
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the paper demonstrates certain levels of innovation and presents a framework that achieves notable improvements in medical image segmentation, there remain several aspects that require further clarification.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed most of my concerns.
Review #2
- Please describe the contribution of the paper
This paper proposes a novel network architecture, CD-PolypNet, for polyp segmentation. CD-PolypNet incorporates a Semantic Supervision via Feature Distillation module designed to facilitate knowledge transfer from LVM, and an Edge-Guided Feature Branch to enhance the model’s performance in capturing details, particularly along polyp boundaries.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The motivation for addressing the challenges of edge processing in polyp segmentation and for leveraging LVM is clearly articulated. The corresponding methodologies appear logical and well-conceived.
- The proposed approach has been extensively evaluated on multiple datasets, with comparisons against a variety of existing methods, demonstrating its effectiveness.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The manuscript does not provide sufficient detail on how the proposed modules and the specific loss function are integrated and utilized during training. Additional clarification on their roles within the training pipeline is needed.
- The use of high input resolution and a potentially large network architecture raises concerns about the model’s inference speed and, consequently, its suitability for practical clinical applications.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Minor issues: Several formulas are missing commas. References [3], [9], and [14] are not cited in the paper.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Clear motivation, comprehensive experimental results
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The paper presents CD-PolypNet, a novel framework that rapidly adapts the SAM model to the medical domain by integrating Semantic Supervision via Feature Distillation (SSFD) and Edge-Guided Feature Branch (EFB) modules. The paper demonstrates strong experimental results that surpass state-of-the-art methods for polyp segmentation.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Introduction Section: the authors skillfully outline the challenges posed by scarce medical image annotations and limited domain-specific knowledge, emphasizing how models struggle with limited data. Their contribution to enabling efficient domain adaptation under these constraints is clearly presented.
Proposed Method and Visualizations: The authors effectively detailed the challenges early-stage features face due to limited supervision data. Their reasoning behind the proposed modules and loss functions is clearly outlined, with CAM visualizations demonstrating their thought process. These visualizations underscore the key insights into the decision-making strategy leading to their approach.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Comparison Data: I noticed that the performance comparison in Table 1 includes methods like Polyp-PVT [31], which were fine-tuned on the datasets used in the paper, and others like SurgicalSAM [26], which were trained on different datasets (e.g., EndoVis2017). Could you please clarify whether the compared methods were applied to the data used in your study, or whether you used the models exactly as they were presented in those papers? Providing this information would help ensure that the comparison is both fair and thorough.
Minor: The SAM encoder is trained using cross-layer distillation and intra-layer losses focused on its initial features. Nonetheless, Figure 1 shows that the backbone is labeled as frozen. Could you please provide further clarification?
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Typo in Figure 1: “Sober” should be corrected to “Sobel”
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(6) Strong Accept — must be accepted due to excellence
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Based on the clear presentation of the challenges and solutions in both the introduction and their proposed method, as well as the insightful use of CAM visualizations to illustrate their design choices, I recommend accepting the paper.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The paper clearly outlines the challenges in adapting SAM features, providing solid background and motivation for the work. The introduction of the SSFD and EFB modules is convincing and effectively addresses these challenges in transferring SAM features to polyp segmentation. Furthermore, the extensive experiments conducted demonstrate the paper’s novelty and robustness, underscoring its contributions.
Author Feedback
We thank the reviewers for their thoughtful feedback and for recognizing the strengths of our work, including innovation (R1, R3), good clarity and organization (R2, R3), clear motivation (R2), comprehensive experimental results (R2, R3), and logical, well-conceived methodologies (R2). Below, we address the reviewers’ comments in detail.
R1
- The workflow in Fig. 1 is not clearly presented.
We will add more annotations to improve the readability of Fig. 1, which mainly contains three branches. In the SSFD module, one branch learns features through an extended SAM under feature distillation supervision. In the EFB module, two branches are designed to learn edge-aware features: one branch adopts our proposed EFG-Conv to filter the non-edge information from the early features of SAM, and the other branch adopts the Canny operator to extract edge features from the original images. Finally, the learned features from all three branches are fused for segmentation.
- What are the fundamental improvements of the cross-layer feature alignment in SSFD compared with the standard cross-layer distillation?
The key difference is a random sampling strategy designed to align the channel dimensions between teacher and student features, which improves model robustness.
- Could random sampling lead to training instability or sensitivity?
Random sampling can effectively alleviate overfitting and provides a certain degree of regularization. Our experimental results also show that, after sufficient training, the model performance is stable and insensitive to the sampling.
- What is the core innovation in EFB?
The core innovation is the proposed EFG-Conv, which filters the non-edge information from the early features of SAM, combined with the fusion of explicit edge priors from the Canny operator to enhance edge-aware feature learning.
- What evidence supports that the residual carries meaningful boundary details in Eq. (7)?
Based on Marr-Hildreth theory, image edges generally correspond to high-frequency features. In our framework, Gaussian smoothing is used to extract low-frequency features, and high-frequency features are obtained by subtracting the low-frequency features from the overall learned features.
- The proposed boundary sensitivity strategy lacks ablation validation.
Ablation experiments using HD95 were conducted on Kvasir to validate the effectiveness of the proposed boundary sensitivity strategy. The fine-tuned SAM yields an HD95 of 17.00 mm; adding SSFD alone gives 8.12 mm, adding EFB alone gives 6.89 mm, and adding both gives 5.68 mm.
R2
- How are the proposed modules and the specific loss function integrated and utilized during training?
There are four loss functions in total in our framework. The two losses of SAM itself supervise overall model training, and the two losses proposed in SSFD supervise the cross-layer feature alignment between f_i (i=1,2,3) and f_t and the intra-layer feature decorrelation in f_i (i=1,2,3).
- The high input resolution and large network architecture raise concerns about the model’s inference speed.
Inference speed is one factor in practical scenarios, but the advantages of large models in accuracy and generalization can make them more efficient in deployment. In future work, we will design lightweight techniques to improve the inference speed.
R3
- More details are needed about the data used by the compared methods to clarify that the comparison is fair.
For all compared methods, the data splitting and hyper-parameter settings are kept consistent. All compared traditional supervised methods are trained, and all SAM-based methods are fine-tuned using the same pre-trained weights, on the same dataset.
- The SAM encoder in SSFD is trained under the cross-layer distillation loss and intra-layer loss, but in Fig. 1, the backbone is labeled as frozen.
The top line in Fig. 1 is somewhat ambiguous. The image encoder is frozen during training, but the mask decoder is updated based on the cross-layer distillation loss and intra-layer loss. We will correct the figure to make it clearer.
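For readers unfamiliar with the HD95 figures quoted in the response to R1, the following is a minimal Python sketch of one common definition: the larger of the two directed 95th-percentile nearest-boundary distances. The rebuttal does not state which HD95 variant or percentile interpolation was used, so the definition here, the helper names, and the toy point sets are all assumptions made for illustration.

```python
import math

def directed_distances(a, b):
    """For every point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]

def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

def hd95(pred_boundary, gt_boundary):
    """Max of the two directed 95th-percentile boundary distances (assumed variant)."""
    forward = percentile(directed_distances(pred_boundary, gt_boundary), 95)
    backward = percentile(directed_distances(gt_boundary, pred_boundary), 95)
    return max(forward, backward)

# Toy boundaries (assumed data): the prediction is shifted by one unit,
# except for a single outlier point ten units away.
gt = [(0, y) for y in range(20)]
pred = [(1, y) for y in range(19)] + [(10, 19)]

print(hd95(pred, gt))  # about 1.45, whereas the plain maximum distance would be 10
```

The percentile makes the metric far less sensitive to a single stray boundary point than the plain Hausdorff distance. Distances come out in the units of the input coordinates, so reporting them in millimetres requires the physical pixel spacing.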
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
All three reviewers are in favor of accepting this work after the rebuttal. Following these ratings, I think this work can be published at MICCAI 2025.