Abstract

Despite the many techniques developed for polyp segmentation, generalizability to new centers and populations remains an open problem. To address this, we compile a multicenter training set of 4,000 polyp frames and propose a novel approach for generalizing to different data centers, difficult polyp morphologies (e.g., flat or small), and inflammatory conditions such as inflammatory bowel disease (IBD). Specifically, we propose a transformer-based polyp segmentation model that leverages global contextual information and enhances local feature interactions through a novel feature decoding and fusion method and polyp edge features. This combines the vision transformers’ strong contextual understanding with enhanced locality modeling through graph-based relational understanding and multiscale feature aggregation. We compare our model with eight recent state-of-the-art methods under five widely used metrics on the following benchmark datasets: Kvasir-Sessile, SUN-SEG-Easy (Seen), ETIS-LaribPolypDB, CVC-ColonDB, PolypGen-C6, and our in-house IBD dataset. Extensive experiments show that our model outperforms state-of-the-art methods on out-of-distribution datasets, with mIoU improvements of 2.84% on ETIS-LaribPolypDB, 1.26% on CVC-ColonDB, 1.90% on PolypGen-C6, and 3.52% on the in-house IBD polyp dataset compared to the most accurate recent method. The code is available at https://github.com/Raneem-MT/ESPNet.
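
For readers unfamiliar with the headline metric, the following is a minimal sketch of how mIoU could be computed for binary polyp masks. It assumes that mIoU is the average of per-image intersection over union for the polyp class; the paper may define or aggregate it differently. The names `iou` and `mean_iou` are illustrative and are not taken from the ESPNet repository.

```python
from typing import Sequence

# A binary mask is represented here as rows of 0/1 values (illustrative choice).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union for one pair of binary masks."""
    inter = 0
    union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-image IoU over a dataset (assumed definition of mIoU)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

Under this definition, an improvement of 2.84% in mIoU would correspond to the dataset-level average rising by 0.0284 on a 0 to 1 scale, assuming the reported figures are absolute percentage points.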

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4428_paper.pdf

SharedIt Link: https://rdcu.be/eHw51

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05141-7_16

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Raneem-MT/ESPNet

Link to the Dataset(s)

We will provide links to the training dataset and the public test datasets in the GitHub repository: https://github.com/Raneem-MT/ESPNet

BibTex

@InProceedings{TomRan_ESPNet_MICCAI2025,
        author = { Toman, Raneem AND Subramanian, Venkataraman AND Ali, Sharib},
        title = { { ESPNet: Edge-Aware Feature Shrinkage Pyramid for Polyp Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {157 -- 167}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes integrating edge features with the Feature Shrinkage Pyramid Network (FSPNet) to improve the performance of polyp segmentation. Ablation studies demonstrate that the proposed modification enhances the original FSPNet’s segmentation accuracy.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The paper is the first to integrate edge features with the FSPNet architecture for medical image segmentation, which is a novel idea. (2) The method is evaluated on multiple datasets, and the results show particularly significant improvements in unseen scenarios, demonstrating good generalizability.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

(1) The distinction between seen and unseen testing scenarios is not clearly explained. (2) It is unclear which dataset is used to train the model for evaluation on unseen datasets.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

(1) In the implementation details, please clarify whether the learning rate decay schedule (reduction by a factor of 10 at epoch 50) is also applied to the baseline and comparison methods. (2) The definition of seen vs. unseen testing should be explained more clearly. Multiple unseen datasets are used—are these tested using a model trained solely on seen datasets? If so, please specify which dataset(s) were used for training.
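
The schedule this comment refers to can be sketched as a single step decay. This is only an illustration of "reduction by a factor of 10 at epoch 50": the base learning rate used below is a hypothetical placeholder, and whether the drop takes effect at epoch 50 itself (as assumed here, with 0-indexed epochs) is not stated in the review.

```python
def step_lr(base_lr: float, epoch: int,
            decay_epoch: int = 50, factor: float = 10.0) -> float:
    """Return the learning rate for a given epoch under a single step decay.

    Assumption: the rate is divided by `factor` from `decay_epoch` onward.
    """
    if epoch >= decay_epoch:
        return base_lr / factor
    return base_lr


# Hypothetical base rate, chosen only for illustration.
schedule = [step_lr(1e-4, e) for e in (0, 49, 50, 99)]
```

Applying the same function to every compared method would be one way to make the training setup uniform, which is the point the reviewer asks the authors to clarify.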
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

(1) The proposed method introduces a simple yet effective enhancement to an existing network, improving segmentation performance. (2) The results are convincing across a range of datasets, although the experimental setup and training/testing details could be clarified further.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The paper leverages a Feature Shrinkage Pyramid (FSPNet) [7] to create ESPNet for the polyp segmentation task. ESPNet consists of three components: a ViT-based encoder, a token enhancement module (TEM), and feature shrinkage decoders (FSDs) for mask and edge prediction.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper builds ESPNet based on FSPNet [7], adding residual connections and an additional FSD module for edge prediction, which is connected to the mask FSD via CBAM. Multi-scale loss is applied to the FSDs.
To test out-of-distribution (OOD) performance, the paper creates a combined training set of 4000 images from public datasets and evaluates on separate test sets and other datasets, including their in-house IBD dataset. The results show that the proposed ESPNet improves performance and generalizability to out-of-distribution datasets.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

1/ The paper does not explicitly explain how to obtain M_s and E_s for each scale (16, 8, 4, 1), as Figure 1 only illustrates the computation of M_1 and E_1 at the final scale.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

1/ Mentioning “FSD” as a contribution without explicitly explaining its abbreviation is not good practice.
2/ Adding the shapes of tokens T^0, T^f, and T in Section 2.2 would be helpful for readers. The same applies to the deserialization D and T_{O_1} in Section 2.3.
3/ Improving Figure 1 would enhance clarity: fix the low-resolution lines in FSD, highlight the upsampling in the residual connection, re-align the lines, increase the brightness for the edge path, and add notations for the mask and edge branches.
4/ Since the 4000-image WLE dataset is created based on published datasets, it would be helpful to clarify whether the WLE dataset and IBD test set will be released for reproducing experiments.
5/ Adding references for the calculation of metrics (S_\alpha, F_\beta, and MAE) would improve the clarity and completeness.
6/ Correct “Fβw” to “F_{βw}” in Tables 1 and 2.
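
As context for the metric comments above, here is a minimal sketch of two of the named metrics under their commonly used definitions: MAE as the mean absolute difference between a prediction map in [0, 1] and the binary ground truth, and the plain F-measure with β² = 0.3. These are assumptions, not the paper's stated formulas. In particular, this is the unweighted F_β, not the weighted variant reported in the tables, and S_\alpha is omitted because it is more involved. The binarization threshold of 0.5 is also an illustrative choice.

```python
def mae(pred, target):
    """Mean absolute error between a [0, 1] prediction map and a binary mask."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            total += abs(p - t)
            count += 1
    return total / count


def f_beta(pred, target, beta_sq=0.3, threshold=0.5):
    """Unweighted F-measure after thresholding the prediction map.

    beta_sq=0.3 and threshold=0.5 are common defaults, assumed here.
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            hit = p >= threshold
            if hit and t:
                tp += 1
            elif hit and not t:
                fp += 1
            elif not hit and t:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```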
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well-organized and easy to follow. However, there are still some issues, as mentioned above.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper proposes ESPNet, a transformer-based framework for polyp segmentation designed to improve generalizability across centers and challenging cases (e.g., IBD, flat polyps). The model enhances a ViT backbone with a token enhancement module and dual decoders for mask and edge features. A CBAM attention module and residual connections further improve multiscale feature aggregation. Experiments on six datasets, including an in-house IBD set, show strong out-of-distribution performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Strong generalization: ESPNet shows consistent improvement over state-of-the-art methods across multiple unseen datasets, including a clinically challenging IBD dataset.
2. Well-designed enhancements: Incorporates edge features via a second decoder and attention-based fusion, improving boundary awareness and detail preservation.
3. Solid empirical support: Comprehensive quantitative results, qualitative comparisons, and clear ablation studies demonstrate the impact of each module.
4. Clinical relevance: Focus on real-world variation across centers and patient cohorts (e.g., IBD) increases the practical utility of the model.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

1. Limited discussion on inference cost: Model complexity (dual decoders, ViT backbone) likely increases computational demand, but runtime or efficiency metrics are not reported.

2. Graph-based token fusion (GFM) lacks ablation: The effect of the graph module is not independently validated, leaving its contribution unclear.

3. In-house IBD dataset not fully described: Details about annotation quality, data diversity, or public release plan are missing, which impacts reproducibility and evaluation fairness.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

see strengths
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

N/A

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A


ESPNet: Edge-Aware Feature Shrinkage Pyramid for Polyp Segmentation
