Abstract

Accurate polyp segmentation during colonoscopy is crucial for the early detection and timely intervention of colorectal cancer. Recently, Mamba, a State Space Model, has gained significant attention in polyp segmentation due to its remarkable ability to model long-range dependencies with linear computational complexity. However, Mamba-based methods face two key challenges: (1) their fixed scanning pattern limits the capture of dynamic spatial context, impairing the precise localization of irregular polyps; (2) during the calculation process, the high-frequency information that is crucial to local details is weakened, and the blurred mid-frequency information becomes dominant, thereby reducing the boundary accuracy. To overcome these limitations, we propose PolyMamba, a novel framework that integrates spatial priors while enhancing high-frequency information for more accurate polyp segmentation. Specifically, our framework introduces a Spatial-Prior Guided module, which leverages explicit spatial priors extracted from Transformer-based methods to counteract the local perception bias caused by Mamba’s fixed scanning pattern. Additionally, we design a Dual-Gate Frequency Enhancement module, which applies two Gaussian filters to generate spectra with different high-frequency thresholds, and uses the difference between them as an attention map to selectively enhance high-frequency features, thereby refining the polyp boundaries. Comprehensive experiments on five widely used polyp segmentation datasets demonstrate that PolyMamba not only surpasses existing state-of-the-art techniques but also provides a novel frequency-domain perspective, offering new insights into improving segmentation performance.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3872_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Parker-rfu/PolyMamba

Link to the Dataset(s)

https://github.com/DengPingFan/PraNet

BibTex

@InProceedings{FuRen_PolyMamba_MICCAI2025,
        author = { Fu, Renyu and Hu, Shurui and Zheng, Xiao and Tang, Chang and Liu, Xinwang},
        title = { { PolyMamba: Spatial-prior Guided Mamba for Polyp Segmentation with High-Frequency Enhancement } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {459--469}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a novel approach that combines Mamba with two proposed modules: the Spatial-Prior Guide and Dual-gate Frequency Enhancement. The methodology draws upon principles from the transformer architecture, Mamba, and frequency domain analysis. Specifically, the frequency domain component is designed to preserve high-frequency information critical for accurate polyp segmentation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper is clearly written and logically organized, making its contributions and technical details accessible to readers. (2) Novel Frequency-Driven Framework: The proposed methodology presents an effective framework for polyp segmentation by uniquely integrating frequency domain analysis, offering a fresh perspective to address challenges in capturing fine-grained structural details. (3) The paper identifies limitations of Mamba in segmentation tasks and innovatively proposes transformer backbone features to mitigate these issues. By further incorporating high-frequency preservation techniques, the method achieves enhanced segmentation performance, demonstrating both theoretical and practical rigor.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) Technical Concerns:

    • Attention Mechanism Design: The paper proposes using a transformer backbone to generate attention maps but omits an attention gate. Direct multiplication of these maps risks suppressing unique modality-specific features if not balanced with normalization or gating mechanisms. The authors should explicitly address why this potential issue is not a concern in their framework. For example, an ablation study or theoretical analysis could justify the design choice, particularly in scenarios where modality-specific features might dominate or conflict.

    (2) Gaussian Filter Implementation: The description of the Gaussian filter requires clarification to enhance reproducibility and reader comprehension. Specifically:

    • What technical rationale supports the use of distinct high-frequency spectral boundaries as attention maps for enhancing high-frequency information?
    • How does averaging the Gaussian filter values (set to match the polyp image’s width/height) ensure compatibility between the two spectra? A more rigorous mathematical or empirical justification for this design would strengthen the methodology section.

    (3) Novelty and Originality: The paper lacks sufficient novelty. The proposed method bears striking similarity to Fang, X., Shi, Y., Guo, Q., Wang, L., & Liu, Z. (2023, August). Sub-band based attention for robust polyp segmentation. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (pp. 736-744), which introduced a sub-band attention (SBA) mechanism for robust polyp segmentation. Specifically:

    • The Spatial-Prior Guided Module appears functionally equivalent to the TAC module from prior work, with only a substitution of 2D convolutions with Mamba (SS2D).
    • The Dual-Gate Frequency Enhancement Module mirrors the SBA architecture in Fang et al. (2023), yet this foundational work is not cited.

    To justify publication, the authors must:

    • Systematically compare their Mamba-based approach against the original CNN-based method (e.g., Fang et al.’s IJCAI model) on the same dataset.
    • Demonstrate statistically significant and practically meaningful improvements (e.g., accuracy, efficiency, generalizability).
    • Explicitly discuss the advantages of Mamba over CNNs in this context (e.g., long-range dependency handling, computational trade-offs).

    Failure to address these points undermines the paper’s contribution and raises concerns about scholarly integrity.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (1) Strong Reject — must be rejected due to major flaws

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My primary concerns with this paper pertain to its novelty and originality. While the authors claim to propose new modules, the architectural design closely resembles prior work addressing the same application. However, this prior work is not adequately cited or discussed. To strengthen the manuscript, the authors should:

    1. Clearly articulate how their proposed methods differ from existing approaches.
    2. Provide proper citations to related studies to contextualize their contributions.
    3. Justify the originality of their work in light of the similarities to previous solutions.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Despite specific concerns raised regarding architectural similarity to prior work and the absence of crucial comparative experiments, the authors’ rebuttal failed to adequately address these points, undermining the paper’s contribution and scholarly integrity.



Review #2

  • Please describe the contribution of the paper

    This paper proposes FreqMamba, a Mamba-based framework for polyp segmentation that addresses Mamba’s limitations in spatial perception and high-frequency suppression. The authors integrate spatial priors from a pretrained PVT and introduce a frequency enhancement module based on dual Gaussian-filtered FFT subtraction. The method achieves strong results on five public polyp datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Strong performance across five datasets, including challenging unseen domains like ETIS and ColonDB.

    2. Clear motivation: addresses specific weaknesses of Mamba—local scanning bias and poor high-frequency retention.

    3. Innovative use of frequency cues, introducing a simple yet effective module for boundary enhancement.

    4. Comprehensive ablation confirms the individual contributions of spatial priors and frequency filtering.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Frequency filtering is heuristic: the use of fixed Gaussian kernels lacks theoretical grounding and is not compared to other frequency-aware methods.

    2. Missing comparisons with other Mamba-based segmentation models, such as SegMamba or Polyp-Mamba.

    3. No efficiency analysis: although Mamba is promoted as efficient, the added modules may offset this benefit, yet no FLOPs or inference time are reported.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see strengths

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents FreqMamba, a framework for polyp segmentation that builds on the Mamba state space model. While Mamba effectively models long-range dependencies, the paper states that it struggles with fixed spatial scanning and loss of high-frequency detail, both critical for accurate polyp boundary detection. To address this, the authors introduce a Spatial-Prior Guided Module using Transformer-derived priors and a Dual-Gate Frequency Enhancement Module that selectively boosts high-frequency features.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1- The paper introduces two key modules—SG (Spatial-Prior Guided) and DFE (Dual-Gate Frequency Enhancement)—to overcome limitations of pure Mamba. The SG module leverages explicit spatial priors to reduce local perception bias, while the DFE module enhances high-frequency features for sharper boundary delineation.

    2- The model is extensively evaluated on five public datasets, showing good performance compared to some of the state-of-the-art methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper states: “As illustrated in Fig. 1(c), during the Mamba calculation process, high-frequency components (ranging from 322 to 352), which are critical for capturing fine local details, gradually diminish, while blurred mid-frequency components (ranging from 31 to 321) increase.”

    I would like to raise a few concerns regarding this key observation:

    Lack of Quantitative Evidence: It is unclear how many samples or observations were analyzed to arrive at this conclusion. Was this behavior consistent across the entire dataset, or based on a limited set of examples? Given the significance of this finding—upon which the DFE module is designed—it would be helpful to include a quantitative analysis supporting this frequency behavior.

    Robustness to Imaging Conditions: Have you considered how this observation might vary under different lighting conditions, which are common in clinical settings? Lighting variability could significantly impact the frequency spectrum, especially the high-frequency components associated with texture and edges.

    Dependency on Polyp Characteristics: Is this observation consistent across polyps of varying sizes, shapes, textures, or anatomical locations? Without such analysis, it is difficult to generalize the assumption as a reliable design foundation.

    Given that this spectral behavior is a fundamental assumption underlying the architecture of the DFE module, I would strongly recommend including a more comprehensive evaluation—ideally with statistical metrics or visualizations—to support and generalize this claim.
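One way to obtain the kind of quantitative evidence requested above is to measure how spectral energy is distributed over radial frequency bands before and after a processing stage. The following is only a minimal sketch, assuming NumPy and a single-channel 2D array. The function name `band_energy_fractions` and the band edges (in cycles per pixel) are hypothetical and are not the paper's frequency thresholds.

```python
import numpy as np

def band_energy_fractions(image, band_edges=(0.0, 0.1, 0.35, np.inf)):
    """Return the fraction of spectral energy in each radial frequency band.

    band_edges are hypothetical cut-offs in cycles per pixel; consecutive
    pairs define half-open bands [lo, hi) for low, mid and high frequencies.
    Assumes the image is not identically zero.
    """
    h, w = image.shape
    power = np.abs(np.fft.fft2(image)) ** 2      # power spectrum
    fy = np.fft.fftfreq(h)[:, None]              # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]              # horizontal frequencies
    radius = np.hypot(fx, fy)                    # radial frequency per bin
    total = power.sum()
    fractions = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (radius >= lo) & (radius < hi)
        fractions.append(power[mask].sum() / total)
    return np.array(fractions)

# Usage: compare the high-frequency share before and after a smoothing step
# (a 2x2 box filter stands in for whatever stage is being analyzed).
rng = np.random.default_rng(0)
before = rng.standard_normal((64, 64))
after = (before + np.roll(before, 1, 0) + np.roll(before, 1, 1)
         + np.roll(np.roll(before, 1, 0), 1, 1)) / 4.0
print(band_energy_fractions(before), band_energy_fractions(after))
```

Collecting these fractions per image over a dataset would give the paired samples needed for the statistical tests and subgroup comparisons (lighting, polyp size, shape) raised in this review.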

    The paper provides meaningful comparisons with several existing methods; however, it lacks evaluations against some of the most recent and relevant works in the field. Notably, the following recent models should be considered for a more comprehensive benchmarking:

    SAM-Mamba: Mamba-guided SAM architecture for generalized zero-shot polyp segmentation

    Polyp-Mamba: Polyp segmentation with visual Mamba (MICCAI)

    In order to demonstrate the effectiveness and competitiveness of the proposed DFE module, it is important to include comparisons with these state-of-the-art architectures.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has more strengths than weaknesses, but some concerns still need clarification.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper has more strengths than weaknesses.




Author Feedback

  1. Comparison (r1, r2, r3): Following the same experimental settings, we conducted additional comparative experiments. Across five benchmark datasets, FreqMamba consistently outperforms the CNN-based SBA-Net (Fang et al.) and the Mamba-based ProMamba (Xie et al.), while showing slightly lower performance than the latest Polyp-Mamba (Xu et al.) and SAM-Mamba (Dutta et al.).
  2. Parameters and Running Time (r1, r2): FreqMamba contains 22.4M parameters and achieves real-time performance of 78 FPS on an RTX 3090, outperforming SBA-Net (24.3M, 53 FPS) and PolypPVT (25.08M, 48 FPS). In terms of complexity, FreqMamba requires only 8.37 GFLOPs (352×352 input), while SBA-Net and PolypPVT require 9.89G and 10.0G, respectively. Training time is also reduced: 32 minutes for FreqMamba compared to 1 hour for SBA-Net and 45 minutes for PolypPVT. These results suggest that FreqMamba is well-suited for real-time deployment with minimal compromise in segmentation performance.
  3. Major (r1): (1) In the ablation study, we added experiments showing the performance difference before and after applying a simple Sigmoid gating mechanism. Results indicate minimal impact, suggesting that Transformer-based attention maps already offer sufficient adaptiveness to spatial variations. (2) DFE mechanism: The feature is first transformed via FFT, then filtered by two Gaussian kernels with different high-frequency boundaries. Their spectral difference is treated as the high-frequency component and restored via inverse FFT to form an attention map. Gaussian filter design: We use a kernel size proportional to the width and height of the image. The size-adaptive kernel function design ensures that the spectrum results of the two filters in different frequency bands have the same spatial resolution and frequency domain range. (3) Novelty: While our method draws inspiration from SBA-Net, the underlying motivation differs. SBA-Net focuses on enhancing contextual richness via Transformer-driven CNN features, and uniformly regards high-mid frequency subbands as attention maps to enhance feature differentiation. In contrast, we focus on correcting the local perception bias in Mamba calculation by introducing spatial priors and retaining more local details from the perspective of high-frequency information enhancement. We emphasize performance and efficiency improvements over SBA-Net (see results in the previous sections).
  4. Heuristic frequency filtering (r2): We acknowledge that the proposed frequency filtering mechanism is partly heuristic. However, we use a fixed Gaussian kernel design whose size is proportional to the image size, which balances computational efficiency and frequency-domain selectivity. We have also added comparisons with DCT-based attention and Frequency Channel Attention (FCA). While these methods offer certain enhancements, our simpler difference-based Gaussian filtering achieves better segmentation accuracy with lower computational overhead.
  5. Spectral Behavior (r3): (1) Sample Size and Statistics: We sampled 500 polyp images from the training set and conducted spectral analysis. Results showed that in 87.6% of the samples, high-frequency components were significantly attenuated after Mamba (p < 0.01), while mid-frequency energy increased significantly (p < 0.05), supporting our characterization of spectral behavior. (2) Robustness to Illumination Changes: We applied ±20% brightness and contrast perturbations to simulate illumination variations. The attenuation trend of high-frequency components remained stable, with a Pearson correlation coefficient above 0.92, indicating strong robustness to imaging conditions. (3) Generalization Across Polyp Features: To evaluate generalization, polyps were grouped by shape complexity (perimeter-to-area ratio) and anatomical location. Consistent high-frequency attenuation was observed across all groups after Mamba, confirming the universality of the spectral behavior across diverse polyp appearances.
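The DFE mechanism described in item 3 (FFT, two Gaussian filters with different cut-offs, spectral difference, inverse FFT, attention map) can be illustrated with a short sketch. This is not the authors' implementation. It assumes NumPy and a single 2D feature map, and the names `gaussian_lowpass`, `dual_gate_attention`, `enhance`, the two `sigma` values, the max-normalization, and the residual form `feature * (1 + attention)` are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_lowpass(height, width, sigma):
    """Gaussian low-pass mask over normalized frequencies (cycles per pixel).

    Because frequencies are normalized, a fixed sigma corresponds to a
    cut-off that scales with the image height and width in index units.
    """
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return np.exp(-(fx ** 2 + fy ** 2) / (2.0 * sigma ** 2))

def dual_gate_attention(feature, sigma_wide=0.25, sigma_narrow=0.10):
    """Attention map from the difference of two Gaussian-filtered spectra."""
    h, w = feature.shape
    spectrum = np.fft.fft2(feature)
    # Difference of the two masks keeps the band between the two cut-offs
    # and is exactly zero at the DC component.
    band = spectrum * (gaussian_lowpass(h, w, sigma_wide)
                       - gaussian_lowpass(h, w, sigma_narrow))
    detail = np.fft.ifft2(band).real             # back to the spatial domain
    magnitude = np.abs(detail)
    return magnitude / (magnitude.max() + 1e-8)  # scale into [0, 1)

def enhance(feature):
    """Residual re-weighting (hypothetical): boost where detail is strong."""
    return feature * (1.0 + dual_gate_attention(feature))

# Usage: a vertical step edge yields high attention near the edge columns
# and low attention in the flat regions.
feature = np.zeros((32, 32))
feature[:, 16:] = 1.0
attention = dual_gate_attention(feature)
print(attention[16, 8], attention[16, 16])
```

The residual form is used here so that flat regions pass through unchanged while edge regions are amplified; whether the paper multiplies directly or adds a residual is not stated in this rebuttal.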




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Although two of three reviewers are positive, this work has more weaknesses. Specifically, the reviewers asked for comparisons with more advanced Mamba-based methods, but the authors stated in the rebuttal that the proposed method performs slightly worse than the latest Polyp-Mamba (Xu et al.) and SAM-Mamba (Dutta et al.), which largely weakens the strengths of this work. Moreover, R1 still finds that this work is architecturally similar to prior work and lacks crucial comparative experiments. Hence, this work cannot be accepted.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


