Abstract
Colorectal cancer is a leading cause of cancer-related deaths worldwide, and precise polyp segmentation plays a crucial role in its early detection. U-shaped architectures are widely used for polyp segmentation due to their ability to capture multi-scale contextual information effectively. However, it is suboptimal to solely use top-down or bottom-up fusion flow in traditional U-shaped architectures. Additionally, most existing methods only focus on improving the feature fusion module, often introducing more computational costs. In this work, we propose a novel and efficient nested multi-scale feature aggregation network that integrates high-level semantic information with low-level boundary details within skip connections, effectively handling the diverse shapes and sizes of polyp regions. Specifically, we introduce a bidirectional FPN-in-FPN module that fuses features across stages through both bottom-up and top-down pathways. This module adds only 0.12M extra parameters with minimal computational overhead while significantly enhancing segmentation performance in small networks. Extensive experiments on polyp segmentation datasets demonstrate that our network outperforms existing methods in both accuracy and efficiency. Code is available at https://github.com/Yejin0111/FPN-in-FPN
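The bidirectional fusion described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes single-channel feature maps, 2×2 average pooling for downsampling, nearest-neighbour upsampling, element-wise addition as the fusion operator, and a bottom-up pass followed by a top-down pass. All function names are hypothetical, and the learned convolutions of the real module are omitted.

```python
# Toy sketch only: single-channel maps, no learned weights.
# Average pooling, nearest-neighbour upsampling and element-wise addition
# are assumptions chosen for illustration, not the paper's operators.

def downsample(x):
    """Halve resolution with 2x2 average pooling (even sizes assumed)."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

def upsample(x):
    """Double resolution by repeating each value in a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a, b):
    """Element-wise sum of two equally sized maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def bottom_up(feats):
    """Propagate fine (low-level) detail towards coarser stages."""
    out = [feats[0]]
    for f in feats[1:]:
        out.append(add(f, downsample(out[-1])))
    return out

def top_down(feats):
    """Propagate coarse (high-level) semantics towards finer stages."""
    out = [feats[-1]]
    for f in reversed(feats[:-1]):
        out.append(add(f, upsample(out[-1])))
    return out[::-1]

def bidirectional_fusion(feats):
    """Assumed order: bottom-up pass first, then top-down pass."""
    return top_down(bottom_up(feats))

def constant_map(size, value):
    return [[float(value)] * size for _ in range(size)]

# Three stages, finest first: 8x8, 4x4, 2x2.
feats = [constant_map(8, 1), constant_map(4, 2), constant_map(2, 3)]
fused = bidirectional_fusion(feats)
print([(len(f), len(f[0])) for f in fused])
```

In this sketch every stage keeps its resolution, while each fused map receives contributions from both finer and coarser stages.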
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4899_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/Yejin0111/FPN-in-FPN
Link to the Dataset(s)
N/A
BibTex
@InProceedings{YeJin_FPNinFPN_MICCAI2025,
author = { Ye, Jin and Su, Yanzhou and Wu, Yicheng and He, Junjun and Zhuang, Bohan and Chen, Zhaolin and Cai, Jianfei},
title = { { FPN-in-FPN: A Nested Multi-Scale Aggregation Network for Polyp Segmentation } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15970},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes FPN-in-FPN: A Nested Multi-Scale Aggregation Network, which is another U-Net-based model. Instead of relying solely on top-down or bottom-up fusion, the paper introduces the FPN-in-FPN module to aggregate both bottom-up and top-down pathways. By stacking FPN-in-FPN modules between the encoder and decoder with multi-scale loss, the paper shows that their results outperform state-of-the-art approaches.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper proposes the FPN-in-FPN module, which aggregates both bottom-up and top-down pathways. By stacking only two modules and using VAN-B0 as the backbone, the results remain competitive in terms of both accuracy and efficiency.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Figure 2 does not explain the meaning of the numbers on the “dimension change” modules, such as “88×88.”
- The explanation of the decoder in Equation 2 states it uses Conv1×1, which does not match Figure 2, where Conv3×3 appears in “Decoder” before the Conv1×1 in “Deep Supervision”.
- Using VAN-B0 with low FLOPs (0.9G as shown in Table 4 of [3]) without comparison to other U-Net architectures using VAN-B0 as the backbone seems unfair. This raises concerns about the validity of the performance claims, as the improvements might be attributed to the VAN-B0 backbone rather than the FPN-in-FPN module itself. Furthermore, increasing N ≥ 3 leads to reduced performance, which questions the module’s scalability.
- There is no information about the implementation of the baseline. It is unclear whether “baseline” refers to the case where N = 0.
- The paper repeatedly mentions the 0.12M parameter count from the beginning but does not explain it until Section 3.5.
- Figure 4 is not referenced or discussed in the paper.
- The term “r” in “learnable Gaussian filter” is not introduced or explained earlier in the paper, making its meaning unclear.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- In the Top-Down Path, “UpSample” should be replaced with “UpSample(·)” for consistency with “DownSample(·)” used above.
- It would be better to add labels F , \hat{F}, \bar{F}, and F’ on the arrows in Figure 2 to improve clarity.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Given the weaknesses listed above, the paper is not yet of sufficiently high quality.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Thank you to the authors for taking the time to address all my questions and revise the paper.
Based on the results with larger backbones, the proposed method performs well on smaller backbones such as VAN-B0/1/2 or ResNet-18, showing clear improvements over UNet and PAFPN. However, when using larger backbones like ResNet-50/101, the performance gains are marginal—e.g., UNet (82.1/84.3), PAFPN (82.4/84.8), and the proposed method (82.7/84.7). This raises a concern: does the method not scale well with larger backbones, or is it particularly suited only for Polyp Segmentation? I believe this is an interesting open question that could be addressed in future work.
I also encourage the authors to consider open-sourcing their code to enable the community to further explore and validate their method.
Review #2
- Please describe the contribution of the paper
This paper proposes the FPN-in-FPN module, which is integrated into an encoder-decoder framework for polyp segmentation. The proposed method employs bidirectional feature fusion via bottom-up and top-down pathways to effectively combine high-level semantic information with low-level boundary details.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) The paper proposes a plug-in module for multi-scale feature fusion that is lightweight yet effective in performing segmentation tasks. This work offers valuable insights and opportunities to develop efficient methods for polyp segmentation, addressing a critical need in the field. (2) The ablation study comprehensively examines the architecture design and includes experiments across multiple datasets, demonstrating the module’s robustness and generalizability. This rigorous validation strengthens confidence in the proposed method’s performance. (3) The paper is well-written and well-structured, with clear explanations and logical flow. The presentation of technical details and results is organized, making the contributions accessible to readers.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) The proposed method is based on an FPN-based model originally designed for object detection tasks. While the architecture resembles the Path Aggregation Feature Pyramid Network (PAFPN) [Liu, S., Qi, L., Qin, H., Shi, J., & Jia, J. (2018). Path aggregation network for instance segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8759-8768)], it reverses the pathway sequence of PAFPN. To strengthen the manuscript, the authors are encouraged to:
  - Clearly articulate the differences between their method and existing FPN-based approaches (e.g., PAFPN).
  - Explain how the proposed design is uniquely suited to the segmentation task compared to traditional FPN-based methods.
(2) The decoder architecture is not sufficiently clarified. A more detailed description or diagram would help readers understand its design and functionality.
(3) While the VAN backbone achieves comparable results to state-of-the-art (SOTA) models in its smallest configuration, the manuscript would benefit from:
  - Discussing the impact of integrating proposed modules with larger backbones.
  - Directly comparing performance with recent SOTA methods for medical image segmentation (e.g., nnU-Net [Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.]) to contextualize the proposed approach.
(4) Figure 4 should explicitly label the model names alongside the predicted results and ground truth to improve clarity and interpretability.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method introduces a simple, lightweight, and effective module designed to enhance performance in polyp segmentation tasks, demonstrating both practicality and innovation. The core idea is compelling and holds promise for advancing the field. However, to fully justify its publication, the manuscript would benefit from a more rigorous technical discussion that explicitly compares the proposed approach with similar methods, highlighting its unique advantages and limitations. Additionally, ensuring figures are clearly labeled would significantly strengthen the work. These revisions would not only improve clarity and reproducibility but also better position the method within the broader context of segmentation research, thereby solidifying its contribution to the field.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The authors’ rebuttal effectively addresses key performance concerns by providing strong quantitative results on SOTA comparison (nnU-Net) and scalability with larger backbones, which significantly strengthens the paper. Differences from existing FPNs are also clarified.
However, the rebuttal notably lacks sufficient technical depth in its explanations. Specifically, the decoder architecture remains entirely unexplained, and while performance with larger backbones is shown, there’s no technical analysis of why or how the proposed modules effectively integrate.
Review #3
- Please describe the contribution of the paper
The authors introduce a novel bidirectional module, named FPN-in-FPN, into a U-shaped architecture with bottom-up and top-down paths, enabling multi-scale feature aggregation. This approach is effective for polyp segmentation as it preserves both fine-grained details and high-level context, while maintaining low computational overhead.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Introduction: the authors provide an exceptional overview of U-Net based polyp segmentation methods that is concise, clear, and precise. They have categorized these methods into four categories and have discussed their limitations and drawbacks. They successfully explained the motivation for this work and the reasoning behind the proposed FPN-in-FPN method, simplifying the rest of the paper for the reader.
Real-time performance and practical applicability: It includes a clear and explicit analysis and discussion of the computational overhead involved in adapting this method. The authors compare it to other methods by examining metrics such as MAC, the number of parameters, and FPS.
An innovative and elegantly simple approach is presented: the method utilizes a nested multi-scale aggregation of semantic information, integrating both low and high-level features through top-down and bottom-up pathways. This design effectively captures complex features with minimal computational overhead, highlighting the novelty and practical applicability of their work.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The paper would benefit from including more comparisons to recent state-of-the-art polyp segmentation methods. I find that comparisons with approaches such as Polyp-mamba (Xu et al., 2024), PVT-Cascade (Rahman and Marculescu, 2023), and ASPS (Li et al., 2024) are notably missing. Including these comparisons would provide a more comprehensive evaluation of the proposed method’s performance in relation to current advancements in the field.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My recommendation to accept the paper is based on its exceptional overview of U-Net based polyp segmentation methods, which effectively establishes the motivation for the FPN-in-FPN method. Additionally, the paper offers strong real-time performance and practical applicability through a detailed analysis of computational overhead. The innovative and straightforward approach of nested multi-scale aggregation, capturing both low and high-level features with minimal overhead, highlights the work’s novelty and practical value.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The novelty and main contributions are effectively demonstrated. The analysis of real-time performance with the computational overhead substantiates the contribution. A comprehensive comparison with state-of-the-art transformer-based methods and the Mamba method completes the results section.
Author Feedback
To Reviewer 1
- Comparison with FPN-based Methods. Top-Down: Traditional FPN (e.g., UNet) limits the richness of features (discussed on pages 1-2), resulting in weak edge sensitivity as shown in Fig. 4 (2nd column). Bottom-Up: This direction is less explored in polyp segmentation. The typical PAFPN improves localization by propagating low-level, boundary-rich features, as edge information is a strong cue for accurate localization. However, as shown in A2 below, its performance remains sub-optimal compared to ours. Ours: FPN-in-FPN integrates both top-down and bottom-up fusion pathways, complementing the advantages of each. This dual-path fusion enhances both semantic and boundary features, which is critical for polyp segmentation (discussed on page 1). To our knowledge, it is the first polyp segmentation method to explicitly combine bidirectional multi-scale fusion.
- Impact of Large Backbones and nnU-Net. Thanks for the suggestion. While our aim is to design an efficient and effective module (well-demonstrated with VAN-B0), we agree that providing results with larger backbones strengthens our work. We show the results of larger backbones and nnUNet, i.e., the average Dice across five datasets (mIoU excluded due to similar trends and space constraints): VAN-B0/1/2: UNet 84.9/85.6/86.3; PAFPN 85.0/86.3/86.4; Ours 86.5/86.7/87.5. ResNet-18/50/101: UNet 80.1/82.1/84.3; PAFPN 80.8/82.4/84.8; Ours 81.7/82.7/84.7. nnUNet: 81.1. These results show that our method performs competitively across different backbones and nnUNet.
- Writing. Decoder: We will describe the decoder in detail and provide an updated illustration in Fig. 2. Fig. 4 labeling: Columns 1-7 are labeled as: Image, UNet, ResNet++, MSNet, M2SNet, Ours, and GT, respectively. We will make it clear.
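The backbone results quoted in the rebuttal above are easier to compare as per-backbone gains. The short script below only does arithmetic on the average Dice figures copied from that text; the dictionary layout and the `gains` helper are illustrative names, not part of the authors' code.

```python
# Average Dice (%) across five datasets, copied from the rebuttal text.
dice = {
    "VAN-B0":     {"UNet": 84.9, "PAFPN": 85.0, "Ours": 86.5},
    "VAN-B1":     {"UNet": 85.6, "PAFPN": 86.3, "Ours": 86.7},
    "VAN-B2":     {"UNet": 86.3, "PAFPN": 86.4, "Ours": 87.5},
    "ResNet-18":  {"UNet": 80.1, "PAFPN": 80.8, "Ours": 81.7},
    "ResNet-50":  {"UNet": 82.1, "PAFPN": 82.4, "Ours": 82.7},
    "ResNet-101": {"UNet": 84.3, "PAFPN": 84.8, "Ours": 84.7},
}

def gains(table):
    """Return {backbone: (gain over UNet, gain over PAFPN)} in Dice points."""
    return {
        backbone: (round(s["Ours"] - s["UNet"], 1),
                   round(s["Ours"] - s["PAFPN"], 1))
        for backbone, s in table.items()
    }

result = gains(dice)
for backbone, (vs_unet, vs_pafpn) in result.items():
    print(f"{backbone:>10}: {vs_unet:+.1f} vs UNet, {vs_pafpn:+.1f} vs PAFPN")
```

On these figures the gain over UNet ranges from 1.6 points (VAN-B0, ResNet-18) down to 0.4 points (ResNet-101), where PAFPN is 0.1 points ahead of the proposed method.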
To Reviewer 2
- Effectiveness of FPN-in-FPN. To clearly demonstrate that performance improvements stem from our module, we compare it against strong baselines using the same VAN-B0 backbone: CaraNet and MSNet achieve average Dice of 85.3 and 85.4, respectively. In contrast, our FPN-in-FPN achieves 86.5 Dice, clearly demonstrating superior performance. Additional FPN-based comparisons in A2 to R1 further support the claimed effectiveness.
- Scalability of FPN-in-FPN. We re-clarify that our goal is to develop a lightweight yet effective module specifically for real-time polyp segmentation. The ablation of N is intended to identify the optimal trade-off between efficiency and effectiveness, rather than scaling indefinitely. Our findings show that N=2 provides the best balance for real-world applications, clearly aligning with our goals.
- Writing. Dimension in Fig. 2: Channels double at each stage. We will illustrate it using VAN-B0 as an example. Equation 2: Thanks for catching the typo, Conv1×1 should be Conv3×3. Baseline definition: You are right. The baseline corresponds to N=0, i.e., traditional UNet. 0.12M: We clearly state it as the extra parameters introduced by our module throughout the manuscript, i.e., in the Abstract. Fig. 4: We will add labels (please refer to A3 to R1) and enrich figure captions accordingly. Typos for [UpSample(·), labels of “F”, and term “r”]: We will correct these typos.
To Reviewer 3
- SOTA comparison. Thanks for pointing out the references. We acknowledge the importance of comparing our method with recent SOTA models. As our approach is based on UNet and designed for real-time applications, our comparisons focused on models with the same architecture and similar design goals, as recognized by you. The suggested methods (Polyp-Mamba, PVT-Cascade, ASPS) are based on Transformer or Mamba architectures and typically involve higher computational costs due to attention mechanisms. We agree these comparisons are beneficial. We plan to include them and provide a structured comparison across architecture types and efficiency levels in future revisions, to better contextualize our method in the field of polyp segmentation.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
This paper received weird reviews. R1 initially voted acceptance, mentioning that more experiments would have been needed (this should not have been said), but later switched to rejection because the authors provided the experiments R1 had asked for, which I find quite unfair. Then R2 shifted from plain rejection to acceptance, while at the same time mentioning that “performance gains are marginal”, and R3 was all the time in favour of acceptance, arguing that the paper offers an “exceptional overview of U-Net based polyp segmentation methods”.
So, based on my own judgement of all this, I am going to vote for rejection, based on lack of clinical relevance. The reason is that polyp segmentation is an over-emphasized and a bit artificial problem. What we require in this context is detection, not segmentation, unless for some very specific applications. In this scenario, adding a couple of points of DICE score implies getting the polyp boundary slightly better, but if the original DICE score was already 82-84%, then the overlap is fully acceptable, and I do not see the relevance of yet another segmentation model to improve polyp overlap, sorry.
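For reference, the Dice score discussed here is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted mask A and a ground-truth mask B. The sketch below uses made-up square masks, not data from the paper, to show how a one-pixel boundary shift moves the score; the helper names are hypothetical.

```python
def dice(pred, gt):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two sets of pixel coordinates."""
    if not pred and not gt:
        return 1.0  # two empty masks agree perfectly by convention
    return 2.0 * len(pred & gt) / (len(pred) + len(gt))

def square(top, left, size):
    """Set of (row, col) pixels covered by a size x size square."""
    return {(r, c)
            for r in range(top, top + size)
            for c in range(left, left + size)}

gt = square(5, 5, 10)        # synthetic 10x10 ground-truth region
shifted = square(5, 6, 10)   # same shape, shifted one pixel to the right

print(dice(gt, gt))          # identical masks
print(dice(shifted, gt))     # 90 of 100 pixels overlap
```

With these synthetic masks, a one-pixel shift of a 10×10 region lowers Dice from 1.0 to 0.9, which gives a sense of how sensitive the metric is to boundary placement on small regions.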
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A