Abstract

Colonoscopy videos provide richer information for polyp segmentation in rectal cancer diagnosis. However, the endoscope’s fast movement and close-up observation cause current methods to suffer from large spatial incoherence and runs of consecutive low-quality frames, which limits segmentation accuracy. In this context, we focus on robust video polyp segmentation by enhancing the consistency of adjacent features and rebuilding a reliable polyp representation. To this end, we propose the SALI network, a hybrid of a Short-term Alignment Module (SAM) and a Long-term Interaction Module (LIM). The SAM learns spatially aligned features of adjacent frames via deformable convolution and further harmonizes them to capture a more stable short-term polyp representation. To handle low-quality frames, the LIM stores historical polyp representations in a long-term memory bank and explores retrospective relations to interactively rebuild more reliable polyp features for the current segmentation. Combining SAM and LIM, the SALI network is robust to spatial variations and weak visual cues. Benchmarking on the large-scale SUN-SEG dataset verifies the superiority of SALI over the current state of the art, improving Dice by 2.1%, 2.5%, 4.1% and 1.9% on the four test subsets, respectively. Code is available at https://github.com/Scatteredrain/SALI.
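The abstract reports its gains in Dice. As a reminder of what that number measures, the following is a minimal sketch of the standard Dice coefficient, 2|P ∩ G| / (|P| + |G|), for one predicted mask P and one ground-truth mask G. The function name `dice_score`, the nested-list mask format, and the empty-mask convention are assumptions made for illustration; the SUN-SEG evaluation toolbox may aggregate or threshold differently.

```python
def dice_score(pred, target):
    """Dice coefficient for two binary masks given as nested lists of 0/1."""
    flat_pred = [v for row in pred for v in row]
    flat_gt = [v for row in target for v in row]
    if len(flat_pred) != len(flat_gt):
        raise ValueError("masks must have the same number of pixels")
    # |P intersect G|: pixels that are foreground in both masks.
    intersection = sum(p * g for p, g in zip(flat_pred, flat_gt))
    total = sum(flat_pred) + sum(flat_gt)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is an
        # assumed convention, not something stated in the paper.
        return 1.0
    return 2.0 * intersection / total


# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(f"Dice = {dice_score(pred, gt):.3f}")
```

In this toy case the two masks share two of their three foreground pixels each, so the score is 2·2 / (3 + 3).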

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2092_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Hu_SALI_MICCAI2024,
        author = { Hu, Qiang and Yi, Zhenyu and Zhou, Ying and Peng, Fang and Liu, Mei and Li, Qiang and Wang, Zhiwei},
        title = { { SALI: Short-term Alignment and Long-term Interaction Network for Colonoscopy Video Polyp Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes SALI, which combines the Short-term Alignment Module (SAM) and the Long-term Interaction Module (LIM), to improve the video polyp segmentation (VPS) task. The key contribution lies in overcoming challenges such as spatial incoherence and low-quality frames by improving feature consistency and reconstructing reliable polyp representations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper proposes SALI for VPS to address the limitations of existing methods on adjacent frames with large variation and on long sequences of consecutive low-quality frames.
    2. By improving the stability and reliability of spatiotemporal features, SALI demonstrates a targeted solution to common problems in colonoscopy video analysis.
    3. The SALI network demonstrates superior performance on the SUN-SEG dataset compared to existing state-of-the-art methods. The improvements in Dice scores across different test subsets highlight the effectiveness and competitiveness of the proposed approach.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The authors claim “the stability and reliability of spatio-temporal features”, but this has not been validated quantitatively or qualitatively in the experiments.
    2. Unfair comparison. The performance gains may come from the stronger backbone, PVTv2. It would be better to list the backbone, FLOPs, and MACs of each method in Table 1, and to replace PVTv2 with another backbone for a fair comparison.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The reviewer has checked the code link shown in the abstract, and reproducibility appears reliable based on the submission.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the above comments

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea presented in this submission is good, but its effectiveness has not been fully evaluated because of the unfair comparison. Thus, the reviewer rates this paper as a weak reject in the first round.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my concerns well. The reported performance looks very convincing. I am happy to change my recommendation to ‘accept’.



Review #2

  • Please describe the contribution of the paper

    The work aims to design a robust video polyp segmentation method that efficiently uses adjacent feature consistency and reconstructs polyp features with the Short-term Alignment Module (SAM) and the Long-term Interaction Module (LIM).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed methodology is novel, and so is the experimental analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    More experimental analysis on diverse datasets and conditions is required.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    More datasets, together with both qualitative and robust quantitative analysis, would be helpful for understanding the proposed technique better.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Given the novelty and the exhaustive experiments provided, I maintain my weak accept for the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed my major concerns, hence I accept the work.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel method called Short-term Alignment and Long-term Interaction Network (SALI) for video polyp segmentation in colonoscopy videos. It addresses the challenges of large spatial incoherence between adjacent frames due to fast camera motion, and consecutive low-quality frames caused by complex lighting conditions, which degrade the performance of existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work tackles the specific and challenging application of polyp segmentation in colonoscopy videos, which has unique characteristics such as fast camera motion, close-up observation, and variable lighting conditions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Only the SUN-SEG dataset was used to validate the model. It would be valuable to assess the practical implications of the improved polyp segmentation accuracy, such as its effect on polyp detection rates, diagnostic accuracy, or potential time savings for clinicians, to better understand the clinical feasibility and relevance of the proposed method.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Adding more tests on different datasets or, if obtaining datasets is challenging, testing against the performance of practicing doctors could further demonstrate the model’s effectiveness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper successfully demonstrates a functional model and assembles a new toolkit for Colonoscopy Video Polyp Segmentation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

First of all, we would like to sincerely thank all reviewers for their constructive comments and all chairs for their great efforts. We express our heartfelt gratitude for the reviewers’ recognition of our work, for example, ‘novel method’ (R1), ‘challenging application, functional model, and new toolkit’ (R3), and ‘good idea’ (R4). In the following, we respond to all questions and concerns point by point.

Results on more datasets (R1, R3): Although we are very willing to include more datasets in our revised manuscript, we would like to give some key reasons why we think SUN-SEG alone is sufficient for this study: (1) SUN-SEG is the largest well-known dataset for video polyp segmentation, providing 49,136 annotated frames. The other two comparable datasets, i.e., ASU-Mayo and PolypGen, provide only 3,856 and 3,788 annotated frames, respectively, so SUN-SEG is more than 10 times larger than either. (2) SUN-SEG covers the most comprehensive and realistic clinical scenes, such as fast camera motion, complex lighting, surgical instruments, and diverse polyps. In comparison, the other existing datasets cover relatively limited scenes. (3) Given these two points, many top-notch works (e.g., PNS+ and MAST) are also benchmarked on SUN-SEG only. That said, we agree that relying on a single public dataset has limitations. In fact, we are already working on collecting a large-scale multi-center private dataset with local hospitals. In future work, we will include and release this private dataset to contribute further to the field.
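The size comparison in point (1) can be checked directly from the frame counts quoted above. The snippet below is only an arithmetic check on those three figures; the variable names are illustrative.

```python
# Annotated-frame counts as quoted in the rebuttal.
sun_seg_frames = 49_136
other_datasets = {"ASU-Mayo": 3_856, "PolypGen": 3_788}

# Ratio of SUN-SEG to each of the two smaller datasets.
ratios = {name: sun_seg_frames / n for name, n in other_datasets.items()}
for name, ratio in ratios.items():
    print(f"SUN-SEG / {name}: {ratio:.1f}x")
```

Both ratios come out between 12 and 13, which is consistent with the "10+ times larger" statement.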

Working condition (R1): Although our method achieves new SOTA performance, it still has some limitations; for example, it makes false positive predictions on a few highly polyp-like sites such as the ileocecal valve. That said, these limitations are challenging and shared by all polyp segmentation methods. In future work, we will take them into consideration and push the performance further.

Clinically-practical values (R3): The clinically practical contributions of our method include, but are not limited to, (1) providing accurate edges to assist surgical resection and (2) providing accurate morphological information to assist polyp typing diagnosis. We agree that it is valuable to assess these contributions experimentally, but this involves substantial new effort that is hard to fit into the MICCAI paper. In future work, we will assess them comprehensively.

Backbone fairness (R4): Our method uses the same backbone (PVTv2) as the two best comparison methods, i.e., MAST and SLT-Net, so using PVTv2 in our method is still a fair basis for comparison. That said, we are willing to provide a variant using Res2Net-50 as the backbone, which is consistent with the other four comparison methods, and to list the corresponding MACs of all methods (see the code link).

Stability and reliability of the features (R4): Stability: The upper case of Fig. 3 shows two adjacent frames with large spatial variations. As can be seen, the results of SALI are not only more accurate but also more temporally stable (more similar across frames), which we believe verifies the stability of the learned features. In addition, we will include a specific metric, Temporal Stability [1], to verify the stability quantitatively. Reliability: The bottom case of Fig. 3 shows consecutive low-quality frames, and the segmentation results on them are strongly correlated with the reliability of the learned features. As can be seen, SALI’s results are closest to the GT masks. In addition, we will construct a specific subset of SUN-SEG containing all the consecutive low-quality sequences and separately report comparison results on this subset to verify the reliability quantitatively.

[1] A benchmark dataset and evaluation methodology for video object segmentation. CVPR. 2016.
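The rebuttal describes stability informally as predictions being "more similar across frames". One simple way to put a number on that idea is the mean IoU between consecutive predicted masks, sketched below. This is an assumed proxy for illustration only: it is not the Temporal Stability metric of [1], and the function names `iou` and `mean_consecutive_iou` are hypothetical. Note that such a proxy also drops when the polyp genuinely moves between frames, so it is only meaningful when compared across methods on the same sequence.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two empty masks are counted as identical (assumed convention).
    return 1.0 if union == 0 else inter / union


def mean_consecutive_iou(masks):
    """Average IoU over all pairs of neighbouring frames in a sequence of masks."""
    if len(masks) < 2:
        raise ValueError("need at least two frames")
    scores = [iou(a, b) for a, b in zip(masks[:-1], masks[1:])]
    return sum(scores) / len(scores)


# Toy 1x4 sequences: one drifts slightly, the other loses the polyp in frame 2.
stable = [[[0, 1, 1, 0]], [[0, 1, 1, 0]], [[0, 1, 1, 1]]]
flicker = [[[0, 1, 1, 0]], [[0, 0, 0, 0]], [[0, 1, 1, 0]]]
print(f"stable:  {mean_consecutive_iou(stable):.3f}")
print(f"flicker: {mean_consecutive_iou(flicker):.3f}")
```

The flickering sequence scores lower because its middle frame shares no foreground with either neighbour.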




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper adopts simple components and achieves amazing results.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    This paper adopts simple components and achieves amazing results.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


