Abstract

3D models surpass 2D models in CT/MRI segmentation by effectively capturing inter-slice relationships. However, the added depth dimension substantially increases memory consumption. While patch-based training alleviates memory constraints, it significantly slows down inference due to the sliding window (SW) approach. We propose No-More-Sliding-Window (NMSW), a novel end-to-end trainable framework that enhances the efficiency of generic 3D segmentation backbones at inference time by eliminating the need for SW. NMSW employs a differentiable Top-k module to selectively sample only the most relevant patches, thereby minimizing redundant computations. When patch-level predictions are insufficient, the framework leverages coarse global predictions to refine results. Evaluated across 3 tasks using 3 segmentation backbones, NMSW achieves competitive accuracy compared to SW inference while significantly reducing computational complexity by 91% (88.0 to 8.00 TMACs). Moreover, it delivers a 9.1× faster inference on the H100 GPU (99.0 to 8.3 sec) and an 11.1× faster inference on the Xeon Gold CPU (2110 to 189 sec). NMSW is model-agnostic, further boosting efficiency when integrated with any existing efficient segmentation backbones.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0413_paper.pdf

SharedIt Link: https://rdcu.be/eHxeJ

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05325-1_36

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Youngseok0001/open_nmsw

Link to the Dataset(s)

N/A

BibTex

@InProceedings{JeoYou_No_MICCAI2025,
        author = { Jeon, Young Seok AND Yang, Hongfei AND Fu, Huazhu AND Kway, Yeshe AND Feng, Mengling},
        title = { { No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-K Patch Sampling } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {376--386}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper proposes a computationally efficient full-res CT/MRI segmentation framework without using the sliding window approach. A Differentiable-Top-K method is employed to selectively sample only the most relevant patches, thereby minimizing redundant computations. It aggregates the predictions from the selected patches with a low-res global prediction to produce the final full-res whole-volume prediction. Experiments demonstrated the effectiveness of the method.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. A novel computationally efficient framework to introduce the Differentiable-Top-K sampling method into CT/MRI segmentation framework;
2. An aggregation block to leverage coarse global predictions when patch prediction alone is insufficient.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The key part of the paper is the differentiable top-K patch sampling technique that selects only the patches most likely to enhance segmentation accuracy. However, this kind of technique has been studied in high-resolution 2D classification tasks. Some related papers are missing from the references. For example: Cordonnier, J. et al. Differentiable patch selection for image recognition, CVPR 2021.
2. In experiments, the methods the paper chooses to compare are too simple, only including Random Foreground and Zoom-out. In CT/MRI segmentation, the accuracy of tissue/organ boundaries is crucial. A comparison with a boundary patch sampling method is necessary. In my opinion, it may be more effective to predict boundary patches in the low-res volume and combine the high-res boundary patch segmentation with the coarse global prediction.
3. The authors say “When k=30, NMSW uses about 90% fewer MACs than SW” (page 7). But this doesn’t seem reasonable. SW performs segmentation on 300 patches and NMSW on 30 patches. With additional global prediction and aggregation, why does NMSW use about 90% fewer MACs than SW?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes a trainable framework for the 3D segmentation task without using the sliding window approach. A Differentiable-Top-K module together with the combination of global prediction and local predictions are specially designed. However, the differentiable top-K patch sampling technique has been studied in high-resolution 2D classification tasks. And I don’t think top k sampling is very suitable for medical image segmentation. The experiments are also insufficient (see above).
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The response of the authors addresses most of my concerns. Based on the initial reviews and rebuttal, I am happy to raise my score to weak accept to align with the other reviewers.

Review #2

Please describe the contribution of the paper

This paper introduces the NMSW framework, which addresses the inefficiencies of conventional sliding window (SW) inference in 3D medical image segmentation. By incorporating a differentiable Top-k module and integrating both local patch-level predictions and global contextual information, the proposed method substantially reduces computational complexity while maintaining competitive segmentation accuracy. The reported improvements in inference speed on both GPU and CPU platforms further underscore the practicality and generalizability of the approach.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper targets a well-recognized limitation in 3D medical image segmentation: the high inference cost and latency associated with sliding window-based inference. This limitation has long hindered the practical deployment of high-performing 3D models in clinical environments.
2. The proposed NMSW framework is methodologically sound, model-agnostic, and easily integrable into existing 3D architectures without requiring architectural modifications. This flexibility makes it highly applicable in real-world settings.
3. The experimental evaluation is thorough and well-structured, covering three segmentation tasks and multiple backbone networks. These experiments convincingly demonstrate the generalizability and robustness of the proposed method.
4. The manuscript is well-organized, and the proposed approach is clearly articulated. Figures are informative, well-designed, and aid in the understanding of key contributions.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. While the paper provides strong quantitative evidence for the performance gains of NMSW, it would benefit from additional qualitative comparisons. Visualizing segmentation outputs from NMSW versus baseline SW inference could provide more intuitive insight into the differences in prediction quality.
2. The current experimental comparisons primarily focus on SW-based baselines. Nonetheless, including existing patch-free [1,2] or holistic decomposition [3] methods as baselines—as they have also removed the need for SW during inference—would better contextualize the contribution and underscore the advantages of the proposed framework.
[1] Patch-free 3D medical image segmentation driven by super-resolution technique and self-supervised guidance. MICCAI 2021. [2] Adaptive decomposition and shared weight volumetric transformer blocks for efficient patch-free 3d medical image segmentation. IEEE Journal of Biomedical and Health Informatics 2023. [3] Holistic decomposition convolution for effective semantic segmentation of medical volume images. Medical image analysis 57 (2019): 149-164.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is overall an interesting and meaningful work. However, as comparisons with existing patch-free methods are missing, the actual effectiveness of the proposed method remains somewhat unclear.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The rebuttal did not fully address my primary concerns, particularly regarding the comparison with Holistic Decomposition (ref [3], MedIA 2019), which I had hoped to see. While this omission is somewhat disappointing, I acknowledge that the proposed technique may still offer practical value in certain 3D segmentation scenarios. Therefore, I am inclined to maintain my previous borderline rating, leaning toward acceptance.

Review #3

Please describe the contribution of the paper

The main contribution is the introduction of a novel, computationally efficient inference framework No-More-Sliding-Window (NMSW) for improving inference efficiency in 3D medical image segmentation. The proposed method produces a coarse global prediction from a low-resolution whole volume and samples high-resolution patches based on learned importance to refine the global prediction.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The differentiable patch-sampling technique seems interesting.

The proposed method also seems to work well with existing segmentation backbones like UNet, Swin-UNETR, and MedNext.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The authors present results up to K=30 patches, showing significant improvements. However, it would be informative if the authors could demonstrate or comment on whether performance continues to scale beyond K=30.

The manuscript lacks detailed implementation specifics regarding how exactly the global and local networks are trained and integrated during inference. For instance, while the authors mention that the global network receives a low-resolution input, it remains unclear precisely how this low-resolution volume is derived—whether via interpolation, cropping, or another downsampling method. Considering medical images have defined physical dimensions and voxel spacing, it would be beneficial for the authors to explicitly describe the strategy used to maintain consistency between low- and high-resolution inputs.

The authors could further strengthen their analysis by exploring how the number of segmentation classes (N) impacts the computational efficiency of the NMSW framework. Intuitively, a larger number of classes may increase the computational burden, particularly given the softmax operations used.

The authors claim their approach is not a new segmentation method but rather an inference optimization framework. However, the use of both global and local models for segmentation suggests that this approach indeed represents more than a simple adjustment of the sliding window technique. For example, the cascaded nnUNet uses a similar network architecture for segmentation.

Since the proposed method introduces an additional global network, a fair comparison to baseline methods should include training time (GPU-hours).
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents an interesting and innovative approach for improving inference efficiency in 3D medical image segmentation. However, there are several weaknesses regarding implementation details, novelty clarification, and the computational impact of class numbers that should be addressed.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors’ responses have addressed most of my questions.

Author Feedback

We thank the reviewers for their comments. We address key concerns below and hope our clarifications warrant a positive reassessment.

R1

Difference with high-res 2D classification task

Not only is NMSW the first to apply top-k patch sampling for efficient 3D segmentation, but our top-k module also differs significantly from that of Cordonnier et al. Our module is specifically tailored for 3D segmentation. NMSW simulates top-k sampling using the Gumbel-Softmax trick with straight-through estimation, enabling hard one-hot sampling. In contrast, the referenced work uses perturbed maximization, which yields only soft-hot samples and results in blended patches. While such blending may be tolerable for 2D classification, it degrades the fine-grained features crucial for 3D segmentation. Our top-k module instead produces clean, unblended patches that are better suited for this task. Also, while the prior work uses only low-resolution inputs to predict sampling scores, NMSW predicts both sampling scores and a coarse segmentation mask. This coarse mask is then fused with patch-level predictions through a novel aggregation block—another key contribution of our work. Lastly, slow inference is a major challenge in 3D medical segmentation; NMSW achieves 9× speed-up via a simple model-agnostic, plug-and-play solution.
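The hard, unblended selection described above can be illustrated with a minimal forward-pass sketch of Gumbel top-k sampling. This is not the authors' implementation: the function names, the six example scores, and the use of the standard library instead of a deep-learning framework are all assumptions made for illustration. The straight-through estimator, which would pass gradients through a soft relaxation during training, needs automatic differentiation and is not shown.

```python
import math
import random


def gumbel_top_k(logits, k, rng):
    """Pick k distinct patch indices by perturbing logits with Gumbel noise.

    Adding i.i.d. Gumbel(0, 1) noise to the logits and keeping the k largest
    perturbed values draws k indices without replacement, with higher-scoring
    patches more likely to be chosen.
    """
    perturbed = []
    for index, logit in enumerate(logits):
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


def one_hot_rows(indices, num_patches):
    """Encode each selected index as a hard one-hot row (no blending)."""
    return [[1.0 if j == i else 0.0 for j in range(num_patches)] for i in indices]


rng = random.Random(0)
patch_logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3]  # hypothetical scores for 6 patches
selected = gumbel_top_k(patch_logits, k=3, rng=rng)
selection = one_hot_rows(selected, len(patch_logits))
```

Each row of `selection` picks exactly one patch, so multiplying it with a stack of patches would return that patch unchanged, whereas a soft selection would return a weighted mixture of several patches.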

Boundary-based patch sampling as baseline

As suggested, we tested edge-based sampling by applying Sobel filtering to the global prediction to generate sampling scores. On the Word task with UNet, it attains Dice of [0.75,0.78,0.81,0.83,0.84] at K=[1,3,5,10,30,50], only slightly better than Random Foreground (RF) and well below NMSW. We emphasize that NMSW learns an optimal sampling strategy (including edges) that dynamically adapts to varying segmentation tasks without relying on handcrafted heuristics like edge-based sampling.

Over-reporting of performance

MACs are computed with the Python package thop, and the 90% efficiency gain at k=30 is correct. The cost of the global prediction (0.18 TMACs) and aggregation (0.07 TMACs) is negligible compared to the 30 local predictions (5.6 TMACs) in UNet. We will clarify this.
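As a rough check of this argument, the three quoted costs can be added up and compared against a sliding-window estimate. The sketch assumes that SW processes about 300 patches (the figure given by Reviewer #1) at the same per-patch cost as NMSW's local predictions. Both are assumptions rather than numbers stated in the rebuttal, so the totals are an estimate and are not the paper's reported values.

```python
# Figures quoted in the rebuttal (UNet backbone), in TMACs.
GLOBAL_TMACS = 0.18
AGGREGATION_TMACS = 0.07
LOCAL_TMACS_K30 = 5.6
K = 30

# Assumption: SW runs on about 300 patches, each costing the same as one
# NMSW local prediction.
SW_PATCHES = 300

per_patch = LOCAL_TMACS_K30 / K
nmsw_total = GLOBAL_TMACS + AGGREGATION_TMACS + LOCAL_TMACS_K30
sw_total = per_patch * SW_PATCHES
reduction = 1.0 - nmsw_total / sw_total

print(f"NMSW: {nmsw_total:.2f} TMACs, SW: {sw_total:.1f} TMACs, saved: {reduction:.1%}")
```

Under these assumptions the global and aggregation steps add only 0.25 TMACs to the 5.6 TMACs of local predictions, so the saving stays close to the 90% implied by the 30-to-300 patch ratio.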

R2

Comparison with other efficient methods

We selected SW-based methods as baselines to highlight the value of learned sampling. NMSW stands out among efficient inference methods: it requires no backbone changes or complex loss functions, making it simple to integrate and broadly applicable. Unlike [1], which involves architectural modifications and super-res losses, NMSW remains simple. While [2] is backbone-agnostic, it has only been tested on simpler tasks (e.g., femur, pancreas), which leaves its robustness in question. In contrast, NMSW has been validated on complex, multi-organ segmentation and is expected to generalize well, leveraging proven backbones like MedNeXt and Swin-UNETR. We are happy to discuss these works in the final version.

Qualitative results

Although segmentation results are not visualized, Fig. 5 shows the evolution of the sampling strategy and its convergence to foreground regions. We are happy to add more visual comparisons in the camera-ready version.

R3

Scaling beyond k=30

As shown in Fig. 6, we tested NMSW with k=50 up to full patch sampling (>300 patches). NMSW scales effectively even for k>30 and often exceeds SW in performance when all patches are sampled.

Implementation detail

The low-res input is created by 3× trilinear downsampling. Full-res volumes are resampled to the dataset’s median voxel spacing and then zero-padded to be divisible by 32 (e.g., 480×480×480).
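The shape bookkeeping implied by this description can be sketched as follows. The helper names and the example volume size are hypothetical, and padding is placed at the end of each axis only because the rebuttal does not say where it is applied. The interpolation itself is left out; a library routine such as `torch.nn.functional.interpolate` with `mode="trilinear"` would be one possible way to perform it.

```python
def padded_shape(shape, multiple=32):
    """Round each dimension up to the next multiple (the zero-padding target)."""
    return tuple((d + multiple - 1) // multiple * multiple for d in shape)


def pad_widths(shape, multiple=32):
    """Zero-padding needed per axis, assumed here to go at the end of the axis."""
    return tuple(p - d for d, p in zip(shape, padded_shape(shape, multiple)))


def low_res_shape(shape, factor=3):
    """Shape of the global-branch input after downsizing every axis by `factor`."""
    return tuple(d // factor for d in shape)


resampled = (467, 455, 480)  # hypothetical volume size after spacing resampling
full_res = padded_shape(resampled)
print(full_res, pad_widths(resampled), low_res_shape(full_res))
```

For this hypothetical volume the padded size is 480×480×480, matching the example in the rebuttal, and the 3× downsized global input is 160×160×160.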

Computational cost with more classes

NMSW’s inference cost is independent of class count. We assume the concern is about the softmax in Eq. (2). However, N refers to the number of patches, not classes. The cost of softmax with N around 300 is negligible.

Training time

Although NMSW increases training time by 30–40% depending on the backbone, it offers much faster inference than SW once trained, making the added training cost a good trade-off.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

I agree with all reviewers that the rebuttal clearly addresses all key concerns raised. The distinction between the proposed top-k patch sampling and prior 2D classification work is well-articulated, emphasizing the importance of unblended patch selection for 3D segmentation and the novel contributions of the aggregation block and coarse prediction. Clarifications on ablation studies, computation, qualitative results, and implementation details are sufficient.

I have one minor concern: it would be great if the paper could add the current SOTA for each dataset in Table 1. It’s unclear whether SW (gold standard) is much worse than SOTA; if so, the top-k approximation results for SW (gold standard) are less convincing.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A
