Abstract
Transformer-based architectures demonstrate strong performance in medical image segmentation but face challenges due to computational redundancy and overparameterization, limiting their deployment in resource-constrained settings. This study identifies redundant computations at the block level, particularly in the deeper layers of transformer encoders, as well as in the token mixer and MLP within each layer, as quantified by cross-layer activation similarity. To operationalize these insights, we propose SlimFormer-3D, a lightweight U-shaped encoder-decoder framework that prunes redundant computations at a granular level. Using two feature similarity metrics, Angular Distance and Centered Kernel Alignment (CKA), we locate minimally impactful layers and introduce gating factors that selectively control token mixer and MLP activations. Experiments on the BTCV, AMOS, and AbdomenCT-1K 3D abdominal CT datasets show that SlimFormer-3D achieves competitive Dice scores while reducing computation by 3.5x and cutting model parameters by approximately 83% compared to UNETR. Ablation studies confirm its balance between accuracy and efficiency, making it a promising solution for real-time 3D medical image segmentation.
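For readers unfamiliar with the two similarity metrics named in the abstract, the sketch below shows standard formulations of linear CKA and mean angular distance applied to a layer's input and output activations. This is a minimal illustration under assumed shapes and names, not the paper's code; the exact metric variants SlimFormer-3D uses may differ.

```python
import numpy as np

def linear_cka(x, y):
    """Linear Centered Kernel Alignment between two activation matrices.

    x, y: (n_tokens, dim) activations from two layers.
    Returns a similarity in [0, 1]; values near 1 suggest redundancy.
    """
    x = x - x.mean(axis=0, keepdims=True)  # center features
    y = y - y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return hsic / (norm_x * norm_y)

def angular_distance(x, y, eps=1e-8):
    """Mean angular distance between corresponding token activations.

    Returns a distance in [0, 1]; values near 0 suggest the layer
    barely transforms its input (a pruning candidate).
    """
    cos = np.sum(x * y, axis=-1) / (
        np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1) + eps
    )
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi))

# Example: a layer whose output barely differs from its input scores
# high CKA similarity and low angular distance (assumed shapes).
rng = np.random.default_rng(0)
h_in = rng.normal(size=(512, 96))                  # tokens entering a layer
h_out = h_in + 0.01 * rng.normal(size=(512, 96))   # near-identity layer
print(linear_cka(h_in, h_out))        # close to 1 -> redundant
print(angular_distance(h_in, h_out))  # close to 0 -> redundant
```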
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1666_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
Link to the Dataset(s)
AbdomenCT-1K: https://abdomenct-1k-fully-supervised-learning.grand-challenge.org/
BTCV: https://www.kaggle.com/datasets/nguynhilonguetdhcn/btcv-data
AMOS: https://zenodo.org/records/7262581
BibTex
@InProceedings{HonYan_SlimFormer3D_MICCAI2025,
author = { Hong, Yang and Zhang, Lei and Ye, Xujiong and Mo, Jianqing},
title = { { SlimFormer-3D: A Layer-Adaptive Lightweight Transformer for Efficient 3D Medical Image Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15961},
month = {September},
pages = {536--545}
}
Reviews
Review #1
- Please describe the contribution of the paper
The main contribution of this work is the development of SlimFormer-3D, a lightweight Transformer-based architecture for 3D medical image segmentation that addresses computational inefficiencies through layer-level pruning.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The motivation is clearly described and the paper is well organized.
- The proposed method aligns well with the motivation.
- The experiments are sufficient for supporting the overall idea.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The technical contribution is quite limited. The gating mechanism used is very common in the NLP and CV fields, e.g., Stabilizing Transformers for Reinforcement Learning; can the authors describe the differences from that paper?
- The figures and tables are not well organized, which decreases the overall readability.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The limited technical contribution.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The authors deal with the timely issue of eliminating redundancy in deep learning architectures while keeping performance at existing levels: “How can redundancy be accurately identified at a finer, layer-level scale and effectively pruned while preserving model efficiency and performance?”
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors shift the focus of network optimization toward pruning and refining the Transformer’s internal structure, proposing a lightweight architecture, SlimFormer-3D.
They show that the transformer structure matters more than the attention mechanism, as performance remains nearly unchanged when attention is replaced with alternatives.
Three publicly available 3D abdominal CT datasets were used: BTCV, AMOS, and AbdomenCT-1K.
Results show that the model retains high segmentation performance while significantly reducing parameters and computation.
Ablation experiments are also presented.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The approach is focused on pruning the network, so it would be important to have a more extended section discussing and analyzing the competing approaches and the results with respect to pruning approaches. That would include an extended Section 3.3, which is too brief. It would also be nice to have a more detailed explanation of alternative pruning mechanisms.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The mechanisms and the achieved reduction in network size are significant. On the other hand, I would suggest more discussion of pruning mechanisms, including competing alternatives, and a more detailed Section 3.3 with discussion and analysis against competitors.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The authors propose SlimFormer-3D, a lightweight transformer for 3D segmentation, based on layer-level pruning of redundant computations in the vanilla transformer architecture. Their experiments across the BTCV, AMOS, and AbdomenCT-1K datasets demonstrate performance comparable to existing approaches while using only a fraction of the parameters.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors perform extensive empirical analysis for detecting layer-level redundant computations that motivate their choice of architecture.
- The proposed method performs comparably to SOTA methods across three 3D segmentation datasets despite significantly reducing the number of parameters.
- The inclusion of ablation experiments further strengthens the findings.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Further comparisons with SOTA methods [1] and other layer-adaptive pruning strategies [2,3] are required to strengthen the paper’s claims. It would be interesting to see how the proposed method compares with other pruning techniques in the literature.
- It is not clear whether the findings generalize: does the nth transformer layer have more redundancy than the (n-1)th layer in a block with n > 2 layers? The authors conduct their empirical analysis assuming n = 2 transformer layers per block. Further analysis is necessary to ensure the findings generalize to other configurations.
- The authors should consider including statistical comparisons (e.g., t-test) between baselines and proposed method to strengthen the findings.
[1] Zhou, H. Y., Guo, J., Zhang, Y., Han, X., Yu, L., Wang, L., & Yu, Y. (2023). nnFormer: Volumetric medical image segmentation via a 3D transformer. IEEE Transactions on Image Processing, 32, 4036-4045.
[2] Lin, X., Yu, L., Cheng, K. T., & Yan, Z. (2023). The lighter the better: Rethinking transformers in medical image segmentation through adaptive pruning. IEEE Transactions on Medical Imaging, 42(8), 2325-2337.
[3] Perera, S., Navard, P., & Yilmaz, A. (2024). SegFormer3D: An efficient transformer for 3D medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4981-4988).
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Despite limited comparisons with existing pruning techniques, the authors tackle the important challenge of computational efficiency by exploring layer-level redundancies in vanilla transformer architectures and proposing a new lightweight transformer. I believe the findings are highly relevant.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank the reviewers for their insightful comments and thoughtful suggestions. Your feedback has provided us with valuable insights to improve the manuscript.

Q1: Generalization to other configurations (n > 2) (R2)
A1: Thank you for the suggestion. We believe the same phenomenon applies to Transformers in segmentation tasks when more than two layers (n > 2) are used within each block. Our ablation studies show clear evidence of redundancy with two Transformer layers (n = 2). Based on this observation, we infer that as additional layers are stacked beyond the second layer within a Transformer block, redundancy naturally accumulates. This is further supported by the study cited as [4] in our Introduction, which demonstrates that large language models exhibit increasing redundancy in deeper layers when n > 2.

Q2: Differences from Stabilizing Transformers for Reinforcement Learning (STFRL) (R3)
A2: Thank you for suggesting the paper. We have carefully reviewed it and compared it with our method. While STFRL focuses on the stable optimization of Transformers, our approach is motivated by the goal of balancing efficiency and computational cost. Using a feature similarity-based metric, we show that redundancy exists across various Transformer modules, including blocks, MLPs, and attention layers. The two methods differ in both objective and strategy. Our method aims to improve the efficiency of ViT by removing redundant weights through a pruning strategy, where the pruning decision functions as a static gating mechanism. In contrast, the gating mechanism in Gated Transformer-XL (GTrXL) is designed primarily to stabilize training by adding an additional gating layer, not to reduce computational overhead.

Q3: The figures and tables are not well organized (R3)
A3: We have updated Fig. 1, Fig. 2, and the tables to improve clarity, optimize layout, and enhance overall illustration quality.
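To make the static-gating view in A2 concrete, the sketch below shows a transformer layer whose token mixer and MLP are wrapped in fixed 0/1 gating factors, so a gated-off sub-module collapses to a skip connection and its computation can be dropped entirely. This is an illustrative sketch, not the authors' implementation; the class name, gate parameters, and layer sizes are all assumptions.

```python
import torch
import torch.nn as nn

class GatedTransformerLayer(nn.Module):
    """Transformer layer with static gates on the token mixer and MLP.

    gate_attn / gate_mlp are fixed 0/1 pruning decisions (hypothetical
    names): a gate of 0 reduces the sub-module to a skip connection,
    so its computation can be removed at export time.
    """

    def __init__(self, dim, num_heads, gate_attn=1.0, gate_mlp=1.0):
        super().__init__()
        self.gate_attn = gate_attn
        self.gate_mlp = gate_mlp
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        if self.gate_attn:  # skip the token mixer entirely when gated off
            h = self.norm1(x)
            x = x + self.gate_attn * self.attn(h, h, h, need_weights=False)[0]
        if self.gate_mlp:   # skip the MLP entirely when gated off
            x = x + self.gate_mlp * self.mlp(self.norm2(x))
        return x

# Example: the deeper of two layers in a block has its attention pruned.
layer = GatedTransformerLayer(dim=96, num_heads=4, gate_attn=0.0)
tokens = torch.randn(2, 512, 96)
print(layer(tokens).shape)  # torch.Size([2, 512, 96])
```

Unlike GTrXL's learned gates, which remain in the graph at inference time, these static gates act as pruning decisions fixed before deployment.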
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A