Abstract
In this work, we introduce EfficientMedNeXt, a lightweight, high-performance segmentation architecture developed through a two-phase optimization process applied to the MedNeXt architecture. To this end, we first optimize the decoder by reducing high-resolution redundancy and unifying the decoder channels across stages for improved efficiency. Then, we introduce a new Dilated Multi-Receptive Field Block (DMRFB) that captures multi-scale spatial context efficiently without increasing kernel sizes or relying on channel-expansion convolutions. Extensive evaluations on BTCV, FeTA, and MSD show that EfficientMedNeXt-L achieves an 87.0% DICE score on BTCV (+1.04% over MedNeXt-L) with 96.5% fewer parameters and 77.03% lower FLOPs. In addition, EfficientMedNeXt-S offers a comparable DICE score, improved HD95, and 78.1% higher throughput while reducing parameters by 98.5% and FLOPs by 95%. These results demonstrate EfficientMedNeXt's efficiency and accuracy, making it well-suited for real-world clinical applications. Our implementation is available at https://github.com/SLDGroup/EfficientMedNeXt.
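To make the abstract concrete, here is a minimal PyTorch sketch of what a dilated multi-receptive-field block could look like: parallel dilated depthwise 3×3×3 convolutions enlarge the receptive field without bigger kernels or channel-expansion convolutions. The layer composition, dilation rates, and residual fusion below are illustrative assumptions, not the authors' released implementation (see the repository above for that).

import torch
import torch.nn as nn

class DMRFBSketch(nn.Module):
    """Illustrative sketch of a dilated multi-receptive-field block.

    Parallel 3x3x3 depthwise convolutions with increasing dilation rates
    approximate larger receptive fields without larger kernels and without
    channel-expansion convolutions; a 1x1x1 convolution fuses the branches.
    Dilation rates (1, 2, 3) and the residual fusion are assumptions.
    """

    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        # One depthwise branch per dilation rate; padding=d keeps spatial size.
        self.branches = nn.ModuleList(
            nn.Conv3d(channels, channels, kernel_size=3, padding=d,
                      dilation=d, groups=channels, bias=False)
            for d in dilations
        )
        self.norm = nn.GroupNorm(num_groups=1, num_channels=channels)
        self.act = nn.GELU()
        self.fuse = nn.Conv3d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the dilated branches, then fuse with a cheap pointwise conv.
        multi_scale = sum(branch(x) for branch in self.branches)
        return x + self.fuse(self.act(self.norm(multi_scale)))

On a toy input such as torch.randn(1, 32, 16, 16, 16), the block preserves shape while mixing context from effective kernel sizes 3, 5, and 7; summing the branches rather than concatenating them keeps the channel count, and thus the cost of the 1×1×1 fusion, constant.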
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4895_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/SLDGroup/EfficientMedNeXt
Link to the Dataset(s)
BTCV dataset: https://www.synapse.org/Synapse:syn3193805/wiki/217789
MSD dataset: http://medicaldecathlon.com/
FeTA2021 dataset: https://feta.grand-challenge.org/feta-2021/
BibTex
@InProceedings{RahMd_EfficientMedNeXt_MICCAI2025,
author = { Rahman, Md Mostafijur and Munir, Mustafa and Marculescu, Radu},
title = { { EfficientMedNeXt: Multi-Receptive Dilated Convolutions for Medical Image Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15963},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors propose a more computationally efficient variant of the MedNeXt architecture for biomedical image segmentation with the following changes: the last upsampling layer is skipped; the number of feature maps in the skip connections and the decoder pathway is aligned so that the feature maps can be added instead of concatenated; dilated depthwise convolutions are used for the main feature extraction blocks; and few feature maps are used throughout the whole architecture.
The computational efficiency is achieved without compromising accuracy, resulting in a superior architecture compared to MedNeXt.
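To illustrate the additive skip connections described above, here is a minimal PyTorch sketch; the shapes and function names are hypothetical, not taken from the paper's code.

import torch
import torch.nn as nn

# Conventional U-Net-style fusion: concatenation doubles the channel count,
# so the following convolution must process twice as many input channels.
def fuse_concat(decoder_feat, skip_feat, conv):
    return conv(torch.cat([decoder_feat, skip_feat], dim=1))

# Additive fusion: with skip and decoder channel counts unified across
# stages, features are simply summed, keeping subsequent layers narrow.
def fuse_add(decoder_feat, skip_feat):
    return decoder_feat + skip_feat

dec = torch.randn(1, 32, 8, 8, 8)    # hypothetical decoder feature map
skip = torch.randn(1, 32, 8, 8, 8)   # matching skip-connection feature map
conv = nn.Conv3d(64, 32, kernel_size=3, padding=1)
print(fuse_concat(dec, skip, conv).shape)  # torch.Size([1, 32, 8, 8, 8])
print(fuse_add(dec, skip).shape)           # torch.Size([1, 32, 8, 8, 8])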
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The architectural changes compared to the baseline architecture MedNeXt are well motivated.
The evaluation shows strong results of the proposed architecture compared to common architectures in the field, both in terms of accuracy and in terms of runtime and memory footprint.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The biggest weakness of the paper is the description and/or design of the evaluation. The authors show a very comprehensive comparison with 11 competing architectures. However, it is unclear how the numbers were generated. The paper refers to the UX-Net paper for more details, but that paper also does not state how the numbers were generated. In particular, it is not clear whether the presented numbers were taken from the respective papers, whether pre-trained models were used for inference and the evaluation was carried out by the authors, whether the networks were retrained by the authors using a publicly available implementation of the respective network architecture, or whether the network architectures were reimplemented by the authors and integrated into a common training framework. Furthermore, it is not clear which version or variant was used for the evaluation. For example, the nnU-Net training framework comes in different versions and also provides different variants (2D, 3D, hierarchical 3D). The numbers shown in Table 3 for nnU-Net are significantly worse than the numbers provided by the authors of the nnU-Net framework on the same datasets in the respective papers, which raises doubts about how well those numbers represent the actual performance of nnU-Net and of all other compared methods.
The FeTA dataset was split into a training, a validation, and a test set, but the other datasets were only split into training and validation. It is unclear how the validation and test sets were used. Was the testing done on the validation set if no dedicated test set was created? How was the validation set used, and does using the same set for potential hyperparameter tuning and testing bias the evaluation?
The sizes of the respective validation and test sets are rather small. For instance, the test part of the FeTA dataset contains only 8 images. The training and validation parts of the BTCV dataset contain only 18 and 12 images, respectively, which is relatively small compared to current datasets used for model training or segmentation challenges. It is possible that the numbers refer to only one of many splits, and it is possible that the authors performed cross-validation, but none of this was clearly stated in the paper.
What does it mean that the experiments were done without nnU-Net's adaptive training configuration? The main point of nnU-Net is the automatic adaptation of hyperparameters, because differences in network performance are very often due to hyperparameter choices and not due to new architectures. Performing the comparison without adaptive training configurations greatly limits the power of the nnU-Net approach.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Figure 1 is much appreciated. It helps a lot with understanding how the different blocks of the architecture are formed and work together.
Why does Figure 2 not show results for nnU-Net?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The major weakness of the presented paper is the description of the evaluation. The authors build on a promising network architecture for medical image segmentation (MedNeXt) and iteratively improve its efficiency through a series of well-motivated changes. This is backed up by a very comprehensive evaluation using 11 different methods. However, the description of the evaluation raises questions about how the numbers were created and why the numbers for some methods, such as nnU-Net, are worse than the numbers provided by the authors of nnU-Net on the same datasets. Furthermore, the design of the evaluation is not entirely clear, and it seems that relatively few images were used for testing, which raises doubts about the statistical significance of the results.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I appreciate the clarification on how the method was evaluated. I think it's important for the reader to understand that the comparison in terms of accuracy has to be taken with caution, because the test sets are relatively small, no cross-validation was performed to at least have every case participate as a test case, and no standard deviation is given. The paper draws from the trust people might have in the MedNeXt architecture in terms of accuracy and improves the architecture with regard to efficiency.
I don't want to overemphasize the analysis of the accuracy. After all, this is highly dependent on the segmentation task, the diversity of the dataset, the size of the dataset, and the tuning of all hyperparameters. I think the evaluation is good enough to show the potential, as long as the reader fully understands the limitations of the evaluation, and the efficiency gains compared to the MedNeXt architecture can be quite significant depending on the parameter choices.
Overall, I think the presented architecture shows an interesting alternative to existing architectures and the research community should learn about it and try it out in different settings to develop a better feeling about strengths and weaknesses of different architectures over time.
Review #2
- Please describe the contribution of the paper
The paper addresses the topic of efficiency for deep-learning-based medical segmentation. In it, a new architecture is proposed based on the MedNeXt architecture and evaluated using multiple public datasets.
Based on the experiments, the authors succeed in proposing a novel architecture that allows state-of-the-art segmentation quality while significantly reducing the computational demand.
To achieve this major contribution, the authors propose several techniques, most prominently a new structure for replacing conventional convolution layers.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Efficiency is an important, yet underexplored, topic for deep learning, especially in medical imaging, so I really enjoyed seeing this being the focus of the paper.
In addition, I'd like to compliment the authors on the comparatively large number of experiments reported.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The paper has a few weaknesses, given below without a specific ranking:
- The paper ignores existing research on efficient implementations, focusing solely on medical image segmentation architectures in general.
- Most testing is done on rather small cohorts, leaving questions regarding the performance on large datasets.
- Performance is only evaluated for inference, not for training.
- Measures for uncertainty are completely missing.
- Please confirm: are all the technical adaptations unique to this approach? If not, please cite the origins.
- Please also report the performance of smaller MedNeXt models.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I will try to clarify my previous statements on weaknesses and list a few more points that I observed, again without any specific order. Also, I am aware that not all points can be directly addressed in a MICCAI rebuttal.
1) State of the art: There is a long (for MICCAI standards) introduction into the current state of the art for medical image segmentation algorithms. However, since the paper is more about efficiency, I would expect an introduction that also reflects on the progress made in this area. Right now, this entire line of research is left out completely.
2) Related to 1): The long introduction leads to some claims that might be difficult to uphold. For example, there are multiple reasons to choose CNN-based over transformer-based models besides the computational cost. Also, 3D CNNs are not only an answer to transformers, etc. Most of these claims are not necessary to justify this research; I strongly urge the authors to revise them carefully.
3) Dataset size: All chosen datasets are rather small. While this reflects a common situation, it limits the evaluation w.r.t. the applicability of the network to larger datasets and possible trade-offs regarding the ability to learn from large datasets. I suggest including at least one larger dataset.
4) Related to 3): The authors report only the results for the largest MedNeXt model. However, smaller models could provide better performance, especially on small datasets. In addition, those networks would require less compute.
5) A critical point is that the performance is only reported for inference. While this is already a strong contribution, it would also be interesting to see the trade-offs during training. Please report this too, even if there is no big improvement.
6) There are absolutely no measures of uncertainty, like a standard deviation between multiple folds, a confidence interval, or even a test for significance. This gives the reader no chance to understand whether the reported differences are meaningful or probably just due to chance.
7) Related to 6): The reported numbers are often given with four or more significant digits. This suggests a precision/reliability of the results that is not justified by the setup, especially given the small number of samples used. It would be much better to report just three significant digits.
8) Own contribution? Related to the missing state of the art w.r.t. efficient deep learning: are all improvements unique to the authors? I got the impression that all technical aspects in the methodological section are unique to the authors; could you please confirm or clarify this?
9) At some points, the results are repeated in the normal text. This could be shortened, as the numbers are already given in the result tables.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This is generally a strong paper. However, I still see some points that should be addressed, mainly the small cohorts used, the reasons for not including smaller MedNeXt models, the performance during training, and the missing uncertainty measures.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Overall, the authors addressed some of my concerns. While there are still some open points, I would give the authors the benefit of the doubt, given the interesting aspects of the paper.
Review #3
- Please describe the contribution of the paper
This paper presents a lightweight improvement of MedNeXt with improved segmentation performance, fewer parameters, and lower computational complexity.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper presents a novel lightweight 3D segmentation neural network with a new efficient encoder-decoder architecture and dilated multi-kernel convolutional blocks.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1) The paper only compares the method with MedNeXt-L-k5; the results for other variants of MedNeXt are not displayed. 2) It is unknown whether the training routines for the other compared networks are optimal or not.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1) The paper only compares the method with MedNeXt-L-k5; the results for other variants of MedNeXt are not displayed. 2) It is unknown whether the training routines for the other compared networks are optimal or not.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The experimental setting may be unfair to the compared neural networks.
Author Feedback
We thank reviewers (Rx) and meta-reviewer (MetaR) for their constructive feedback. We address each issue below (without providing new results per MICCAI policies).
Evaluation Setup (R1, R2, MetaR): We retrained all 11 baseline models using their published architectures under the publicly available training framework and protocol from 3D UX-Net (see their Appendix A.1, Table 4, and GitHub). This setup ensures fair architecture-level comparisons without framework-specific advantages. We manually tuned all hyperparameters (e.g., learning rates) following each baseline's original guidelines for optimal results. All implementation details will be provided in our final paper.
nnU-Net Usage (R2, MetaR): We evaluated the 3D Generic_UNet (base=48) from nnU-Net v1, without its adaptive pre-/post-processing, in order to maintain a uniform pipeline and compare only the architectural contribution. Our lower DICE scores compared to nnU-Net's original paper are due to using: (1) the standard TransUNet dataset splits (BTCV: our 18/12 vs. nnU-Net's typical 24/6), (2) a lower input resolution (96×96×96 vs. the typical ≥ 128×128×128) following 3D UX-Net, and (3) the uniform experimental protocol from 3D UX-Net for fair architecture-to-architecture comparisons. We agree that omitting nnU-Net's adaptive pipeline may limit its performance. However, we believe that incorporating our EfficientMedNeXt into the nnU-Net framework will further improve our model's performance as well, due to its adaptive preprocessing, training, and post-processing. We will integrate EfficientMedNeXt into nnU-Net's pipeline and report those results too.
MedNeXt Variants (R1, R3, MetaR): We primarily report MedNeXt-L-k5 since EfficientMedNeXt directly optimizes this variant, and smaller MedNeXt variants exhibit comparatively lower DICE scores relative to their size. However, we will report results for the smaller MedNeXt variants too.
Dataset Sizes and Benchmarks (R2, R3, MetaR): For FeTA, we first select the best model based on the validation DICE score and then evaluate generalization using the explicit test set. For BTCV (18 train, 12 test), we adopted the TransUNet split for fair comparisons. For MSD Brain Tumor (388 train, 96 val), Heart (16 train, 4 val), and Lung (51 train, 12 val), we applied an 80/20 split, using the validation set for early stopping and DICE reporting, with hyperparameters set from FeTA to avoid bias. Although we did not use cross-validation, our chosen splits ensure unbiased comparisons with existing methods. While our MSD Brain Tumor dataset has a total of 484 cases, to further demonstrate generalizability we will report results on larger datasets (e.g., AMOS, TotalSegmentator).
Uncertainty and Significance (R2, R3, MetaR): We ran all experiments five times and reported mean DICE scores. We will also add standard deviations to show result variability and confirm robustness beyond random chance.
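As a sketch of the promised reporting format (the five DICE values below are made up for illustration, not taken from the paper), the mean and standard deviation over repeated runs could be computed as:

import statistics

runs = [0.868, 0.871, 0.869, 0.872, 0.870]  # hypothetical DICE scores, 5 runs
mean = statistics.mean(runs)
std = statistics.stdev(runs)  # sample standard deviation across runs
print(f"DICE: {mean:.3f} ± {std:.3f}")  # prints: DICE: 0.870 ± 0.002

Reporting mean ± standard deviation at three significant digits also addresses the precision concern raised in Review #2.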
Efficiency-Focused Methods & Claims (R3, MetaR): Although we already discussed and compared several efficiency-focused architectures (e.g., UNETR++, SegFormer3D, SlimUNETR), we will restructure the introduction and remove unnecessary or overstated claims (e.g., CNN vs. Transformer comparisons).
Training Performance (R3): Given that inference efficiency is our core contribution, we report inference metrics. However, we will include training performance metrics too (epoch time, memory usage).
Novelty of Contributions (R3): The combination of our DMRFB block, unified decoder channels, and removal of the highest-resolution decoder stage is unique to EfficientMedNeXt and was specifically designed for a better accuracy–efficiency trade-off. To our knowledge, these innovations do not appear together in prior work.
Figure 2 (R2): We omitted nnU-Net results from Fig. 2 due to space constraints; we will include them in our final paper.
Result Reporting (R3): We will report all numerical results to three significant digits and remove repeated numerical results from the main text.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
The authors introduced a novel lightweight medical image segmentation model. The reviewers have mixed comments on this work, and the authors are invited to give more clarifications in the rebuttal: 1) how the values of existing works were obtained; 2) other variants of MedNeXt, including smaller models, are not shown; 3) the small size of the datasets; 4) why the self-configuration of nnU-Net was not used; 5) the description of existing work on lightweight architecture design is insufficient in the introduction; 6) no measures of standard deviation.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A