Abstract

Achieving high-precision medical image segmentation while maintaining computational efficiency remains a critical challenge for clinical applications. Existing methods often struggle to balance multi-scale feature fusion, lightweight design, and contextual modeling, particularly for complex medical scenes with ambiguous boundaries. To address these limitations, we propose DyMAS-Net, a lightweight framework integrating multi-scale convolution, adaptive dynamic sampling, and dual attention mechanisms. Key innovations include: (i) a Hierarchical Multi-Scale Convolution Block (HMCB) combining grouped depthwise convolutions with hybrid attention to capture cross-scale dependencies; (ii) an Adaptive Dynamic Sampling Module (ADSM) that dynamically adjusts receptive fields through learnable position offsets and scope prediction, enabling context-aware upsampling with minimal computational overhead; (iii) a Dual Attention Fusion Unit (DAFU) integrating channel-spatial attention for global context modeling and depthwise separable gating for local feature refinement. Extensive evaluations across 7 medical image segmentation tasks (breast cancer, thyroid nodules, skin lesions) show DyMAS-Net achieves state-of-the-art performance with an average Dice score of 87.19%, outperforming TransUnet and SwinUnet by 3.02% and 2.77%, respectively. Remarkably, it attains this with only 6.24M parameters and 8.87G FLOPs, 93.3% fewer parameters than TransUnet. The framework’s efficiency-accuracy balance enables practical deployment in resource-constrained environments, thus promoting health equity.
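
For readers less familiar with the metric quoted in the abstract, the Dice score measures the overlap between a predicted mask and a reference mask as 2|P∩T| / (|P| + |T|). The following is a minimal sketch, assuming binary masks flattened into 0/1 sequences; the function name `dice_score`, the `eps` smoothing term, and the toy masks are illustrative choices and are not taken from the paper.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice = 2|P∩T| / (|P| + |T|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the ratio defined when both masks are empty
    return (2.0 * intersection + eps) / (total + eps)


# Toy masks (hypothetical): 3 foreground pixels each, 2 of them shared
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"Dice = {dice_score(pred, target):.4f}")
```

An averaged figure such as the one in the abstract would typically be the mean of per-image or per-dataset scores of this kind; the exact averaging protocol is not stated on this page.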

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3842_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanSiq_DyMASNet_MICCAI2025,
        author = { Wang, Siqi and Zhao, Qingxue and Wu, Di and Gao, Jiakang and Tian, Jun},
        title = { { DyMAS-Net: Dynamic Multi-Scale Adaptive Sampling Network for Efficient Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {99 -- 108}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an efficient architecture for medical image segmentation involving three major components added to the Unet framework: a multi-scale convolution module, a module for dynamic upsampling, and a dual attention fusion unit. The proposed architecture obtains competitive results as compared to SOTA baselines and is more efficient.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is easy to follow.

    Extensive experiments with multiple baselines and datasets are presented, and the results are impressive.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The explanation of hybrid attention in the HMCB block (as mentioned in the introduction) is missing in the methodology section.

    Additionally, an ablation study of how this hybrid attention improves over standard multi-scale fusion seems necessary.

    A quantitative or qualitative analysis of how feature misalignment is reduced by ADSM and DAFU block is lacking.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper lacks studies to properly validate some of the design choices and proposed claims (see Weakness section). Additionally, clarity on which aspects of the proposed architecture are novel, in terms of design or usage, needs to be improved.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thank you for the rebuttal, which addresses some of my concerns. I recommend acceptance.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a lightweight framework composed of three main components: a multi-scale convolution block (HMCB), a sampling mechanism for features (ADSM), and a dual attention fusion unit (DAFU). Experimental results show that the method achieves comparable performance with lower computational cost.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work effectively improves convolutional blocks by incorporating multi-scale features, and the proposed components work well together to enhance the overall efficiency of the model. These lightweight modules also appear to be generalizable to other existing convolution-based architectures.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Some novelty claims, such as HMCB, are overstated, as the concept of using different kernel sizes to enable convolutional blocks to capture multi-scale features has been widely explored in prior work (e.g., InceptionNet, RFBNet, MSUNet). Additionally, parts of the ablation study lack clear explanation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    1. The concept of HMCB is not particularly novel, as many previous works have explored the use of multiple kernel sizes or depthwise separable convolutions (DWConv) to capture multi-scale features during convolutional processing. However, it is notable that the performance improvement becomes more evident when HMCB is combined with ADSM, suggesting a beneficial interaction between modules.
    2. In section 2.2, what is the G value in experiments?
    3. The explanation surrounding Table 2 is unclear. Specifically, it is confusing that the number of parameters increases significantly when HMCB is removed. One would expect that replacing a multi-scale block with a standard 3×3 convolution (e.g., removing the 1×1 and 5×5 scales) would reduce the parameter count. Please clarify what the backbone architecture looks like without HMCB.
    4. Why did the authors not compare with SegFormer-B0, a lightweight and widely adopted transformer-based baseline for segmentation?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is well-structured, and the proposed components are clearly explained. The comprehensive experiments support the authors’ claims. While the design of each component is not particularly novel, the lightweight architecture is well-motivated and achieves sufficient performance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed my questions well. Although the incremental changes in the individual modules may seem less appealing compared to the current trend favoring large models, the proposed method is solid from a methodological standpoint. It would be even better if the paper included a more detailed discussion or experiments on the design choices and variations within each module, but given the space constraints, the current level of detail is reasonable.



Review #3

  • Please describe the contribution of the paper

    The paper presents DyMAS-Net, a lightweight and efficient architecture for medical image segmentation. It introduces a combination of novel components, including a Hierarchical Multi-Scale Convolution Block (HMCB), an Adaptive Dynamic Sampling Module (ADSM), and a Dual Attention Fusion Unit (DAFU). These modules aim to improve context modeling, cross-scale feature fusion, and segmentation quality while maintaining computational efficiency. The method is validated on seven diverse datasets and demonstrates state-of-the-art performance with a strong balance between accuracy and resource efficiency.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The paper addresses a critical and practical problem in medical imaging: building models that are both accurate and computationally efficient, making them suitable for deployment in resource-constrained environments.
    • The architectural design is well thought out, with each component contributing to the network’s performance and efficiency. The modularity of DyMAS-Net enhances interpretability and reuse.
    • The evaluation across seven diverse datasets highlights the generalization capacity of the approach.
    • The ablation study adds value by demonstrating the contribution of individual components.
    • The claimed improvements in Dice score are supported by relatively low parameter count and FLOPs, which aligns well with the goal of practicality and health equity.
    • The manuscript is well-written and structured, making it easy to follow and understand.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    • The claim that HMCB reduces computational complexity is not backed by any empirical or theoretical evidence. This should either be supported with appropriate data or reformulated.
    • The paper uses the term “significantly” in performance comparison without providing statistical analysis or uncertainty metrics (e.g., standard deviation, confidence intervals, p-values).
    • The discussion of limitations is missing. Addressing potential weaknesses (e.g., domain sensitivity, architectural constraints) and directions for future work would improve the scientific rigor of the paper.

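
To make the request for uncertainty metrics concrete, the sketch below shows one simple way to report a mean Dice score with a sample standard deviation and an approximate 95% confidence interval. It is only an illustration: the helper name `mean_std_ci` is hypothetical, the per-run values are placeholders and not results from the paper, and the normal approximation (z = 1.96) is crude for so few runs, where a t-based interval or a paired test against each baseline would be more appropriate.

```python
import math
import statistics


def mean_std_ci(scores, z=1.96):
    """Return (mean, sample std, (ci_low, ci_high)) for a list of scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    # Normal-approximation half width of the confidence interval
    half_width = z * std / math.sqrt(len(scores))
    return mean, std, (mean - half_width, mean + half_width)


# Placeholder per-run Dice scores in percent (hypothetical, NOT from the paper)
runs = [86.9, 87.4, 87.1, 87.3, 87.0]
mean, std, (low, high) = mean_std_ci(runs)
print(f"Dice = {mean:.2f} ± {std:.2f} (95% CI {low:.2f} to {high:.2f})")
```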
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    • Please consider restructuring Table 1 for better clarity according to your grouping classification as described in the text. Adding visual cues for second-best results and clarifying metric directionality (i.e., whether higher or lower is better) would enhance readability.
    • Please avoid terms like “significantly better” unless supported by statistical testing. Including standard deviation or confidence intervals would strengthen your claims.
    • Consider expanding the ablation study to cover all datasets or explain why a subset was selected.
    • A short section discussing limitations and possible extensions (e.g., adaptation to other modalities, performance under real-world noise conditions) would add depth and balance.
    • Overall, the work is of high practical value and technically solid, and I encourage the authors to push further on scientific rigor to fully showcase its strengths.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    DyMAS-Net is a well-executed, practically relevant contribution that balances efficiency and segmentation performance. The architectural innovations are meaningful, and the extensive evaluation across multiple datasets supports the generalizability of the method. While some aspects (e.g., efficiency claims, statistical analysis) could be better substantiated, these do not undermine the core strengths of the paper. The clear focus on lightweight design, potential for deployment in real-world settings, and alignment with health equity goals make this a valuable addition to the MICCAI community. I therefore recommend acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thanks for the detailed and well-structured rebuttal. You addressed my main concerns effectively. The clarification around HMCB’s efficiency and the ablation setup was helpful, and the explanation of the design choice using depthwise separable convolutions adds credibility to the efficiency claims. Including this detail in the final version would definitely strengthen the paper.

    Also, I appreciate your openness to revising the wording around statistical significance and adding a limitations section. That will help improve the overall rigor and balance. While there’s still room for a bit more formal analysis, the consistent improvements across datasets already make a strong case for the method.

    Overall, DyMAS-Net is a solid contribution that’s well-executed, practical, and aligned with real-world deployment needs. My accept recommendation stands.




Author Feedback

Dear Reviewers, Area Chair, and Program Chairs,

Thank you for your valuable feedback on our paper. We address your insightful points below.

  1. HMCB Clarity, Efficiency, and Ablation Study (Addressing R1, R2, R3)

     Clarification of “Hybrid Attention” in HMCB (R1): Reviewer 1 sought clarity on “hybrid attention” within HMCB. We apologize for any confusion caused by the term’s initial mention. HMCB achieves multi-scale feature fusion through parallel depthwise convolutions with varying kernel sizes and channel shuffling. The “hybrid attention” in the introduction refers to the synergistic effect of these efficient mechanisms in capturing cross-scale dependencies and implicitly enhancing salient features. Dedicated attention modules (DSAG, CSAB) are explicitly integrated within our DAFU. Our ablation study (Table 2) demonstrates HMCB’s contribution: integrating HMCB improved the Dice score from 85.11% to 86.35% while reducing parameter count.

     HMCB Efficiency and Ablation Explanation (R2, R3): Reviewers 2 and 3 inquired about the empirical evidence for HMCB’s efficiency and the clarity of the ablation study. HMCB’s efficiency directly stems from using lightweight depthwise separable convolutions instead of heavier standard convolutions. This design choice reduces computational cost and parameters, as shown in our quantitative analyses. In Table 2’s ablation, the ‘X’ configuration for HMCB implies replacing our proposed HMCB blocks in the encoder with standard convolutional blocks (Conv, BN, ReLU) while maintaining similar layer counts and output channels. Standard convolutions inherently have a higher parameter count than our lightweight depthwise convolutions, explaining the observed parameter increase when HMCB is ‘removed’ in this baseline. Our ablation results consistently show that HMCB maintains strong segmentation performance while being highly parameter and computation efficient compared to conventional methods.
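
The parameter argument above can be checked with simple arithmetic. The sketch below compares the weight count of one standard 3×3 convolution with three parallel depthwise separable branches; it is an illustration under stated assumptions, not the authors' HMCB implementation. The channel widths (64 to 128) are hypothetical, the kernel sizes 1, 3 and 5 are taken from Reviewer 2's description of the scales, and biases and normalization parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


c_in, c_out = 64, 128  # hypothetical channel widths
baseline = standard_conv_params(c_in, c_out, 3)
# Three parallel depthwise separable branches with assumed kernel sizes 1, 3, 5
multi_scale = sum(depthwise_separable_params(c_in, c_out, k) for k in (1, 3, 5))

print(f"standard 3x3 conv:           {baseline} weights")     # 73728
print(f"three depthwise-sep branches: {multi_scale} weights")  # 26816
```

Under these assumptions the three-branch block uses roughly a third of the weights of the single standard convolution, which is consistent with the parameter count rising when such a block is swapped for standard Conv-BN-ReLU layers.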

  2. ADSM and DAFU for Feature Misalignment (Addressing R1)

     Reviewer 1 requested analysis of ADSM and DAFU’s misalignment reduction. While we lack a dedicated quantitative metric, their effectiveness in handling feature boundaries and refining fusion is strongly supported by their individual contributions in ablation (Table 2). For instance, adding DAFU on top of HMCB boosted the average Dice score from 83.55% to 84.99%. This notable gain, with sharper boundaries in visual results (Fig. 2), empirically evidences their efficacy in mitigating misalignment and enhancing segmentation quality.

  3. Statistical Analysis and Limitations (Addressing R3)

     Statistical Analysis (R3): We acknowledge Reviewer 3’s feedback on the lack of formal statistical analysis. While detailed tests weren’t included, consistent performance improvements across diverse datasets indicate a strong trend. We will revise wording for precision in future versions.

     Limitations Discussion (R3): We agree a dedicated limitations section enhances rigor. Potential limitations include the need for further validation on broader medical imaging modalities (e.g., various MRI, complex CT) and for assessing robustness under extreme artifacts/noise. We will add a concise limitations and future work discussion in the final version.

  4. Missing Comparisons (Addressing R2)

     Reviewer 2 inquired about the absence of a comparison with SegFormer-B0. While we did conduct comparisons with lightweight transformer models like SegFormer-B0, these results were not included in the submission due to space limitations.

  5. G value in Section 2.2 (Addressing R2)

     Reviewer 2 asked about the G value in Section 2.2. G, the number of groups in ADSM’s grouped convolution, was set to 4 based on empirical tuning during implementation.
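
As a rough guide to what G controls, a grouped convolution splits the input channels into G groups and convolves each group separately, so the weight count drops by a factor of G relative to an ungrouped layer of the same shape. The sketch below illustrates this with G = 4 as stated above; the helper name `grouped_conv_params`, the channel widths, and the 3×3 kernel are hypothetical and may not match the actual ADSM layer.

```python
def grouped_conv_params(c_in, c_out, k, groups):
    """Weights of a k x k grouped convolution (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    # Each output channel only sees c_in / groups input channels
    return k * k * (c_in // groups) * c_out


c_in, c_out, k = 64, 64, 3  # hypothetical layer shape
dense = grouped_conv_params(c_in, c_out, k, groups=1)
grouped = grouped_conv_params(c_in, c_out, k, groups=4)  # G = 4 as in the rebuttal

print(f"G=1: {dense} weights")    # 36864
print(f"G=4: {grouped} weights")  # 9216
```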

We believe our responses address the reviewers’ main concerns and clarify the novelty and contributions of DyMAS-Net. We are committed to incorporating the suggested improvements in the camera-ready version.

Thank you again for your time and consideration.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes an efficient architecture for medical image segmentation, which achieves competitive results and is more efficient than the SOTA baseline. The ablation experiments are very thorough and the results are convincing. The manuscript is well written, well structured, easy to follow and understand.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


