Abstract

Mamba-based architectures have shown promising performance in medical image segmentation. Accurate segmentation demands effective capture and integration of both global context and local details. However, existing methods often lack a balanced approach to extracting and fusing global and local information within the encoder and decoder. To address this issue, we introduce Global-Local Vision-Mamba with Semantic Fusion Network (GLM-SFNet), which is designed for balanced global-local feature processing in medical image segmentation. In the encoder, GLM-SFNet employs a Local-Global Vision State Space block (LGVSS). LGVSS strategically integrates four-directional scanning Mamba to capture comprehensive global context while incorporating Learnable Descriptive Convolution (LDC) to ensure detailed local feature extraction. For the decoder, we propose a Semantic Fusion Decoder (SFD), which achieves enhanced information integration and boundary precision by strategically combining global and local semantic fusion modules. Extensive experiments on three benchmark datasets demonstrate that GLM-SFNet achieves state-of-the-art segmentation performance while maintaining a lightweight architecture.
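The abstract does not spell out how the four-directional scanning works. A minimal sketch of one common convention in vision Mamba models is shown below, assuming the four directions are row-major, column-major, and the reverse of each; the paper's actual scan order may differ. The function names `four_direction_scan` and `merge_scans` are hypothetical, and the state space model that would process each sequence is omitted entirely.

```python
def four_direction_scan(grid):
    """Flatten an H x W grid of tokens into four 1D sequences.

    Assumed directions (not confirmed by the paper): row-major,
    column-major, and the reverse of each.
    """
    h, w = len(grid), len(grid[0])
    row_major = [grid[i][j] for i in range(h) for j in range(w)]
    col_major = [grid[i][j] for j in range(w) for i in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def merge_scans(seqs, h, w):
    """Map the four sequences back to grid positions and average them.

    In a real block each sequence would first pass through a
    sequence model; here they are merged unchanged.
    """
    row, col, row_rev, col_rev = seqs
    row_rev = row_rev[::-1]  # undo the reversal
    col_rev = col_rev[::-1]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            r = i * w + j  # position in a row-major sequence
            c = j * h + i  # position in a column-major sequence
            out[i][j] = (row[r] + col[c] + row_rev[r] + col_rev[c]) / 4.0
    return out


grid = [[0, 1, 2], [3, 4, 5]]
seqs = four_direction_scan(grid)
merged = merge_scans(seqs, 2, 3)
```

Because every position is visited from four directions, each token's sequence context includes pixels above, below, left, and right of it, which is the usual motivation for multi-directional scanning on 2D images.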

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2666_paper.pdf

SharedIt Link: https://rdcu.be/eHwPI

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04947-6_22

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{CheJia_GLMSFNet_MICCAI2025,
        author = {Chen, Jiahui AND Qi, Fei AND Chang, Chengyuan AND Hu, Qinjie AND Fu, Kaiwen AND Wang, Xiaotian AND Liu, Kun},
        title = {{GLM-SFNet: Global-Local Vision-Mamba with Semantic Fusion for Medical Image Segmentation}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15962},
        month = {September},
        pages = {224--234}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper presents a MambaVision-based method with global-local feature fusion, for skin lesion and abdominal organ segmentation.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The method, based on MambaVision, is adapted to medical image segmentation. This is a good translation for 3D medical data segmentation, where a volume can be regarded as a sequence of images. (2) The ablation studies are comprehensive.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

There are still some problems: (1) There are many previous studies on global-local feature fusion learning, which is contrary to what the authors declare. Thus, the description of this related work is weak and lopsided. (2) The applications are chosen from two unrelated fields, skin lesion and abdominal organ segmentation, which seems a little odd. (3) The SOTA results of the specialized methods (not computer vision foundation models) on the three included datasets may not be shown.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The translation is good, but the SOTA results of the specialized methods (not computer vision foundation models) should be compared.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The main contribution is the proposal of a modified Mamba architecture for image segmentation which aims at improving local boundary segmentation without losing information about global dependencies. Encoder and decoder are designed such that information can flow from early and late state space blocks in the encoder to corresponding fusion blocks in the decoder.

The approach has been evaluated on open access CT organ segmentation as well as on skin lesion segmentation datasets where it performs mostly above the state of the art, especially for challenging structures.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- the work presents several interesting modifications to Mamba. The benefits of the proposed modified pathways and improved information flow between encoder and decoder have been evaluated in an ablation study and performance improvements have been shown
- the results on the open access datasets are very good, performing mostly above the state of the art
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The problem of local-global feature balance for medical image segmentation could be better motivated with concrete examples and references.
- The performance evaluation and ablation studies are focussed on the Synapse dataset which makes sense as it includes a variety of organs of different complexity, boundaries, and shape. I think the evaluation on the 2 ISIC datasets does not add much to the work as there are no qualitative results and no motivation why these datasets are relevant to demonstrate the benefit of local-global feature balance. It would be good to add examples from these datasets and highlight advantages/limitations of the method, e.g. for lesions with complex patterns.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Figure 3: any differences between the methods are hard to spot and require heavy zooming to see. There is a lot of empty/non-relevant space between the examples which could be cropped, and the examples could in turn be made larger or cropped to the relevant parts
- underlining second best approaches in tables 1 and 2 would help readability
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I think the paper presents some interesting modifications to Mamba architectures as a better balance between local and global features makes sense for many image segmentation applications. There is a strong evaluation on Synapse with good results and ablation studies and the paper is well written and understandable.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The authors provide a new methodological contribution, Global-Local Vision-Mamba with Semantic Fusion Network (GLM-SFNet). This algorithm is developed for use in medical image segmentation tasks and combines global and local representations of images: a Mamba-like configuration that addresses long-range dependencies is paired with a convolutional branch for local feature extraction. Local and global features are then run through a semantic fusion network that uses cross-attention mechanisms and provides an efficient integration of encoder-decoder structures whilst keeping a good segmentation performance.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors present a relevant, new, and interesting methodology for medical image segmentation by combining cross-attention-based semantic fusion of global and local features with a Mamba structure. Their results exhibit good accuracy whilst keeping a light architecture, which can have a positive impact on clinical settings with limited computational resources.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
No major weaknesses were observed in the submitted work, although here are some minor comments:
- The contents of the paper are complex, and the over-use of abbreviations can make it hard to follow sometimes (for example, section 3.2)
- The benefits of the algorithm in clinical settings are under-addressed and a paragraph highlighting the potential for computer-aided diagnosis could add value to the paper.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

If the code will be made available upon publication, authors might want to add this to the manuscript. Despite the details of the methodology section, the architecture is quite complex and potentially hard to reproduce.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(6) Strong Accept — must be accepted due to excellence
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The new framework including cross-attention-based semantic fusion of global and local features using a Mamba-like configuration has exhibited good accuracy whilst keeping a light architecture. Their writing and discussion are of high quality and the contribution presents a novel approach that can be of interest for the MICCAI community.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We sincerely thank all reviewers for the valuable comments and constructive suggestions. We greatly appreciate the time and effort dedicated to reviewing our work. Below, we provide clarifications and responses to the main points raised during the review process.

Q1: Balance of global-local features for medical image segmentation (R1, R2)
A1: Our discussion on global-local feature balance primarily focuses on recent segmentation models based on the Mamba architecture, such as SliceMamba, VM-UNet, and UltraLight VM-UNet. While these models offer efficient global modeling capabilities, they still show limitations in preserving local spatial structures and effectively integrating global and local information. These challenges serve as the main motivation behind the design of our proposed method. Our goal is to address specific limitations observed in emerging Mamba-based models, while fully acknowledging prior contributions beyond this category.

Q2: Dataset selection and relevance (R1, R2)
A2: We selected the Synapse dataset (abdominal multi-organ segmentation) and the ISIC dataset (skin lesion segmentation), which differ significantly in clinical context, to evaluate the generalizability and robustness of our method across heterogeneous medical tasks. Although the segmentation targets vary, both datasets require accurate modeling of fine-grained local structures under global semantic constraints. This shared challenge directly aligns with the motivation behind our approach.

Q3: Comparison with specialized SOTA methods (R1)
A3: We have conducted comprehensive experiments on three public benchmark datasets, covering models based on CNN, Transformer, and Mamba architectures. The baselines include PVT-EMCAD-b0, VM-UNet, UltraLight VM-UNet, MISSFormer, among others. These representative approaches reflect current advanced methodologies and serve as solid references for evaluating the effectiveness of our proposed GLM-SFNet.

Q4: Figure Presentation and Language Refinement (R1, R2, R3)
A4: We will make revisions in the camera-ready version.

Q5: Source code availability (R3)
A5: We plan to publicly release the source code upon formal acceptance. In the meantime, the code can be provided upon request by contacting the corresponding author.

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

All three reviewers are in favor of accepting this work, with ratings of ‘weak accept’, ‘weak accept’, and ‘strong accept’. The reviewers think the Mamba-based method achieves a better balance between local and global features for many image segmentation applications and that the experimental results are comprehensive. The authors are advised to further revise the paper based on the reviewer comments when preparing the final version.


GLM-SFNet: Global-Local Vision-Mamba with Semantic Fusion for Medical Image Segmentation

Author(s):