Abstract
In 3D biomedical image segmentation, Mamba performs strongly because it addresses the limited long-range dependency modeling of CNNs and avoids the heavy computational overhead of Transformer-based frameworks when processing high-resolution medical volumes. However, overemphasizing global context modeling may compromise critical local structural information, leading to boundary ambiguity and regional distortion in segmentation outputs. We therefore propose HybridMamba, an architecture with two complementary mechanisms: 1) a feature scanning strategy that progressively integrates representations from both axial-traversal and local-adaptive pathways to balance local and global representations, and 2) a gated module that combines spatial and frequency analysis for comprehensive contextual modeling. In addition, we collect a multi-center CT dataset of lung cancer. Experiments on MRI and CT datasets demonstrate that HybridMamba significantly outperforms state-of-the-art methods in 3D medical image segmentation.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2815_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{WuWei_HybridMamba_MICCAI2025,
author = { Wu, Weitong and Xing, Zhaohu and Gong, Jing and Peng, Qin and Zhu, Lei},
title = { { HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15962},
month = {September},
pages = {284--294}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper introduces a 3D UNet-style encoder–decoder in which the Mamba state-space layer is enhanced in two ways. First, a Slice-local Mamba (S-LMamba) block couples Slice-oriented Mamba (full-slice forward and reverse scans) with Local-oriented Mamba (small sliding-window scans within and across slices) to balance global context and fine boundaries. Second, an FFT Gated Mechanism (FGM) fuses learnable high-/low-frequency components with spatial features before Mamba processing, aiming to bolster robustness on low-contrast or noisy images. Experiments on two medical image datasets demonstrate the effectiveness of the method.
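The two scan patterns summarized above can be illustrated with a minimal sketch of how a 3D grid might be flattened into a 1D token sequence. This is not the authors' implementation: the function names, the non-overlapping cubic windows, and the window size are assumptions made only to show how the ordering affects which voxels end up adjacent in the sequence.

```python
from itertools import product


def slice_scan(depth, height, width):
    """Slice-oriented order: visit all voxels of slice 0, then slice 1, and so on."""
    return [(z, y, x) for z in range(depth) for y in range(height) for x in range(width)]


def local_scan(depth, height, width, win):
    """Local-oriented order (assumed form): visit the volume one small cubic
    window at a time, so voxels that are spatial neighbours, including those
    on adjacent slices, stay close together in the 1D sequence."""
    order = []
    starts = product(range(0, depth, win), range(0, height, win), range(0, width, win))
    for z0, y0, x0 in starts:
        for z in range(z0, min(z0 + win, depth)):
            for y in range(y0, min(y0 + win, height)):
                for x in range(x0, min(x0 + win, width)):
                    order.append((z, y, x))
    return order


def bidirectional(order):
    """Return the forward and reverse versions of a scan order."""
    return order, order[::-1]


if __name__ == "__main__":
    s = slice_scan(2, 4, 4)
    l = local_scan(2, 4, 4, win=2)
    # Voxels (0,0,0) and (1,0,0) are direct neighbours across slices.
    print(s.index((1, 0, 0)) - s.index((0, 0, 0)))  # gap in the slice-oriented sequence
    print(l.index((1, 0, 0)) - l.index((0, 0, 0)))  # gap in the local-oriented sequence
```

On this toy 2x4x4 grid the two cross-slice neighbours are 16 positions apart in the slice-oriented order but only 4 apart in the windowed order, which is the intuition behind pairing a global scan with a local one.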
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The combined SoMamba + LoMamba scans keep long-range context while tightening locality around lesions, reducing boundary leakage.
- Frequency–spatial fusion via FGM offers an orthogonal cue (edge/shape in the frequency domain) and is shown to boost Dice by +2.6 pp on the lung set in the ablation.
- New CT dataset (central-type lung carcinoma) highlights the model’s ability on tiny lesions and may enrich future research once released.
- Experiments on two medical image datasets demonstrate the effectiveness of the method.
- An ablation study was conducted to verify the effectiveness of the key components.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Incremental novelty: SoMamba/LoMamba derive directly from LocalMamba [9] and SegMamba’s tri-orientated scan; FGM resembles prior FFT/wavelet gating [23]. The paper lacks a convincing argument that the combined design is fundamentally new.
- Private lung dataset unreleased: reproducibility and independent verification are impossible; annotation protocol and inter-rater reliability are not reported.
- Evaluation breadth is limited: only Dice + HD95 are given; no statistical significance, precision/recall, or computational metrics (FLOPs, GPU memory, inference time) to support the “efficient” claim.
- Scope of datasets: only brain MRI and lung CT are tested; generalisation to other organs/modalities or multi-label tasks remains unverified.
- The computation cost is not discussed (e.g., number of parameters, FLOPs, etc.).
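For readers unfamiliar with the two metrics named in the weaknesses above, the following is a minimal sketch of Dice and HD95 on toy masks. It makes several simplifying assumptions that are not taken from the paper: masks are given as sets of voxel coordinates, distances are computed over all foreground voxels rather than extracted surfaces, voxel spacing is 1, and the 95th percentile uses the nearest-rank rule over the pooled directed distances. Library implementations differ on these details.

```python
import math


def dice(pred, target):
    """Dice coefficient between two sets of voxel coordinates."""
    if not pred and not target:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(pred & target) / (len(pred) + len(target))


def _directed_distances(a, b):
    """For every point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def _percentile_nearest_rank(values, q):
    """q-th percentile using the nearest-rank definition (an assumption)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def hd95(pred, target):
    """95th-percentile Hausdorff distance over pooled directed distances.
    Both masks must be non-empty."""
    pooled = _directed_distances(pred, target) + _directed_distances(target, pred)
    return _percentile_nearest_rank(pooled, 95)


if __name__ == "__main__":
    # Two 2x2 squares that overlap in one column (illustrative data only).
    pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
    target = {(0, 1), (1, 1), (0, 2), (1, 2)}
    print(dice(pred, target))  # 0.5
    print(hd95(pred, target))  # 1.0
```

Dice rewards overlap regardless of where the errors sit, while HD95 penalises boundary outliers, which is why reviewers often ask for both alongside precision and recall.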
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
In sum, HybridMamba is a well-engineered refinement of state-space segmentation models that delivers solid empirical gains and an interesting frequency-domain gate, yet its incremental nature and limited analysis leave open questions about generality, efficiency and reproducibility. I suggest weak reject.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors address most of my concerns.
Review #2
- Please describe the contribution of the paper
a) Hybrid Spatial-Frequency Learning through FFT Gated Mechanism (FGM): Presents a new FFT Gated Mechanism (FGM) for combining spatial and frequency features. It enhances edge details and suppresses noise, boosting segmentation accuracy.
b) Long-range Dependency Modeling with Efficient Use of Mamba Variants: Presents Slice-oriented Mamba (SoMamba) to model global dependencies in 3D medical images, and introduces Local-oriented Mamba (LoMamba) to enhance local spatial details for improved segmentation.
c) Linear Computational Complexity for Large-scale 3D Medical Images: Maintains linear computational complexity, enabling scalability to large volumetric datasets. Memory and computational requirements are lower than those of Transformer-based models.
d) Practical Deployment Feasibility: Narrows the gap between deep learning models and practical real-world medical imaging applications. Provides a light, fast alternative to Transformer-based models that is suited to resource-constrained settings.
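The spatial-frequency fusion described in item (a) can be sketched on a 1D signal. This is a hypothetical illustration, not the paper's FGM: the band split by a single cutoff, the fixed weights `w_low` and `w_high` (which would be learnable in a real model), and the sigmoid gate driven by the frequency branch are all assumptions, and a plain DFT stands in for a 3D FFT.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def frequency_gate(x, cutoff, w_low, w_high, gate_bias=0.0):
    """Re-weight low and high frequency bands, then blend the result with the
    original (spatial) signal through a sigmoid gate. Hypothetical design."""
    n = len(x)
    filtered = []
    for k, coeff in enumerate(dft(x)):
        freq = min(k, n - k)  # symmetric index keeps the output real
        filtered.append(coeff * (w_low if freq <= cutoff else w_high))
    freq_feat = idft(filtered)
    out = []
    for spatial, freq_value in zip(x, freq_feat):
        gate = 1.0 / (1.0 + math.exp(-(freq_value + gate_bias)))
        out.append(gate * freq_value + (1.0 - gate) * spatial)
    return out


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # pure frequency-2 wave
    print(frequency_gate(signal, cutoff=1, w_low=1.0, w_high=1.0))  # unchanged
    print(frequency_gate(signal, cutoff=1, w_low=1.0, w_high=0.0))  # halved
```

With both weights equal to 1 the module is an identity; zeroing the high band removes the oscillation from the frequency branch, so the gate falls back to a half-weighted copy of the spatial input.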
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
a) Dual-domain Feature Fusion with FFT Gated Mechanism (FGM): HybridMamba efficiently fuses spatial and frequency information, which is vital for medical image segmentation. The FFT Gated Mechanism (FGM) strengthens significant structural information by selectively processing the frequency components while keeping relevant spatial information intact. This provides enhanced edge preservation, minimizes noise, and enhances segmentation accuracy.
b) Effective Long-range Dependency Modeling with SoMamba & LoMamba: Rather than depending upon computationally costly self-attention, HybridMamba leverages state-space models (SSMs) for effective handling of long-range dependencies. SoMamba addresses global dependencies, whereas LoMamba augments local spatial details. HybridMamba, as a result, is able to improve segmentation quality without Transformers’ high resource consumption.
c) High Computational Efficiency and Scalability: One of the strongest aspects of HybridMamba is its linear computational complexity, which allows it to scale very well for big 3D medical datasets. In contrast to Transformer-based architectures that use enormous amounts of GPU resources, this approach balances performance, efficiency in memory, and processing speed, and thus it is practical for actual medical imaging usage.
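The linear-versus-quadratic scaling argument in item (c) can be made concrete with a back-of-the-envelope calculation. The cost formulas below are the usual order-of-magnitude estimates (roughly 2·L²·d multiply-adds for the two matrix products in self-attention, roughly L·d·N for a state-space scan) and ignore projections and constants; the volume size, patch size, channel width, and state size are hypothetical values, not figures from the paper.

```python
def num_tokens(shape, patch):
    """Number of tokens when a 3D volume is split into cubic patches."""
    depth, height, width = shape
    return (depth // patch) * (height // patch) * (width // patch)


def attention_cost(tokens, dim):
    """Rough multiply-add count for QK^T plus attention-times-V."""
    return 2 * tokens * tokens * dim


def ssm_cost(tokens, dim, state):
    """Rough multiply-add count for one state-space scan over the sequence."""
    return tokens * dim * state


if __name__ == "__main__":
    tokens = num_tokens((128, 128, 128), patch=4)  # hypothetical volume and patch
    dim, state = 48, 16                            # hypothetical widths
    print(tokens)                                             # 32768
    print(attention_cost(tokens, dim) // ssm_cost(tokens, dim, state))  # 4096
```

Under these assumptions the attention estimate is 4096 times the scan estimate, and doubling the token count quadruples the former while only doubling the latter. Measured FLOPs and memory still depend on the full architecture, which is why the reviewers ask for reported numbers.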
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Lack of Proper Justification for Architectural Decisions: The Slice-oriented Mamba (SoMamba) + Local-oriented Mamba (LoMamba) architecture is not rigorously compared with other global-local fusion approaches such as Fourier Neural Operators (FNOs) and Wavelet-based Transformers. The advantages of Mamba over SwinUNETR and UNETR are not properly justified beyond efficiency arguments.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a strong methodological contribution with its hybrid SoMamba-LoMamba framework and spatial-frequency learning approach, demonstrating competitive segmentation performance on BraTS2023 and a lung cancer dataset. However, the lack of novelty in its application and the insufficient architectural justification raise concerns. A strong rebuttal addressing novelty, efficiency, and broader validation could justify its acceptance.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
The authors’ justification regarding the benefits of Mamba over Transformer-based architectures is not satisfactory. However, the work has novel elements. I would like to give a weak reject, but as that option is not available, I have given a Reject decision.
Review #3
- Please describe the contribution of the paper
The article introduces a variant of the Mamba approach that incorporates a scanning procedure along with a spatial frequency module based on the Fast Fourier Transform. The primary Mamba module is derived from the SegMamba architecture, with modifications to the patching method to better support local dependencies. Additionally, the authors propose an attention-like module that leverages the Fast Fourier Transform to capture frequency information and enhance boundary modeling. A comprehensive validation procedure is provided, featuring comparisons with state-of-the-art methods and extensive ablation studies.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The proposed Local and Slice Mamba, which pertains to the scanning procedure, is an intriguing innovation designed to address the challenge of applying a one-dimensional Mamba structure to three-dimensional patches. Furthermore, the use of frequency-based attention is well explained in the context of its application to CT and MRI images, leading to enhanced results. An extensive evaluation, including comparisons with state-of-the-art methods and an ablation study, is provided, with the performance enhancements being most notable in the lung tumor dataset, as this method effectively models local patches.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Figure 1, which provides an overview of the methodology, is presented in an overly complicated manner; a more detailed explanation of the high-level steps would significantly enhance its clarity. In the Results section, the comparisons on the BraTS dataset indicate that the proposed method achieves metrics very similar to those of existing approaches, suggesting that the benefits on this dataset are marginal, while improvements on datasets such as lung are more pronounced. This discrepancy should be discussed in greater detail. Moreover, the observed small differences are insufficient to support the claimed superiority of the method on the BraTS dataset. Including statistical tests (e.g., t-test, Wilcoxon test) would enhance the robustness of the claims. Additionally, in the comparisons with state-of-the-art methods, only SegMamba is evaluated, and other Mamba-based methodologies, such as UMamba, are omitted, which weakens the comparative analysis, given that these methods are based on Mamba and are widely used.
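As an illustration of the paired test suggested above, here is a minimal sketch of an exact two-sided Wilcoxon signed-rank test on per-case scores. It is written from the textbook definition, drops zero differences, uses average ranks for ties, and enumerates all sign patterns, so it is only practical for small samples. The per-case Dice values in the example are invented for illustration and do not come from the paper or the rebuttal.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, exact two-sided p-value) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    # Under the null hypothesis every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-case Dice scores (percent) for two methods on six cases.
    method_a = [91, 89, 93, 90, 92, 88]
    method_b = [90, 87, 90, 86, 87, 82]
    print(wilcoxon_signed_rank(method_a, method_b))  # (21.0, 0.03125)
```

With six cases that all favour one method, the smallest achievable two-sided p-value is 2/64, which shows why a test set needs enough cases before small Dice differences can be called significant.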
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Overall, the article provides implementation details for the training of the models; however, certain aspects of the lung dataset, such as acquisition, spacing, and size, are not addressed. Additionally, the authors should consider providing the source code for the methodology to further enhance reproducibility.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces several innovative contributions, such as the Local and Slice Mamba approach and a frequency-based attention to enhance CT/MRI imaging results. The extensive evaluation, featuring comparisons with state-of-the-art methods and a detailed ablation study, is particularly compelling on the lung tumor dataset, where modeling local patches significantly improves performance. However, the overall impact of the work is diminished by a few issues: the overview in Figure 1 is presented in an overly complicated manner, the BraTS dataset comparisons reveal only marginal improvements without rigorous statistical validation, and the exclusion of evaluations against other Mamba-based methodologies (such as UMamba) weakens the comparative analysis. In summary, some enhancements in the presentation of the method overview and the results would significantly improve the paper.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have adequately addressed all of my concerns. The proposed Mamba-based architecture appears to contribute meaningfully to the existing body of work on state-of-the-art Mamba networks by improving their accuracy in segmenting smaller regions of interest, such as lung tumors, an aspect that is relatively underexplored for this type of network. I recommend acceptance.
Author Feedback
We appreciate the valuable comments on the efficiency and scalability of our method (Reviewer#2) and on the strong experiments (Reviewer#1, Reviewer#3). Below, we clarify the issues raised by the reviewers.
Reviewer#1
Q1: Lack of a convincing argument for method novelty. Our method is developed to address specific challenges inherent in our LC dataset, particularly the tiny tumor regions. Concretely, LoMamba enhances short-range dependencies within local lesion areas by using a dynamically sized local window while maintaining segmentation consistency across adjacent slices. In contrast, the SoMamba module performs feature interaction on a slice-by-slice basis, preserving the complete anatomical structure within each slice. These two modules are complementary and together improve the model’s perception of tumor regions. Our FGM employs learnable frequency filters and a dynamic gate to fuse features across different scales, enhancing its ability to capture anatomical details. In contrast, [23] only uses wavelet transforms to separate low- and high-frequency components at the input stage.
Q2: Dataset release and the annotation protocol. We commit to releasing our LC dataset to ensure reproducibility. Our dataset is annotated by two radiologists; a third radiologist then reviews the annotations to ensure reliability.
Q3: Limited evaluation metrics. We have included additional metrics and statistical tests to enrich our analysis. On the BraTS dataset, the average results on WT, TC, and ET are as follows:
HybridMamba: 95.87 precision, 93.71 recall
SegMamba: 95.02 precision, 92.89 recall
SwinUNETR: 90.56 precision, 89.62 recall
The p-values on BraTS are: SegMamba: 0.0004 (Dice), 0.00062 (HD95); SwinUNETR: 0.002 (Dice), 0.00084 (HD95).
For computational metrics, namely parameters (M), FLOPs (G), GPU memory (M), and inference time (s/case), the results are:
HybridMamba: 70M, 5.3G, 17,384M, 1.47s/case
SegMamba: 67M, 4.6G, 16,831M, 1.66s/case
SwinUNETR: 63M, 286.3G, 22,845M, 2.98s/case
Q4: Small scope of datasets. We also test our method on another public dataset, LiTS2017, and compare our model (Dice: 92.52; HD95: 39.15) with SegMamba (Dice: 91.98; HD95: 43.62), verifying the generalization ability of our method.
Reviewer#2
Q1: Lack of proper justification for architectural decisions. We have followed your suggestion and replaced our FGM with FNOs; the results on the LC dataset are Dice = 73.67 and HD95 = 68.74. We also compare our method with a wavelet-based approach, DECS-Net, whose Dice and HD95 scores on the LC dataset are 70.79 and 84.75. Our method is designed to effectively capture the relationship between local tumor regions. In contrast, SwinUNETR cannot achieve global modeling due to its sliding window, and UNETR fails to capture multi-scale features because of the Transformer’s computational costs when processing high-resolution 3D medical images.
Reviewer#3
Q1: Complicated description style for Fig. 1. Thank you for your suggestion. We will revise the description of Fig. 1 to improve its clarity.
Q2: Performance differences are smaller on the BraTS dataset. The tumor regions in the BraTS dataset are generally larger than those in the LC dataset, which makes the segmentation task less challenging and leads to smaller performance differences. Furthermore, we include statistical tests on the BraTS dataset to ensure robustness; please refer to our response to R1Q3.
Q3: Lack of comparisons with other Mamba-based methods. We compare our model with UMamba (Dice: 89.97; HD95: 3.95) and Swin-UMamba (Dice: 90.66; HD95: 4.19) on the BraTS2023 dataset.
Q4: The details of the lung dataset and the access to code. Our LC dataset was acquired from multiple centers. The number of slices along the z-axis ranges from 49 to 610 (median 342), with a fixed in-plane size of 512x512. The in-plane spacing ranges from 0.613x0.613 mm to 0.947x0.947 mm (median 0.781x0.781 mm), and the z-axis spacing ranges from 0.8 mm to 5.0 mm (median 0.99 mm). We also commit to releasing our code.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper proposes a HybridMamba framework for 3D medical image segmentation. The technical contribution is clearly explained in the rebuttal. The evaluation includes recent models and is thorough and convincing, but it lacks details about the dataset composition. To further improve the work, the authors could compare their method on the official BraTS 2023 external validation leaderboard, which would provide a fairer benchmark.