Abstract

In clinical practice, medical image segmentation provides useful information on the contours and dimensions of target organs or tissues, facilitating improved diagnosis, analysis, and treatment. In the past few years, convolutional neural networks (CNNs) and Transformers have dominated this area, but they still suffer from either limited receptive fields or costly long-range modeling. Mamba, a State Space Sequence Model (SSM), recently emerged as a promising paradigm for long-range dependency modeling with linear complexity. In this paper, we introduce a Large Kernel vision Mamba U-shape Network, or LKM-UNet, for medical image segmentation. A distinguishing feature of our LKM-UNet is its utilization of large Mamba kernels, excelling in locally spatial modeling compared to small kernel-based CNNs and Transformers, while maintaining superior efficiency in global modeling compared to self-attention with quadratic complexity. Additionally, we design a novel hierarchical and bidirectional Mamba block to further enhance Mamba’s global and neighborhood spatial modeling capability for vision inputs. Comprehensive experiments demonstrate the feasibility and the effectiveness of using large-size Mamba kernels to achieve large receptive fields. Codes are available at https://github.com/wjh892521292/LKM-UNet.
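The abstract's two central ideas, linear-cost sequence modeling and bidirectional scanning, can be illustrated with a toy recurrence. The sketch below is not the authors' Mamba block: it uses a fixed scalar linear recurrence (`a`, `b`, `c` are hypothetical parameters chosen for illustration) rather than Mamba's input-dependent selective scan. It shows only that one pass over a length-L sequence costs O(L), and that summing a forward and a backward pass lets every position receive information from both sides.

```python
def linear_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t in one O(L) pass.

    a, b, c are fixed scalars here (an assumption for illustration);
    Mamba makes the corresponding quantities depend on the input.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def bidirectional_scan(xs, a=0.9, b=1.0, c=1.0):
    """Sum a forward scan and a backward scan of the same sequence."""
    forward = linear_scan(xs, a, b, c)
    # Scan the reversed sequence, then flip the result back into place.
    backward = linear_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


impulse = [0, 0, 1, 0, 0]
print(linear_scan(impulse, a=0.5))         # [0.0, 0.0, 1.0, 0.5, 0.25]
print(bidirectional_scan(impulse, a=0.5))  # [0.25, 0.5, 2.0, 0.5, 0.25]
```

With the forward scan alone, positions before the impulse see nothing; the bidirectional version spreads the impulse symmetrically, which is the intuition behind scanning image tokens in both directions.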

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0286_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0286_supp.pdf

Link to the Code Repository

https://github.com/wjh892521292/LKM-UNet

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_LKMUNet_MICCAI2024,
        author = { Wang, Jinhong and Chen, Jintai and Chen, Danny Z. and Wu, Jian},
        title = { { LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper contributes a novel approach to medical image segmentation by introducing LMa-UNet, which utilizes a large window-based Mamba U-shape Network to achieve efficient spatial modeling with a focus on both local and global features. Additionally, it presents a bidirectional and hierarchical State Space Sequence Model (SSM) to enhance Mamba’s capabilities, demonstrating improved performance over existing CNN and Transformer-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces a new U-Net architecture variant, LMa-UNet, which incorporates a large window-based Mamba model for medical image segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of Comparative Analysis with Existing Mamba Structures: The paper does not provide a direct comparison with other Mamba-based models, such as SegMamba[1], on more commonly used datasets. This makes it difficult to ascertain the relative improvements or the specific advantages of the proposed LMa-UNet model over other implementations that utilize Mamba’s framework.

    Perceived Simplicity in Bidirectional Mamba Design: The bidirectional Mamba design, while novel in the context of the paper, may be seen as straightforward and not sufficiently complex or innovative to meet the expectations for top-tier conferences. The novelty of the approach might not be substantial enough to distinguish it from existing methods in a meaningful way.

    Efficiency Claims Unsupported by Data: The paper emphasizes the efficiency of the Mamba structure but does not provide empirical evidence to support these claims. Without experimental data on the model’s parameter count and inference latency, it is challenging to evaluate the true efficiency of the proposed method compared to other state-of-the-art models.

    [1]Xing Z, Ye T, Yang Y, et al. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation[J]. arXiv preprint arXiv:2401.13560, 2024.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Enhanced Comparative Analysis:

    “It would be beneficial to include a comparative analysis with other existing Mamba-based models, such as SegMamba, to highlight the specific advantages and improvements of LMa-UNet. This will provide a clearer picture of how your model stands out within the current landscape of medical image segmentation techniques.”

    Depth in Methodological Novelty:

    “While the bidirectional Mamba design is an interesting addition, it may not fully satisfy the novelty criteria for top conferences. Consider elaborating on the theoretical underpinnings and the unique aspects of your approach that differentiate it from existing methods. It would also be helpful to discuss potential new applications or benefits that arise from this design.”

    Demonstration of Efficiency:

    “The paper could be significantly strengthened by including experimental data on the efficiency of the proposed model. Please provide details on the parameter count, computational complexity, and inference latency for LMa-UNet and compare these metrics with other state-of-the-art models. This will substantiate the claims regarding the efficiency of your model.”

    Broader Dataset Evaluation:

    “To ensure the robustness and generalizability of your model, it is recommended to test LMa-UNet on a wider range of datasets with varying complexities and characteristics. This will help readers understand how the model performs across different clinical scenarios and image modalities.”

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see weakness

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose LMa-UNet, a new Mamba-based segmentation architecture. LMa-UNet is essentially a U-Net with a bidirectional Mamba module. Compared to other convolutional, Transformer, and Mamba-based U-Net architectures, LMa-UNet uses pixel-level and patch-level SSMs to enhance local and global feature modeling. The authors validate the method on two segmentation tasks, outperforming baseline methods from the literature and simpler architectures.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novelty: LMa-UNet’s main strength is that it modifies the original Mamba by using a bidirectional sequence with more positional awareness.
    • Extendability: The method can be applied to already existing U-Net segmentation problems with minor changes to the model architecture.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited discussion of the receptive field increase: the authors claim that the LM block obtains larger receptive fields, but the schematic image is not sufficient evidence. The effective receptive field needs to be shown to support this statement (ref: Understanding the Effective Receptive Field in Deep Convolutional Neural Networks).
    • Limited comparison to state-of-the-art: The authors compare to U-Mamba (concurrent), which should be U-Mamba_Enc in the original paper. However, there is another model, U-Mamba_Bot, which uses the U-Mamba block only in the bottleneck and has the same performance on 3D Abdomen CT (reported 86.83) as LMa-UNet. Given that U-Mamba (concurrent) has the same DSC value as U-Mamba_Enc in the original paper, U-Mamba_Bot should perform similarly to what is reported in the original paper.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Lack of clarity:

    • Better explanation of the overview figure: it would help to clearly mark the multiplication and addition symbols on the figure.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors claim that the proposed architecture has a larger receptive field without visual or numerical evidence to support the statement. The authors also failed to include a SOTA model to justify the claimed superiority of the proposed architecture.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision
    1. The authors added the comparison with UMambaBot, which still shows a large increase in the Dice score and a decrease in the HD95.
    2. The ERF images clearly show the large receptive field of the new architecture.



Review #3

  • Please describe the contribution of the paper

    This paper introduces a medical image segmentation model utilizing a Mamba block. The Mamba block’s efficiency allows for large context windows, enhancing local spatial modeling. The authors propose two architectural improvements: (1) Hierarchical Mamba blocks: By stacking Mamba blocks hierarchically, the model captures both fine-grained local details and broader global context within the image. (2) Bidirectional Mamba blocks: This configuration addresses the potential “forgetting effect” in long sequences by processing information in both forward and backward directions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Well argued architectural choices and tailored design: The paper presents a well-explained model architecture based on Mamba blocks. The authors propose a hierarchical and bidirectional combination of Mamba blocks specifically designed for medical image segmentation tasks. (2) Comprehensive Evaluation: The authors conduct a comprehensive evaluation, comparing to main competitor architectures while using different imaging modalities as well as 2D and 3D data. Additionally, they employ an ablation study, systematically evaluating the impact of each proposed improvement both individually and in combination.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Limited interpretability: The paper primarily relies on quantitative metrics to evaluate the model’s performance. While these are important, including a qualitative analysis of the segmentation masks would provide valuable insights. This analysis could involve comparing the model’s outputs to those of competing methods and ground truth labels. By visualizing successful and challenging cases, the authors could reveal specific areas where their proposed architecture excels, such as capturing fine-grained details (local modeling) or handling complex structures (global modeling). This would provide a deeper understanding of how the hierarchical and bidirectional Mamba block combinations contribute to improved segmentation.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Suggestions for future work: (1) Benchmarking efficiency: While the paper emphasizes the computational efficiency of the proposed architecture, a quantitative evaluation of its computational complexity and memory footprint would be valuable. This could involve benchmarking the model on standard datasets and comparing it to existing methods. (2) Scaling window size: The ablation study demonstrates performance improvement with larger window sizes in the PiM model. However, it would be interesting to explore the impact of even larger windows. Will the benefit continue, or will discontinuity of neighboring pixels and increased computational cost introduce a negative effect? Investigating this trade-off would provide further insights into the optimal window size selection. (3) 3D context window optimization: The paper observes a smaller performance boost for 3D data compared to 2D data, potentially due to the larger context window size used in the PiM model. Exploring alternative strategies, such as employing n-directional processing in Mamba blocks specifically for 3D data, could be a promising direction for improvement. (4) Statistical analysis: Incorporating statistical significance testing and reporting standard deviations between experiments would strengthen the evaluation and provide a more robust assessment of the performance gains achieved by the proposed architecture.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a novel medical image segmentation architecture that addresses the important challenge of effectively combining global and local context within the image. The authors demonstrate the architecture’s effectiveness through a comprehensive evaluation. While some improvements would further strengthen the work, the proposed approach has the potential to be beneficial for the MICCAI community.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    While I appreciate the authors’ comprehensive response with additional experiments and visualizations, I have lowered my initial recommendation. This is due to concerns raised by other reviewers regarding the limited scope of comparison. Unfortunately, incorporating new experiments in the paper at this stage is not allowed by MICCAI guidelines.




Author Feedback

-Common Question 1: More Experiments (as requested)

Additional results comparing our LMa-UNet with SOTA methods, including UMambaBot, UMambaEnc, and SegMamba, on the BraTS2023 dataset are shown below.

Conclusion: LMa-UNet consistently exceeds all SOTAs by a clear margin.

We will add these results in the final version.

Method       WT Dice  WT HD95  TC Dice  TC HD95  ET Dice  ET HD95  Avg Dice  Avg HD95
SegResNet    92.02    4.07     89.10    4.08     83.66    3.88     88.26     4.0
UNETR        92.19    6.17     86.39    5.29     84.48    5.03     87.68     5.49
SwinUNETR    92.71    5.22     87.79    4.42     84.21    4.48     88.23     4.7
SegMamba     93.61    3.37     92.65    3.85     87.71    3.48     91.32     3.5
UMambaBot    93.34    3.46     91.47    4.02     86.04    3.65     90.15     3.83
UMambaEnc    93.78    3.05     93.02    3.47     88.46    3.13     91.89     2.98
LMa-UNet     94.03    2.64     93.23    3.14     89.01    2.87     92.78     2.55
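For readers less familiar with the Dice columns in the table above, the following is a minimal sketch of the Dice overlap between a predicted and a reference binary mask, using the standard definition 2|P∩T| / (|P| + |T|). The function name and the flat-list mask representation are assumptions for illustration, and the convention of returning 1.0 when both masks are empty is one common choice rather than something stated in the rebuttal. The table reports Dice as a percentage, so values from this sketch would be multiplied by 100.

```python
def dice_score(pred, target):
    """Dice overlap between two flat binary masks of equal length.

    Returns 1.0 when both masks are empty (a convention, not a rule).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# One of two predicted foreground voxels matches the single reference voxel.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```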

-Common Question 2: Efficiency Analysis

We show the parameter counts and inference times of SOTA methods and our LMa-UNet. LMa-UNet has fewer parameters and is more lightweight and faster, which facilitates clinical deployment. We will add these results to the paper.

Method              SegResNet  SwinUNETR  UMamba  SegMamba  LMa-UNet
Para. (M)           18852      34000      19654   17976     17034
Inf. time (s/case)  1.92       1.68       1.43    1.51      1.20
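The rebuttal does not say how the two quantities above were measured. As a rough sketch under stated assumptions, a parameter count can be obtained by summing the element counts of all weight tensors, and seconds per case by averaging wall-clock time over a set of cases after a warm-up run. The helper names, the `shapes` dictionary, and the `predict` callable are hypothetical stand-ins, not part of the authors' code.

```python
import time
from math import prod


def count_parameters(shapes):
    """Total number of scalar parameters given a {name: shape} mapping."""
    return sum(prod(shape) for shape in shapes.values())


def mean_seconds_per_case(predict, cases, warmup=1):
    """Average wall-clock seconds that predict(case) takes per case.

    A few warm-up calls are made first so one-time setup cost is excluded.
    """
    if not cases:
        raise ValueError("need at least one case to time")
    for case in cases[:warmup]:
        predict(case)
    start = time.perf_counter()
    for case in cases:
        predict(case)
    return (time.perf_counter() - start) / len(cases)


# Hypothetical layer shapes: a 3x3 convolution (16 -> 32 channels) plus bias.
print(count_parameters({"conv.weight": (3, 3, 16, 32), "conv.bias": (32,)}))  # 4640
```

On a GPU, a timing loop like this would also need to synchronize the device before reading the clock; that detail is omitted here.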

Responses to Reviewer #1

Q1 and Q4: SegMamba and More Datasets

Thanks for the suggestions! Please see “Common Question 1” above.

Q2: Mamba Design Novelty

The novel design of our paper includes (1) bidirectional Mamba, (2) Patch-level SSM (PaM), and (3) Pixel-level SSM (PiM). The bidirectional Mamba design is a reasonable design for vision tasks, while PaM and PiM utilize Mamba as large-window kernels to attain larger receptive fields, which provides a novel perspective on the role of Mamba in vision tasks. We argue that while our method is concurrent with other combinations of UNet and Mamba (e.g., U-Mamba and SegMamba), our performance is the best, demonstrating our design novelty.

Q3: Efficiency Analysis

Thanks for your suggestions! Please see “Common Question 2” above.

Responses to Reviewer #3

Q1: Visualization

Thank you for the suggestion! We will include visualization results, which will also demonstrate the superiority of our LMa-UNet.

Q2: Efficiency Analysis

Thanks, please see “Common Question 2” above.

Q3 and Q4: Future Work Suggestion

Thanks! We will explore these aspects in future work.

Responses to Reviewer #4

Q1: Effective Receptive Field

We visualize the Effective Receptive Field (ERF) for CNN-based methods (SegResNet), Transformer-based methods (SwinUNETR), and Mamba-based methods (UMamba and our LMa-UNet) at an anonymous link “imagehub.cc/image/1.bgaLYe”. In case including the link is not allowed, we also calculate the ratio (%) of Active Pixels of after-training ERFs (192 x 192) under different thresholds (more “Active Pixels” means a “Larger ERF”; ↑).

The Active Pixel ratio (%) at each threshold (our LMa-UNet has the most Active Pixels):

Threshold    0.2    0.4    0.6    0.8
SegResNet    17.2   14.6   12.5   10.5
SwinUNETR    93.2   45.3   23.7   11.1
UMamba       100    99.9   10.5   <0.1
LMa-UNet     100    100    88.5   23.3

Conclusions: (1) CNN-based methods focus more on local feature extraction. (2) Transformer-based methods (SwinUNETR) have a wider range of effective receptive fields. (3) Concurrent Mamba-based methods (like UMamba) utilize Mamba to obtain globally effective receptive fields but weaken some local receptive fields. (4) Our proposed LMa-UNet with large window Mamba achieves larger effective receptive fields both in global and local aspects.

We will add this analysis in the final version.
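The rebuttal does not spell out how the Active Pixel ratio is computed. One plausible reading, sketched here as an assumption, is to normalize the ERF map by its maximum absolute value and report the percentage of pixels whose normalized value exceeds the threshold. The function name, the max-normalization, and the strict comparison are illustrative choices, not the authors' confirmed procedure.

```python
def active_pixel_ratio(erf, threshold):
    """Percentage of pixels whose max-normalized ERF magnitude exceeds threshold.

    erf is a 2D list of floats (rows of pixel contributions).
    Normalizing by the peak value is an assumption made for this sketch.
    """
    values = [abs(v) for row in erf for v in row]
    if not values:
        raise ValueError("ERF map is empty")
    peak = max(values)
    if peak == 0:
        return 0.0
    active = sum(1 for v in values if v / peak > threshold)
    return 100.0 * active / len(values)


# Toy 2x2 map; real ERFs in the rebuttal are 192 x 192.
toy_erf = [[0.2, 1.0], [2.0, 1.8]]
for t in (0.2, 0.4, 0.6, 0.8):
    print(t, active_pixel_ratio(toy_erf, t))
```

Under this reading, a model whose ratio stays high as the threshold rises (the LMa-UNet row above) has strong contributions spread across most of the input, not just a thin global tail.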

Q2: Compare with UMambaBot

Thanks for your suggestions! Please see the comparison results in “Common Question 1”; we will include them.

Q3: Notation on Figure

Thanks for the suggestion! We will add the mentioned symbols to the figure.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces LMa-UNet, a medical image segmentation model utilizing a large window-based Mamba U-shape Network. It focuses on both local and global features and incorporates a bidirectional and hierarchical State Space Sequence Model (SSM) for enhanced performance, outperforming existing CNN and Transformer-based methods. Strengths include well-argued architectural choices, comprehensive evaluation across different imaging modalities, and innovative use of Mamba blocks. However, the paper lacks direct comparison with other Mamba-based models, does not provide empirical data on efficiency claims, and has limited interpretability without qualitative analysis. It also fails to explain the effective receptive fields or compare adequately to state-of-the-art methods.

    To strengthen the paper, the authors should include more comparative analysis, empirical efficiency data, broader dataset testing, and qualitative result visualizations. Despite these weaknesses, the paper presents a promising approach for medical image segmentation.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has successfully addressed most of the reviewers’ concerns, especially the ones concerning lack of comparison to recent state-of-the-art works. Overall, this paper presents interesting and inspiring ideas, and achieves promising empirical results. Regarding the extended experimental comparison results, please include those results organically in the final version.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



