Abstract

General networks for 3D medical image segmentation have recently undergone extensive exploration. Behind the exceptional performance of these networks lies a significant demand for large volumes of pixel-level annotated data, which are time-consuming and labor-intensive to obtain. The emergence of the Segment Anything Model (SAM) has enabled superior performance in 2D medical image segmentation tasks via parameter- and data-efficient feature adaptation. However, the additional depth dimension of 3D medical images not only prevents the direct sharing of 2D pre-trained features but also results in a quadratic increase in the computational cost of adapting SAM. To overcome these challenges, we present the Tri-Plane Mamba (TP-Mamba) adapters tailored for SAM, featuring two major innovations: 1) multi-scale 3D convolutional adapters, optimized for efficiently processing local depth-level information; 2) a tri-plane mamba module, engineered to capture long-range depth-level representations without significantly increasing computational costs. This approach achieves state-of-the-art performance on 3D CT organ segmentation tasks. Remarkably, this superior performance is maintained even with scarce training data: using only three CT training samples from the BTCV dataset, it surpasses conventional 3D segmentation networks, attaining a Dice score that is up to 12% higher.
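For readers who want a concrete picture of the two adapter components named above, the following is a minimal, hedged sketch in PyTorch. It is based only on the abstract's description: the dilation rates, the average pooling used to form the three planes, and the fusion scheme are assumptions, and the Mamba block is replaced by a pluggable placeholder sequence model. The released code in the repository linked below is the authoritative implementation.

```python
import torch
import torch.nn as nn

class TriPlaneAdapterSketch(nn.Module):
    """Illustrative adapter: multi-scale 3D convs for local context plus a
    tri-plane scan for long-range depth-level context. Not the authors' code."""

    def __init__(self, channels: int, seq_model: nn.Module = None):
        super().__init__()
        # Parallel depthwise 3D convolutions with growing dilation (rates are assumed).
        self.convs = nn.ModuleList([
            nn.Conv3d(channels, channels, 3, padding=d, dilation=d, groups=channels)
            for d in (1, 2, 3)
        ])
        # Sequence model applied to each plane; the paper uses a Mamba SSM block here.
        # nn.Identity() keeps the sketch runnable without the mamba_ssm package.
        self.seq_model = seq_model if seq_model is not None else nn.Identity()
        self.proj = nn.Conv3d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, D, H, W)
        local = sum(conv(x) for conv in self.convs) / len(self.convs)
        planes = []
        for axis in (2, 3, 4):                    # pool out D, H and W in turn
            plane = x.mean(dim=axis)              # one of the three orthogonal planes
            b, c, a1, a2 = plane.shape
            seq = plane.flatten(2).transpose(1, 2)          # (B, L, C) sequence
            seq = self.seq_model(seq)
            plane = seq.transpose(1, 2).reshape(b, c, a1, a2)
            planes.append(plane.unsqueeze(axis))  # broadcast back along the pooled axis
        global_ctx = sum(p.expand_as(x) for p in planes) / 3.0
        return self.proj(local + global_ctx)

feat = torch.randn(1, 768, 4, 14, 14)             # e.g. a ViT-B feature volume
out = TriPlaneAdapterSketch(768)(feat)            # output shape matches the input
```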

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2184_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/xmed-lab/TP-Mamba

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_TriPlane_MICCAI2024,
        author = { Wang, Hualiang and Lin, Yiqun and Ding, Xinpeng and Li, Xiaomeng},
        title = { { Tri-Plane Mamba: Efficiently Adapting Segment Anything Model for 3D Medical Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a novel tri-plane mamba adapter for SAM that captures both local and global 3D non-causal information for 3D medical image segmentation. Specifically, it designs multi-scale 3D convolutional adapters, optimized for efficiently processing local depth-level information, and a tri-plane mamba module, engineered to capture long-range depth-level representations without significantly increasing computational costs. Experiments on the BTCV dataset demonstrate the effectiveness of the method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method, Tri-plane Mamba, adapts SAM for 3D medical image segmentation.
    2. Experiments on the BTCV dataset demonstrate the effectiveness of the method.
    3. An ablation study was presented to verify the effectiveness of the key components.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors should opt for the currently largest model trained on 3D medical images, SAM-Med3D, rather than one based on SAM, which is trained on natural images.
    2. The paper failed to compare with important baselines, such as the 3DSAM-Adapter, which proposes a holistically designed scheme for transferring SAM from 2D to 3D to facilitate medical image segmentation.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Refer to Section 6.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Refer to Section 6.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel approach for 3D medical image segmentation using the Segment Anything Model (SAM) with Tri-Plane Mamba (TP-Mamba) adapters. The key innovations are:

    1. Multiscale 3D convolutional adapters for efficient processing of local depth-level information.
    2. Tri-plane mamba module for capturing long-range depth-level representation without significant computational cost increase.
    3. This approach achieves state-of-the-art performance in 3D CT organ segmentation tasks, even with limited training data.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written with a structured layout, and it demonstrates significant advantages based on the experimental results. The authors have done an excellent job in presenting their research findings and supporting their claims through rigorous experimentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks ablative experiments on SAM, such as investigating the significant differences between using and not using pretrained weights. Additionally, the impact of different sizes of SAM on the experimental results remains unexplored. Furthermore, the proposed method is only validated on a single dataset, and it is unclear whether it possesses the same advantages when applied to other datasets.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should analyze why TP-Mamba exhibits such significant advantages when trained with a small amount of data. An analysis of why TP-Mamba performs exceptionally well in low-data scenarios would help readers better understand its efficacy and offer insights for further improvement and generalization.
    2. It is necessary to explore the performance of TP-Mamba on other datasets. Although the proposed method achieves impressive results on the current dataset, validating its performance on additional datasets is crucial to assess its generalization and reliability.
    3. While ablative experiments focus on validating individual modules, it is worth exploring the possibility of conducting combined validations. In addition to separately validating each module’s performance, comprehensive evaluations of the overall system by combining different modules would provide a more holistic assessment.
    4. It is necessary to conduct ablative studies on SAM.
    5. While I understand that the conference format may impose limitations on showcasing qualitative analysis results, I strongly recommend providing some qualitative analysis in the paper. In addition to quantitative analysis, qualitative analysis can offer more detailed and intuitive information, aiding readers in better understanding the strengths and limitations of the algorithm.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In conclusion, my overall recommendation for this paper is based on several key factors. Firstly, the clarity and organization of the paper, including its adherence to formatting guidelines and logical flow of ideas, were crucial in determining my evaluation. Secondly, the novelty and significance of the research findings were important in determining the value of the paper. Additionally, the methodology and analysis employed in the study, as well as the quality and relevance of the references cited, were also considered in my assessment.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors proposed a novel Tri-Plane Mamba (TP-Mamba) adapter, for adapting the Segment Anything Model (SAM) for 3D medical image segmentation. The proposed adaptor contains multi-scale 3D convolutional adapters and a tri-plane mamba module to process both local and global depth-level information efficiently.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivation of this paper is good, and the logic is clear. The novelty of this paper seems good: the combination of SAM and Mamba for 3D medical image segmentation is reasonable, and the results seem satisfactory. The presentation of the results is clear and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The details in the Methodology part are unclear. Some descriptions in Implementation Details are missing. Some experimental settings need further explanation. For more comments refer to Q10.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NO.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should discuss more comprehensive existing studies evaluating SAM on 2D medical images, e.g., [1]-[2], instead of only mentioning one arXiv paper.
    2. A basic introduction to the Mamba model should be included in the manuscript. Besides, the authors should consider briefly introducing the LoRA approach in a few sentences.
    3. The authors wrote that “Features will be fed into a multi-head self-attention module where two LoRA are inserted into Q and V layers”, but in Fig. 2(b) it seems that only one LoRA is shown. Besides, the text and figure do not match well; for example, the text says layer normalization (LN) while the figure says Norm. The authors should consider modifying the figure or the text to make this clearer (a minimal LoRA sketch is included after the reference list below).
    4. (batch, features, depth, height, width) - - what are the features? Do the authors mean feature channels?
    5. In the Dataset part, the abbreviation (BTCV) appears before the full name. The authors should check the whole paper to avoid similar issues.
    6. In the Data Processing part, “The intensity values of each CT scan in BTCV dataset were truncated within the range of [-200, 250] Hounsfield Units (HU).” - - (a) why choosing this range [-200, 250]? (b) No standardization/normalization of the data?
    7. The input image size is 96^3; do the authors mean to directly resize the volume to this size? Will this cause serious damage to small anatomical structures?
    8. The training GPU and memory should be introduced.
    9. The link https://huggingface.co/timm/samvit_base_patch16.sa1b is empty (404). Is this a SAM model pretrained on medical images?
    10. In the Ablation Analysis, the authors train the model using only 25% of the training data. Why not use all the training data for the ablation studies?
    11. “a 3D convolution layer with kernel size r × C × 3 × 1 × 1” and “Effectiveness of the low rank r.” Do the two r's denote the same hyperparameter? I consider them to be different. The authors should check the symbols to avoid such problems.

    [1] Huang Y, Yang X, Liu L, et al. Segment anything model for medical images? Medical Image Analysis, 2024, 92: 103061.
    [2] Mazurowski M A, Dong H, Gu H, et al. Segment anything model for medical image analysis: an experimental study. Medical Image Analysis, 2023, 89: 102918.
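    Regarding comment 3 above, the configuration quoted from the paper (LoRA updates on the Q and V projections of the attention blocks) can be illustrated with a short, hedged sketch; the class name, rank, and wrapping pattern are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = Wx + B(Ax)."""
    def __init__(self, base: nn.Linear, r: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the pre-trained weight frozen
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.B.weight)        # start as an identity-preserving update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.B(self.A(x))

# Illustrative usage: wrap only the query and value projections of an attention layer,
# e.g. attn.q_proj = LoRALinear(attn.q_proj); attn.v_proj = LoRALinear(attn.v_proj)
```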

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good innovation and clear motivation. Unclear introduction in the method and experiment part.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have resolved most of my concerns. I believe this paper should be accepted at MICCAI.



Review #4

  • Please describe the contribution of the paper
    • The paper introduces a novel 3D segmentation model that builds upon the Segment Anything Model (SAM) framework by integrating Tri-Plane (TP-Mamba) adapters.
    • The TP-Mamba adapters are designed to capture both local and global depth-level information. This is accomplished by incorporating parallel branches of multi-scale 3D convolution layers with varying dilation rates, followed by a Tri-plane Mamba module.
    • Experimental evaluations conducted on the Beyond the Cranial Vault (BTCV) dataset, with the objective of semantic segmentation of 13 abdominal organs, demonstrate the superior performance of the proposed approach compared to state-of-the-art (SOTA) methods even with limited amount of training samples.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors have successfully established a new state-of-the-art (SOTA) methodology for semantic segmentation in medical images, even in scenarios with limited training data availability.
    • Through evaluation against eight additional state-of-the-art methods, the model’s performance in segmenting 13 distinct organs demonstrates the superiority of the proposed approach.
    • Comprehensive experiments conducted as part of the ablation studies, encompassing different aspects of the proposed approach such as the efficacy of the 3D convolutional module, Mamba scanning strategy, and low-rank values, serve to further validate the contributions of the approach.
    • The paper is well written and structured, which enhances the understanding of the presented research outcomes.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Despite the comprehensive evaluation of the different aspects of the approach and the inclusion of a multi-organ segmentation dataset, considering additional datasets could have provided additional insights into the model’s generalization capabilities for semantic segmentation across diverse acquisition protocols, medical imaging techniques (e.g. MRI), and datasets. For example, inclusion of datasets from challenges such as BRATS or ACDC could have further enriched the study.
    • While the authors claimed reduced computational costs associated with the proposed approach, direct validation of this claim through experimental evidence was not shown.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • The inclusion of the source code as part of the supplementary material, together with the authors’ intent to release the training pipeline, enhances the paper’s reproducibility. Considering the level of details provided in the paper and the use of a publicly available dataset to validate the experiments, the reproducibility of the study appears feasible.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • To improve the validation of the proposed approach’s generalization across diverse datasets, conducting experiments on additional datasets such as BRATS and ACDC would have been beneficial. These datasets not only correspond to additional organs, but also represent another modality of medical imaging beyond CTs, i.e. MRIs, along with their corresponding acquisition protocols.
    • As part of the evaluation analysis, it would have been beneficial to further present results on the number of model parameters or floating point operations (FLOPs) to gain deeper insights into the computational efficiency and scalability of both the model and the baselines.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Methodology: the authors have successfully established a new state-of-the-art (SOTA) methodology for semantic segmentation in medical images, even in settings with a limited amount of training data (e.g., 12% of the training data).
    • Paper Presentation and Reproducibility: the paper is well written and structured. The inclusion of the source code as part of the supplementary material, along with the authors’ intention to release the training pipeline, supports the paper’s reproducibility.
    • Potential for Improvement: despite the comprehensive evaluation, considering additional datasets involving alternative medical imaging modalities (e.g. MRIs) could have provided further insights into the model’s generalization capabilities. Additional validation of reduced computational costs through FLOPs or model parameters analysis would have been beneficial.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank the reviewers for their valuable comments. We are pleased that the reviewers praised the novelty, comprehensive experiments, excellent performance, and good writing of our paper. Here, we provide detailed explanations to address the raised concerns.

[More Datasets, #R1, #R4] As the rebuttal guidelines suggest, we cannot present additional experiments here. We believe that our approach can still achieve SoTA on MRI datasets, as we observed that the performance order of the compared methods remains consistent on MRI datasets [1, 20]. If permitted, we will consider including them in the revision (or supplementary).

[GFLOPs and Parameters, #R1, #R4] As shown in Fig. 1(c) of the main paper, the GFLOPs for each ViT block are approximately 18, while the multi-scale convs consume 0.38 GFLOPs and Mamba uses 2.37 GFLOPs, resulting in only about a 15% increase in the total GFLOPs of the ViT. Our adapters have about 5.2M learnable parameters, which is merely 6% (5.2M out of 86M) of the frozen parameters of the ViT. In contrast, 3D-UX-Net [13] has 53M learnable parameters. These lightweight adapters contribute to both effective and efficient convergence.
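As a quick sanity check, the percentages above follow directly from the stated numbers; nothing new is measured here, and the values are taken verbatim from this paragraph:

```python
vit_block_gflops = 18.0                   # per ViT block, as quoted from Fig. 1(c)
adapter_gflops = 0.38 + 2.37              # multi-scale convs + tri-plane Mamba
print(f"per-block FLOP overhead: {adapter_gflops / vit_block_gflops:.1%}")  # ~15.3%
print(f"trainable fraction: {5.2e6 / 86e6:.1%}")                            # ~6.0% (5.2M of 86M)
```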

[Compare with SAM-Med3D and 3DSAM-Adapter, #R3] SAM-Med3D is a foundation model and can serve as an alternative to SAM in our method. As we have conducted sufficient experiments to demonstrate the effectiveness of our TP-Mamba (a universal plug-and-play adapter), we believe it can also work well with SAM-Med3D. For the comparison with 3DSAM-Adapter, we still believe our method can perform better, as 3DSAM-Adapter shares a similar design with MA-SAM [1] (see Table 1). As the rebuttal guidelines suggest, we cannot present additional experiments here and will consider including the comparison in the revision (or supplementary) if permitted. More importantly, the initial manuscript already compares 10 baselines (including different networks and adapters), sufficiently and comprehensively showing the effectiveness and superior performance of our method.

[More Analysis on TP-Mamba, #R4] SAM is trained on billion-scale natural image data and excels at distinguishing between foreground and background. Consequently, our lightweight adapters on top of SAM are sufficient to transfer SAM’s ability to the medical domain quickly and efficiently (see Fig. 4).

[Ablations and Qualitative Results, #R4] We will include experiments without pre-trained SAM weights, with ViT-L-based SAM, and with SAM-Med3D (#R3) in the supplementary. Additionally, we will conduct grid-search ablation studies on the Mamba and conv layers, as well as show qualitative visualizations.

[Discussion on SAM-2D and Mamba, #R5] Thank you for your valuable comments. We will include surveys on SAM-2D in the ‘Introduction’ section and provide detailed explanations of the Mamba algorithm in the ‘Methodology’.

[More details on methods and experiments, #R5] Here we clarify some unclear details and will update them in the main paper.

Q3. All normalization layers used in this paper are LN. LoRA modules are applied to both the Q and V layers.
Q4. ‘Features’ refers to the feature dimension (i.e., feature channels).
Q6. Following MA-SAM [1], we set the HU window to [-200, 250] for observing abdominal organs. The HU values are then normalized to the range [0, 1] using min-max normalization.
Q7. CT volumes are randomly cropped into 96x96x96 sub-volumes during training. Sliding-window inference is applied during testing, which does not down-sample the CT images.
Q8. An NVIDIA RTX 3090 GPU is used, and an automatic mixed-precision strategy is employed to accelerate training and reduce GPU memory requirements.
Q9. The link to SAM is https://huggingface.co/timm/samvit_base_patch16.sa1b. It is pre-trained on natural images.
Q10. As shown in Fig. 4, the Dice scores across the three settings exhibit a similar trend. We selected 25% as a trade-off between training time and available data.
Q11. ‘r’ refers to the same item, the low rank.
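A minimal sketch of the preprocessing described in Q6 and Q7, assuming a plain NumPy implementation; the function name and signature are illustrative rather than the authors' code:

```python
import numpy as np

def preprocess_ct(volume_hu: np.ndarray, lo: float = -200.0, hi: float = 250.0) -> np.ndarray:
    """Clip CT intensities to the [-200, 250] HU window, then min-max normalize to [0, 1]."""
    clipped = np.clip(volume_hu.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)

# Training uses random 96x96x96 crops of the normalized volume; at test time a
# sliding-window pass over the full volume avoids down-sampling small structures.
```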




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    After the rebuttal, most reviewers have provided consistently positive feedback on this paper. Additionally, the authors have presented good arguments in response to the additional experiments requested by reviewer 3. Therefore, I recommend accepting this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The novelty of the method and evaluation results in the experiments are good enough.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



