Abstract

Electron microscopy (EM) imaging offers unparalleled resolution for analyzing neural tissues, which is crucial for uncovering the intricacies of synaptic connections and neural processes fundamental to understanding behavioral mechanisms. Recently, foundation models have demonstrated impressive performance across numerous natural and medical image segmentation tasks. However, applying these foundation models to EM segmentation faces significant challenges due to domain disparities. This paper presents ShapeMamba-EM, a specialized fine-tuning method for 3D EM segmentation, which employs adapters for long-range dependency modeling and an encoder for local shape description within the original foundation model. This approach effectively addresses the unique volumetric and morphological complexities of EM data. Tested over a wide range of EM images, covering five segmentation tasks and 10 datasets, ShapeMamba-EM outperforms existing methods, establishing a new standard in EM image segmentation and enhancing the understanding of neural tissue architecture.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0151_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0151_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Shi_ShapeMambaEM_MICCAI2024,
        author = { Shi, Ruohua and Pang, Qiufan and Ma, Lei and Duan, Lingyu and Huang, Tiejun and Jiang, Tingting},
        title = { { ShapeMamba-EM: Fine-Tuning Foundation Model with Local Shape Descriptors and Mamba Blocks for 3D EM Image Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper contributes a novel fine-tuning method named ShapeMamba-EM for 3D electron microscopy (EM) image segmentation. The method addresses the unique challenges of EM data, such as high resolution and complex tissue structures, by integrating two key innovations, local shape descriptors and Mamba blocks, into a pre-existing 3D medical foundation model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper shows that ShapeMamba-EM outperforms existing methods in EM image segmentation, establishing a new standard in the field. This performance improvement is a strong aspect of the work, indicating the potential for ShapeMamba-EM to become a leading method in the area.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper presents a novel approach to 3D EM image segmentation with the ShapeMamba-EM method, which incorporates local shape descriptors and Mamba blocks for fine-tuning a foundation model. Despite its innovative aspects, there are several weaknesses that could be considered:

    Lack of Benchmark Comparisons: The paper does not provide comparisons with state-of-the-art methods on public benchmarks, which is crucial for validating the effectiveness of the proposed method. Specifically, the method has not been tested on the challenging task of neuron instance segmentation in EM images, which is a key area in the field. A comparison with established methods like the one proposed by Sheridan et al. [1] would be necessary to demonstrate the superiority of ShapeMamba-EM.

    Limited Evaluation on Mitochondria Segmentation: The paper does not include a comparison with cutting-edge methods on the task of EM mitochondria segmentation. Including a comparison with methods such as the one presented by Li et al. [2] and calculating additional metrics like mAP, AP50, and AP75 would provide a more comprehensive evaluation of the method’s performance.

    Simplicity of the Proposed Method: While the use of LoRA for fine-tuning 3D networks is a strategic choice, the method may be perceived as relatively simple, especially when compared to more complex architectures that may be capable of capturing more nuanced features of EM images. The reliance on state space model (SSM) blocks for parameter efficiency, although innovative, might not fully leverage the potential of more advanced network designs.
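For readers unfamiliar with the LoRA-style fine-tuning mentioned above, here is a minimal, hedged sketch of the general idea; it is not the paper's implementation. A pretrained weight W stays frozen and only a low-rank update B·A is learned. The class name `LoRALinear`, the rank `r`, the scaling `alpha`, and the initialization are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Illustrative LoRA sketch: y = x W^T + (alpha / r) * x A^T B^T, with W frozen."""

    def __init__(self, weight, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = np.asarray(weight, dtype=float)   # frozen, shape (d_out, d_in)
        d_out, d_in = self.weight.shape
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized
        self.scale = alpha / r

    def __call__(self, x):
        frozen = x @ self.weight.T                      # output of the pretrained layer
        update = (x @ self.A.T) @ self.B.T              # low-rank correction
        return frozen + self.scale * update

    def trainable_params(self):
        # Only A and B are updated: r * (d_in + d_out) values.
        return self.A.size + self.B.size
```

Because B starts at zero, the adapted layer reproduces the frozen layer exactly at the start of fine-tuning, and only r·(d_in + d_out) values per layer are trained.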

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weaknesses above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See weaknesses above.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents ShapeMamba-EM, a fine-tuning method tailored for 3D EM image segmentation. It tackles the specific challenges of EM data by incorporating recent techniques like FacT for efficient fine-tuning and Mamba layers for capturing long-range dependencies. This approach effectively addresses the volumetric and morphological complexities of EM data, leading to promising experimental results.
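As a rough, hedged illustration of FacT-style factor tuning (based on our understanding of its tensor-train variant, not on the paper's code): the increments for K frozen d×d weight matrices share two factor matrices U and V, and each matrix owns only a small r×r core, so the trainable count is 2dr + Kr² rather than Kd² for full fine-tuning. The class name `FacTTT`, the initialization, and the sizes below are assumptions.

```python
import numpy as np


class FacTTT:
    """Illustrative FacT-TT-style update: Delta W_k = s * U @ Sigma_k @ V.T.

    U and V are shared by all K adapted weight matrices; matrix k only
    owns a small r x r core Sigma_k.
    """

    def __init__(self, d, num_weights, r=8, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(0.0, 0.02, size=(d, r))   # shared left factor
        self.V = rng.normal(0.0, 0.02, size=(d, r))   # shared right factor
        self.Sigma = np.zeros((num_weights, r, r))    # per-matrix cores, zero-init (assumed)
        self.scale = scale

    def delta(self, k):
        # Rank of the increment is at most r.
        return self.scale * self.U @ self.Sigma[k] @ self.V.T

    def adapted_weight(self, frozen_w, k):
        return frozen_w + self.delta(k)

    def num_trainable(self):
        return self.U.size + self.V.size + self.Sigma.size
```

With hypothetical ViT-B-like sizes d = 768, K = 48 and r = 8, this gives 2·768·8 + 48·8·8 = 15,360 trainable values, versus 48·768² = 28,311,552 for updating the same matrices directly, which is the kind of saving that matters when annotated EM data are scarce.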

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A novel fine-tuning method specifically designed for 3D EM image segmentation. This novel formulation allows for effective adaptation of foundation models to the unique challenges of EM data, enhancing segmentation accuracy and efficiency.

    2. Integration of Local Shape Descriptors (LSDs) into the segmentation process, providing a novel way to capture the morphological characteristics of segmentation objects in EM images. By incorporating LSDs, the method improves boundary prediction and enhances the model’s ability to accurately segment objects with similar local features, contributing to the robustness of the segmentation results.

    3. The paper demonstrates the strength of ShapeMamba-EM through a comprehensive evaluation on five EM image segmentation tasks across ten datasets and effectively addresses the unique volumetric and morphological complexities of EM data, such as high resolution, noise, and distributed objects.
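To make the notion of local shape descriptors concrete, here is a hedged, brute-force sketch of the general idea as we understand the original LSD formulation: for each foreground voxel, Gaussian-weighted statistics of its own object inside a local window (mass, mean offset, variances, covariances; ten channels). It computes descriptors from a label volume, as training targets would be; the paper instead predicts them with a network. Normalization is omitted, and the function name, `sigma`, and `radius` are assumptions.

```python
import numpy as np


def local_shape_descriptors(labels, sigma=2.0, radius=4):
    """Brute-force, unnormalized LSD sketch for a 3D label volume (0 = background).

    Channels: 0 mass, 1-3 mean offset (z, y, x), 4-6 variances,
    7-9 covariances (zy, zx, yx).
    """
    labels = np.asarray(labels)
    out = np.zeros((10,) + labels.shape)
    r = np.arange(-radius, radius + 1)
    dz, dy, dx = np.meshgrid(r, r, r, indexing="ij")
    offsets = np.stack([dz, dy, dx], axis=-1).reshape(-1, 3)
    weights = np.exp(-(offsets ** 2).sum(axis=1) / (2.0 * sigma ** 2))
    shape = np.array(labels.shape)
    for idx in np.ndindex(labels.shape):
        lab = labels[idx]
        if lab == 0:
            continue  # background voxels keep all-zero descriptors
        nb = offsets + np.array(idx)
        inside = np.all((nb >= 0) & (nb < shape), axis=1)
        nb, off, w = nb[inside], offsets[inside].astype(float), weights[inside]
        # Restrict the window to voxels of the same object.
        same = labels[nb[:, 0], nb[:, 1], nb[:, 2]] == lab
        off, w = off[same], w[same]
        mass = w.sum()
        # Vector from this voxel to the object's local center of mass.
        mean = (w[:, None] * off).sum(axis=0) / mass
        centered = off - mean
        cov = (w[:, None, None] * centered[:, :, None] * centered[:, None, :]).sum(axis=0) / mass
        out[(0,) + idx] = mass
        out[(slice(1, 4),) + idx] = mean
        out[(slice(4, 7),) + idx] = np.diag(cov)
        out[(slice(7, 10),) + idx] = [cov[0, 1], cov[0, 2], cov[1, 2]]
    return out
```

Voxels near an object boundary get a nonzero mean offset pointing into the object, which is one way such descriptors can carry boundary and shape cues.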

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper combines various modules, such as FacT for fine-tuning and Mamba layers for capturing long-range dependencies, which have been recently proposed in the literature. However, the paper does not provide sufficient justification for the combination of these modules or demonstrate how their integration contributes to the novelty of the proposed method. A clearer articulation of the novel aspects arising from the combination of these modules would strengthen the paper’s contribution.

    2. While the method is compared against some state-of-the-art (SOTA) approaches, the comparison is limited, and the evaluation does not cover a comprehensive range of SOTA methods.

    3. Insufficient Analysis of Module Effects: The paper lacks detailed analysis of the individual effects of different modules, such as the LSD module, on the segmentation performance. Specifically, it does not provide a thorough examination of how the LSD module enhances morphological clue representation or contributes to improved segmentation accuracy.
    4. Despite introducing the LSD module to capture local shape descriptors, the paper does not include feature visualization to demonstrate how LSD enhances morphological clue representation. Visualizing feature maps or representations learned by the LSD module would provide qualitative insights into its effectiveness in capturing fine-grained morphological details and improving segmentation accuracy; a similar visualization is needed for the Mamba module.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Justify Module Combination: Explain why the chosen modules were integrated and how their combination addresses specific EM segmentation challenges.

    2. Include more recent approaches to provide a comprehensive evaluation of the proposed method’s performance.

    3. Conduct experiments to quantify the impact of each module, particularly the LSD module, on segmentation accuracy.

    4. Use feature visualization to demonstrate how the LSD module improves morphological clue representation and segmentation accuracy.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper lacks a clear justification for the combination of modules like FacT and Mamba layers, leaving doubts about the novelty and effectiveness of the proposed method. Additionally, the comparison with state-of-the-art approaches is limited, failing to provide a comprehensive evaluation against a broader range of methods in the field. This narrow scope hinders the assessment of the proposed method’s relative performance and contribution. Furthermore, there is insufficient analysis of the individual effects of different modules, such as the LSD module, on segmentation performance, which limits understanding of the method’s efficacy.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper introduces an innovative model that enhances the SAM-Med3D core framework for 3D electron microscopy (EM) segmentation tasks. Specifically, it employs adapters for long-range dependency modeling and an encoder for local shape description within the original foundation model. Experiments on five segmentation tasks across 10 datasets demonstrate the effectiveness of the method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The method proposed, ShapeMamba-EM, is a specialized fine-tuning method for 3D EM segmentation.
    2. Experiments on five segmentation tasks and 10 datasets demonstrate the effectiveness of the method.
    3. An ablation study was presented to verify the effectiveness of key components.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Since the paper includes a 3D Mamba Adapter, the method should also be compared with other Mamba models, such as U-Mamba, VM-UNet, and nnMamba.
    2. The authors should explain the choice of the 3D Mamba Adapter for capturing long-range dependencies. Why not use a multi-head self-attention adapter instead?
    3. The authors should justify the use of the 3D U-Net network to predict the Local Shape Descriptors. Why not consider a more advanced model?
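On the Mamba-versus-attention question, one commonly cited motivation is cost: a state-space scan over the flattened voxel tokens does a fixed amount of work per token, while self-attention forms an L×L score matrix. The sketch below is a simplified, input-dependent diagonal SSM recurrence over a flattened 3D feature map. It is not Mamba's actual kernel (no parallel scan, convolution, gating, or multi-direction scanning), and all names and shapes are hypothetical.

```python
import numpy as np


def flatten_volume(feat):
    """(D, H, W, C) feature map -> (D*H*W, C) token sequence in raster order."""
    d, h, w, c = feat.shape
    return feat.reshape(d * h * w, c)


def selective_scan(x, A, W_B, W_C, W_dt):
    """Simplified selective SSM over tokens x of shape (L, C).

    A: (C, N) negative real diagonal state matrix; W_B, W_C: (C, N); W_dt: (C, C).
    Per channel: h_t = exp(dt_t * A) * h_{t-1} + dt_t * B_t * x_t,
                 y_t = sum_n C_t[n] * h_t[n].
    """
    L, C = x.shape
    N = A.shape[1]
    h = np.zeros((C, N))                      # fixed-size state, independent of L
    y = np.zeros((L, C))
    for t in range(L):
        dt = np.log1p(np.exp(x[t] @ W_dt))    # softplus: positive, input-dependent step (C,)
        B_t = x[t] @ W_B                      # input-dependent input projection (N,)
        C_t = x[t] @ W_C                      # input-dependent readout (N,)
        A_bar = np.exp(dt[:, None] * A)       # discretized decay in (0, 1), shape (C, N)
        h = A_bar * h + (dt[:, None] * B_t[None, :]) * x[t][:, None]
        y[t] = h @ C_t
    return y
```

Each step touches only the C×N state, so cost grows linearly in L = D·H·W; for a 32³ patch, the attention score matrix alone would have 32768² ≈ 1.07×10⁹ entries. The scan is also causal in raster order, which is one reason 3D variants often scan in several directions.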
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Refer to Section 6.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Refer to Section 6.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

To #R1:

  • For Q1&Q2: This is a misunderstanding: Tab. 1 (Task 2) and Fig. 4 (Line 2) show the results of neuron instance segmentation, which is called cell segmentation in our paper. Because references [1][2] are not specified, it is hard to address this point directly. However, we provide a comparison with SOTA methods for each dataset; please see the response to R4Q2.
  • For Q3: Thanks for the suggestion; in future work, we will explore more advanced networks.

To #R2:

  • For Q1: Our paper focuses on fine-tuning foundation models for EM images. Due to space limitations, we did not include comparisons with these models, which are described in non-peer-reviewed arXiv preprints. Additionally, the mentioned models are all 2D models and do not support 3D segmentation. We appreciate the suggestion and will explore them in future work.
  • For Q2: We have compared our method with those using multi-head self-attention, including MSAM3D and 3DSAMA, and the experiments confirm Mamba’s superior performance.
  • For Q3: We chose the 3D U-Net because the original LSD method uses it and achieves accurate LSD estimation. We will explore other advanced models in future work.

To #R4:

  • For Q1: We chose FacT because it allows efficient parameter tuning on small datasets and improves model performance with minimal adjustments, which matters because EM data are scarce and costly to annotate. Mamba was chosen to capture long-range dependencies in EM images, which is essential for identifying complex cellular structures. Experiments show that our method outperforms existing approaches on standard EM datasets, demonstrating its practical benefits.
  • For Q2: Thanks for the suggestions. Following them, we surveyed the SOTA methods for all datasets and chose one SOTA method per dataset for comparison. Because no new experiments are allowed, we report the performance given in each SOTA method’s original paper, evaluated with the same metrics (Dice and mAP) as ours. Among the ten datasets, EM-R50 and Gauy have no SOTA method that reports performance with these two metrics, so no SOTA method is chosen for these two datasets. For each of the remaining eight datasets, one SOTA method is chosen and its performance is taken directly from its original paper, without conducting new experiments. Please note that this comparison is fair because the experimental setting is the same for our method and the newly chosen SOTA method. The results are formatted as “Dataset: SOTA method (ref) Dice (SOTA/ours) | mAP (SOTA/ours)”, where “-” indicates that no score is reported for that metric:
      ISBI2012: PS-Net (Shi 2023) 0.940/0.958 | -/0.951
      SNEMI3D: DMT (Hu 2021) 0.971/0.964 | -/0.961
      CREMI (Synapse): CleftNet (Liu 2021) 0.831/0.834 | -/0.865
      Kasthuri++: HIVE-Net (Yuan 2021) 0.962/0.968 | -/0.936
      Lucchi++: DualRel (Mai 2023) 0.934/0.940 | -/0.954
      MitoEM-H: ATFormer (Pan 2023) -/0.847 | 0.782/0.877
      MitoEM-R: ATFormer (Pan 2023) -/0.852 | 0.682/0.930
      NucMM-Z: U3D-BCD (Lin 2021) 0.879/0.915 | 0.894/0.907
    Compared with the chosen SOTA methods, our method surpasses them on all datasets except for a 0.007 Dice gap on SNEMI3D. This demonstrates our method’s generalization capability and SOTA performance. The original submission did not report these results because its focus was on proposing a fine-tuning method for various EM image segmentation tasks; therefore, we only compared our method with fine-tuning methods. We will add these SOTA results in the final version.
  • For Q3: Actually, we have provided the analysis of the individual effects of different modules, including LSD and Mamba. Please see Tabs. 1 and 2 (rows “w/o L” and “w/o M”).
  • For Q4: The feature visualization of the LSD module is provided in Fig. 3, and we will add the visualization of the Mamba module to the supplementary material.
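Since the rebuttal's comparison rests on Dice, here is a minimal, hedged sketch of the binary Dice coefficient, together with a check of the Dice pairs quoted in the response to Q2. The convention of returning 1.0 when both masks are empty is an assumption, and the MitoEM rows are left out because no SOTA Dice is reported for them.

```python
import numpy as np


def dice_score(pred, gt):
    """Dice = 2|P & G| / (|P| + |G|) for binary masks; 1.0 if both are empty (assumed)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom


# Dice pairs (SOTA, ours) as quoted in the response to Q2.
REBUTTAL_DICE = {
    "ISBI2012": (0.940, 0.958),
    "SNEMI3D": (0.971, 0.964),
    "CREMI (Synapse)": (0.831, 0.834),
    "Kasthuri++": (0.962, 0.968),
    "Lucchi++": (0.934, 0.940),
    "NucMM-Z": (0.879, 0.915),
}


def datasets_where_sota_leads(pairs):
    """Return {dataset: Dice gap} for every dataset where the SOTA score is higher."""
    return {name: round(sota - ours, 3) for name, (sota, ours) in pairs.items() if sota > ours}
```

Running `datasets_where_sota_leads(REBUTTAL_DICE)` on these figures returns only SNEMI3D with a gap of 0.007, consistent with the rebuttal's statement.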




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors addressed the reviewers’ concerns well. The paper still has some gaps, as the reviewers pointed out, but overall it has great merit and deserves acceptance as the first attempt (to my knowledge) to adapt foundation models to the compiled EM datasets over multiple tasks.

    • Novelty: New combination of existing CV components into an effective pipeline for biomedical image analysis is novel by itself.

    • Experiment results: I agree with reviewers that it’ll be better to use the original metric for each benchmark (e.g., ARAND for SNEMI3D, VOI for CREMI) and include more SOTA baselines. However, the paper in its current form (with the results in the rebuttal) already has one of the most comprehensive comparisons in the EM field. It’ll be great for authors to make the recommended changes.

    • Ablation studies and visualization: The paper already includes the main ablation studies, which meets the bar for publication at MICCAI. It would be great for the authors to dig deeper into the visual explanation of the learned model, although this is not mandatory for acceptance.
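The meta-review recommends each benchmark's original metric (ARAND for SNEMI3D, VOI for CREMI). As a hedged illustration, both can be computed from the joint label distribution of a predicted segmentation and the ground truth. This sketch is simplified: it uses natural logarithms and does not mask background or reproduce the exact conventions of the official SNEMI3D or CREMI evaluation scripts; the function names are illustrative.

```python
import numpy as np


def joint_distribution(seg, gt):
    """Normalized contingency table p[i, j] = P(seg label i, gt label j)."""
    s_ids, s = np.unique(np.ravel(seg), return_inverse=True)
    g_ids, g = np.unique(np.ravel(gt), return_inverse=True)
    table = np.zeros((len(s_ids), len(g_ids)))
    np.add.at(table, (s, g), 1)
    return table / table.sum()


def variation_of_information(seg, gt):
    """Return (H(seg | gt), H(gt | seg)); VOI is their sum (lower is better)."""
    p = joint_distribution(seg, gt)
    p_seg = p.sum(axis=1, keepdims=True)
    p_gt = p.sum(axis=0, keepdims=True)
    nz = p > 0
    h_seg_given_gt = -np.sum(p[nz] * np.log((p / p_gt)[nz]))  # reflects over-segmentation
    h_gt_given_seg = -np.sum(p[nz] * np.log((p / p_seg)[nz]))  # reflects under-segmentation
    return h_seg_given_gt, h_gt_given_seg


def adapted_rand_error(seg, gt):
    """1 - F-score of Rand precision and recall (0 means identical partitions)."""
    p = joint_distribution(seg, gt)
    a = np.sum(p ** 2)
    b = np.sum(p.sum(axis=1) ** 2)
    c = np.sum(p.sum(axis=0) ** 2)
    return 1.0 - 2.0 * a / (b + c)
```

For example, merging two equal-size ground-truth objects into one predicted label gives H(seg | gt) = 0, H(gt | seg) = ln 2, and an adapted Rand error of 1/3. Unlike Dice, both metrics penalize splits and merges, which is why they are the usual choice for neuron instance segmentation.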




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The manuscript introduces ShapeMamba-EM, a fine-tuning method for 3D Electron Microscopy (EM) image segmentation. This method leverages foundation models initially designed for medical images and adapts them to the unique challenges of EM data, which include high resolution and complex tissue structures. The proposed approach incorporates Local Shape Descriptors (LSDs) and 3D Mamba Adapters to effectively model long-range dependencies and enhance segmentation accuracy. Extensive experiments demonstrate superior performance across multiple segmentation tasks and datasets.

    Strengths of the manuscript:

    • ShapeMamba-EM introduces a novel fine-tuning approach that effectively adapts foundation models to the specific challenges of 3D EM data. This includes the integration of LSDs and 3D Mamba Adapters for improved segmentation accuracy.
    • The manuscript includes extensive experiments across multiple datasets and segmentation tasks, providing strong evidence of the method’s superiority over existing approaches.

    Weaknesses and concerns:

    • The method involves a combination of multiple components, which may add complexity to the implementation. However, the authors have provided sufficient details and justifications for each component, demonstrating their necessity.
    • More detailed visual explanations of the learned features and model behavior could further strengthen the manuscript. The authors have addressed this concern by providing additional visualizations and planning to include more in the final version.


