Abstract

Cryogenic Electron Tomography (CryoET) is a useful imaging technology in structural biology that is hindered by its need for manual annotations, especially in particle picking. Recent works have endeavored to remedy this issue with few-shot learning or contrastive learning techniques. However, supervised training is still inevitable for them. We instead choose to leverage the power of existing 2D foundation models and present a novel, training-free framework, CryoSAM. In addition to prompt-based single-particle instance segmentation, our approach can automatically search for similar features, facilitating full tomogram semantic segmentation with only one prompt. CryoSAM is composed of two major parts: 1) a prompt-based 3D segmentation system that uses prompts to complete single-particle instance segmentation recursively with Cross-Plane Self-Prompting, and 2) a Hierarchical Feature Matching mechanism that efficiently matches relevant features with extracted tomogram features. They collaborate to enable the segmentation of all particles of one category with just one particle-specific prompt. Our experiments show that CryoSAM outperforms existing works by a significant margin and requires even fewer annotations in particle picking. Further visualizations demonstrate its ability when dealing with full tomogram segmentation for various subcellular structures. Our code is available at: https://github.com/xulabs/aitom

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0532_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0532_supp.pdf

Link to the Code Repository

https://github.com/xulabs/aitom

Link to the Dataset(s)

https://www.ebi.ac.uk/empiar/EMPIAR-10499/

BibTex

@InProceedings{Zha_CryoSAM_MICCAI2024,
        author = { Zhao, Yizhou and Bian, Hengwei and Mu, Michael and Uddin, Mostofa R. and Li, Zhenyang and Li, Xiang and Wang, Tianyang and Xu, Min},
        title = { { CryoSAM: Training-free CryoET Tomogram Segmentation with Foundation Models } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Authors propose CryoSAM to segment ribosomes from M. pneumoniae cells in 3D CryoET images, using DINO to extract features and SAM for segmentation. Segmentation is conducted on multiple planes of the 3D image in a slice-by-slice manner, and features from DINO are used to segment other objects. The method is validated on one public ribosome dataset, outperforming existing particle picking methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Training free: the proposed method uses pre-trained models (DINO and SAM) from nature image domain and doesn’t rely on training data.
    2. Simple interaction: The proposed method only requires one prompt to segment multiple objects.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Overstated title: the title should be more specific, since this work only segments ribosomes from M. pneumoniae cells.
    2. Error propagation across slices: in the prompt-based 3D segmentation, new prompts are generated from the segmentation mask of the previous slice. However, it is not clear how the end slice of a ribosome is determined. If the segmentation mask of the previous slice is wrong, this error will also propagate to new slices.
    3. Lack of discussion on efficiency: the method needs to compute multiple feature maps from different planes, which is inefficient compared to other methods (e.g., crYOLO).
    4. Limited generalization ability: the proposed method cannot generalize to other objects, such as membranes.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors used public datasets for benchmarking but the proposed pipeline contains multiple complex steps. The reproducibility can be enhanced if the author could release the code in anonymous repository.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Revise the title to reflect the work that has been done.
    2. Explain the solution to address the error propagation across slices.
    3. Discuss the efficiency and compare with the benchmarked methods.
    4. Evaluate the method on other objects. Here is a labeled cryo-ET dataset for method validation: https://www.nature.com/articles/s41592-022-01746-2
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is not efficient because of the computational burden of extracting image features, and the validation is only conducted on one object (ribosomes) in one cell type.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors present a novel framework to perform particle segmentation in CryoET tomograms. The main idea is to exploit a powerful 2D segmenter (namely the SAM model) and adapt it to the processing of 3D tomograms without retraining. Starting from a single prompt in a 2D slice, the segmentation result is propagated along the three dimensions by considering consecutive slices in which the particle is present, recursively using the segmentation of one slice to detect the same particle in the following/preceding slices. This first 3D segmented particle is then used to extract prompts for all other similar particles by means of a hierarchical feature matching strategy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed methodology, although relying in some parts on existing approaches, combines them in a novel manner. The results are very promising, both in terms of segmentation results and in terms of execution times. Furthermore, the paper is very well written, and the figures are very beautiful and useful to guide the reader along all the steps.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper lacks some discussion on the limitations of the method: do the authors expect it to work well also when there are several different particles to detect? might it be used for larger objects? and on other datasets? What are the future directions?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    n/a

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    line 112, par 2.2. “We rely on an off-the-shelf image encoder E to extract 2D features”. The encoder is then mentioned again at line 158: it would be nice to call it by the same name $\mathcal{E}$ it has in line 112 and in figure 3, to immediately associate it with the model it refers to.

    figure 1 and sections 2.1 and 2.2. In the figure, steps 1 and 2 are Feature Extraction and Prompt-based 3D Segmentation. However, in the method description, step two is discussed before step one (sections 2.2 and 2.1, respectively). I find this a little counterintuitive.

    line 160: how are the thresholds $\tau_\text{IoU}$ and $\tau_\text{sim}$ set? Please specify.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed methodology is original, and might be of interest for the scientific community.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors’ responses are compelling and address the concerns raised satisfactorily. Therefore, I update my position and recommend an accept for this submission.



Review #3

  • Please describe the contribution of the paper

    This manuscript presents a novel training-free method for segmenting particles from CryoET tomograms. The other two contributions of this submission are the two core parts of the method, namely Cross-Plane Self-Prompting and Hierarchical Feature Matching.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Technically sound, training-free method, well presented
    • Superior results, with ablation studies performed

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The performed validation needs to be better explained
    • The figures are too small and rather difficult to read

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The performed validation needs to be clarified.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Section 3.1, lines 152–154: I find the phrase about the “prediction from each proposed prompt” very confusing. If this were the case, it would mean that only the first step of the proposed model, namely the prompt-based 3D segmentation, was validated, as the subsequent feature matching step relies on multiple prompts. This part needs to be rewritten.

    2. The figures are too small and difficult to read.

    3. Section 2.1, lines 108–110: the authors present their self-prompting approach as a method that performs search in 6 directions, although, in reality, the method works along one direction at a time, after which the three views are aggregated. The authors might want to rewrite this part to make this fact clearer.

    4. Section 3.2, line 169: the authors mention “significant advancements” demonstrated by their approach. However, even though their results clearly outperform that by other methods, the significance was not strictly assessed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I find this article to be a very strong (conference) submission, with very limited points for improvement.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank R1, R3, and R4 for the insightful feedback. All comments and the suggested reference by R4 will be integrated into the final draft.

Reviewer #1

  1. Lines 152–154. “Prediction from each proposed prompt” refers to the final step of our model, using the prompt-based 3D segmentation given one proposed prompt from Hierarchical Feature Matching. We differentiate proposed prompts (intermediate results) from input prompts (user inputs). Proposed prompts help identify all particles of a category as the original input prompts might not cover them all.
  2. Sizes of figures. We will enlarge the figures to improve readability.
  3. Lines 108–110. Your understanding is correct. The self-prompting process is sequential for different directions. We will clarify this for better understanding.
  4. Result significance. Our current results are based on an average of 5 random seeds as the input prompt can affect the performance. We conducted one-tailed paired t-tests to assess the significance of our improvements over [10], consistently yielding p-values below 0.01. This will be clarified in our final draft.
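As an illustrative sketch of the significance test mentioned in point 4 (a one-tailed paired t-test over 5 seeds): the score arrays below are hypothetical placeholders, not the paper's reported numbers, and the critical value is the standard Student's t for alpha = 0.01 with 4 degrees of freedom.

```python
import math

# Hypothetical F1 scores over 5 random seeds (placeholders, not the paper's numbers).
ours     = [0.91, 0.89, 0.92, 0.90, 0.91]
baseline = [0.84, 0.85, 0.83, 0.86, 0.84]

# Paired differences between the two methods, seed by seed.
d = [a - b for a, b in zip(ours, baseline)]
n = len(d)
mean_d = sum(d) / n
sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))

# One-tailed paired t-test: H1 is that `ours` scores higher on average.
t_stat = mean_d / (sd_d / math.sqrt(n))

# Critical value of Student's t for alpha = 0.01, df = 4 (one-tailed).
T_CRIT = 3.747
print(f"t = {t_stat:.2f} (reject H0 at p < 0.01: {t_stat > T_CRIT})")
```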

Reviewer #3

  1. Limitation discussion. Fig. 2 of our Supp. Mat. shows our method can be used for membrane segmentation, which signifies its potential in segmenting objects beyond particles and distinguishing between unique objects. Hence, we expect our method to work with different particles and larger objects. One limitation is the need for a properly denoised input, a common requirement for CryoET particle picking. To this end, one future direction can be developing learning-based adaptors that can adjust the feature extractor and segmenter to handle noisy inputs.
  2. Reference to the image encoder $\mathcal{E}$. We will refer to the image encoder as $\mathcal{E}$ when mentioned in our revised draft.
  3. Order of modules. We will reorder them to match the framework overview in Fig. 1.
  4. Choices of $\tau_\text{IoU}$ and $\tau_\text{sim}$. In line 162, we set both to 0.5, which generally works well. On other tomograms with different PSNRs, these values might need adjustment for optimal results.
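The role of $\tau_\text{sim}$ in point 4 can be sketched as a similarity-thresholding step; this illustrates only the matching criterion of Hierarchical Feature Matching, and the query and candidate feature vectors are hypothetical.

```python
import math

TAU_SIM = 0.5  # threshold value from the rebuttal; features below are illustrative

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical query feature (from the prompted particle) and candidate features.
query = [0.9, 0.1, 0.3]
candidates = {"A": [0.8, 0.2, 0.4], "B": [-0.5, 0.9, 0.1]}

# Keep only candidates whose similarity to the query reaches the threshold.
matches = [k for k, f in candidates.items() if cosine(query, f) >= TAU_SIM]
print(matches)
```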

Reviewer #4

  1. Overstated title. We agree a more specific title can better reflect our scope. Since our work mainly focuses on particle picking, we will revise it to “Training-free CryoET Tomogram Segmentation for Particle Picking”.
  2. Error propagation across slices. We use a threshold $\tau_\text{IoU}$ to determine the end slice of particles, as mentioned in lines 105–108 and 160–162. When one plane contains a slice of a particle but the next plane does not, the segmentation mask changes significantly. We therefore calculate the IoU between segmentation masks from adjacent planes and compare it with $\tau_\text{IoU}$ to end the self-prompting process. In this way, error propagation is unlikely unless the first segmentation is wrong, which can be avoided by providing filtered prompts.
  3. Efficiency discussion. As shown in Tab. 1, our model outperforms crYOLO in both performance and runtime. We note that CryoET particle picking is in 3D, while crYOLO is actually a 2D detector that also extracts features from each plane, sharing a similar computational complexity with ours at inference. Unlike our training-free algorithm, crYOLO needs fine-tuning, leading to worse overall efficiency.
  4. Generalization ability. We agree that evaluating our method on more cell types and objects will enhance its applicability. Hence, we show qualitative results of membrane segmentation in Fig. 2 of our Supp. Mat., indicating the potential of our method to generalize across different objects.
  5. Evaluation on other objects. Thank you for suggesting additional benchmarks. Our original intent was to follow the evaluation protocol set by [10] for particle picking. While we cannot add new experiments due to conference guidelines, we will extend our work later to further validate its generalization ability.
  6. Reproducibility. We will release our code upon acceptance.
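The stopping rule described in point 2 can be sketched as follows: a minimal, self-contained illustration in which only the threshold value of 0.5 comes from the rebuttal, while the masks and helper names are hypothetical.

```python
TAU_IOU = 0.5  # threshold value from the rebuttal; masks below are toy examples

def mask_iou(a, b):
    """IoU between two binary masks given as sets of (row, col) pixels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def propagate(slice_masks):
    """Walk per-slice masks in order and stop self-prompting once the IoU
    between adjacent slices drops below TAU_IOU (the particle has ended)."""
    kept = [slice_masks[0]]
    for prev, cur in zip(slice_masks, slice_masks[1:]):
        if mask_iou(prev, cur) < TAU_IOU:
            break  # end slice reached; nothing past it can propagate an error
        kept.append(cur)
    return kept

# Toy stack: a particle that shrinks, then vanishes from the field of view.
m1 = {(r, c) for r in range(2, 6) for c in range(2, 6)}  # 16 px
m2 = {(r, c) for r in range(3, 6) for c in range(3, 6)}  # 9 px, IoU with m1 = 9/16
m3 = {(7, 7)}                                            # disjoint, IoU = 0
chain = propagate([m1, m2, m3])
print(len(chain))  # 2: the disjoint slice is rejected
```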




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Based on the detailed and thoughtful rebuttal provided by the authors, I recommend the acceptance of the manuscript. The authors have addressed all reviewer concerns comprehensively, demonstrating their commitment to improving the clarity and robustness of their work.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


