Abstract

Tooth point cloud segmentation is a fundamental task in many orthodontic applications. Current research mainly focuses on fully supervised learning, which demands expensive and tedious manual point-wise annotation. Although recent weakly supervised alternatives have been proposed to use weak labels for 3D segmentation and achieve promising results, they tend to fail when the labels are extremely sparse. Inspired by the powerful promptable segmentation capability of the Segment Anything Model (SAM), we propose a framework named SAMTooth that leverages this capability to complement the extremely sparse supervision. To automatically generate appropriate point prompts for SAM, we propose a novel Confidence-aware Prompt Generation strategy, in which coarse category predictions are aggregated with confidence-aware filtering. Furthermore, to fully exploit the structural and shape cues in SAM's outputs for assisting 3D feature learning, we introduce Mask-guided Representation Learning, which re-projects SAM's generated tooth masks into 3D space and constrains points of different teeth to have distinct representations. To demonstrate the effectiveness of the framework, we conduct experiments on a public dataset and, surprisingly, find that with only 0.1% of points annotated (one point per tooth), our method surpasses recent weakly supervised methods by a large margin and is even comparable to recent fully supervised methods, showcasing the significant potential of applying SAM to 3D perception tasks with sparse labels. Code is available at https://github.com/CUHK-AIM-Group/SAMTooth.
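To make the prompt-generation idea concrete, the following is a minimal sketch of how confident coarse predictions might be aggregated into one 3D prompt per tooth class. It assumes the coarse 3D branch outputs per-point class probabilities; the function name, the confidence threshold, and the centroid aggregation are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def confidence_aware_prompts(xyz, probs, conf_thresh=0.8):
    """Aggregate confident coarse predictions into one 3D prompt per tooth class.

    xyz   : (N, 3) point coordinates
    probs : (N, C) per-point class probabilities from the coarse 3D branch
    Returns {class_id: (3,) centroid of confident points for that class}.
    """
    labels = probs.argmax(axis=1)   # coarse per-point predictions
    conf = probs.max(axis=1)        # prediction confidence
    prompts = {}
    for c in np.unique(labels):
        keep = (labels == c) & (conf >= conf_thresh)   # confidence-aware filtering
        if keep.any():
            prompts[int(c)] = xyz[keep].mean(axis=0)   # aggregate into a single prompt point
    return prompts
```

Each resulting 3D prompt can then be projected onto the rendered 2D view and fed to SAM as a point prompt.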

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0843_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/CUHK-AIM-Group/SAMTooth

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Liu_When_MICCAI2024,
        author = { Liu, Yifan and Li, Wuyang and Wang, Cheng and Chen, Hui and Yuan, Yixuan},
        title = { { When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a SAM-based framework to segment tooth point clouds with extremely sparse supervision. The Confidence-aware Prompt Generation and Mask-guided Representation Learning strategies are the two main contributions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper leverages the Segment Anything Model to solve the problem of extremely sparse supervision in tooth point cloud segmentation.
    2. The Confidence-aware Prompt Generation module is proposed to generate better point prompts, using prediction confidence to determine the candidate points.
    3. A Mask-guided Representation Learning strategy is used to bridge the domain gap between 2D images and 3D point clouds.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The novelty is limited: the paper seems an incremental application of the SAM model to tooth segmentation. 2) In the ablation experiments, what is the baseline model? Is it the vanilla SAM model? 3) Some design choices are not fully clear, e.g., apart from the prompt encoder, how are the image encoder and mask decoder handled?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    I am interested in the 2D-3D projection and re-projection code, which is the main obstacle for point cloud segmentation via 2D methods.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) More comparisons with state-of-the-art methods, covering both semi-supervised learning and 3D tooth segmentation, would be welcome. 2) More experiments on datasets beyond teeth would be more convincing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper addresses 3D tooth segmentation and is the first attempt to segment teeth with SAM.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors leveraged 2D SAM for 3D tooth point cloud segmentation. First, the points are coarsely segmented with confidence scores; the confident predictions are then aggregated as prompts to SAM, together with mesh-rendered 2D images. The 2D predictions are then projected back onto the point cloud. The method reaches performance comparable to supervised methods with one labeled point per tooth and surpasses the compared weakly supervised methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The description of the method is clear.

    The results are strong, as the proposed method surpasses other weakly supervised methods by >10% Dice.

    The ablation studies properly demonstrate the utility of the proposed components.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Since only one point per tooth is labeled during training, it is important to understand whether the choice of that point matters. Currently the paper does not mention how the point is selected, nor does it discuss the variance of the performance.

    It would be nice to have an analysis of the model's performance with different amounts of labeled data: would this method eventually surpass supervised methods with more data?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The proposed method is a relatively complicated system; releasing the code base would be important for future studies.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As discussed in the weaknesses, more details on how the training labels are selected, an analysis of performance variance, and an analysis of performance under different amounts of labeled data would make the paper stronger.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper's results are very strong. Overall, the analysis and writing are good.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors have proposed a novel method for tooth point cloud segmentation with sparse labels that leverages the recent advances of 2D segment anything model. The proposed method carefully integrates the 2D SAM into the 3D segmentation task in a monolithic framework that can be trained end-to-end. The resulting method outperforms existing methods by a large margin approaching the accuracy of fully supervised methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the proposed methodology lies in the proper and novel integration of the 2D SAM in the 3D segmentation task by i) utilizing 2D SAM predictions to guide 3D segmentation training and ii) generating confidence aware prompts from the 3D data to create high-quality 2D masks from SAM. The resulting framework can be trained end-to-end only requiring a set of very sparse labels and achieves state of the art results compared to existing works representing a significant advance. The manuscript is well written and the methods are properly described. The experiments support the claims of the authors. An ablation study provides additional insights into model performance and supports the design choices proposed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The manuscript does not contain major weaknesses in my opinion. The ablation study could be expanded to include more elements of the architecture, such as the segmentation backbone.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors rely on a publicly available dataset for training and evaluation of their method. Although it is not mentioned that the code will be released, the methods and implementation are clearly described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As mentioned above, I do not see significant weaknesses in the proposed approach. To further the impact of the methodology, authors should expand their ablation study to show independence of their design choices from specific instantiations of e.g. the segmentation backbone. In addition, as the proposed method has potential application and impact in other areas of MIC, I would like to encourage the authors to apply and evaluate other use cases. This would further increase the significance of the contribution.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have proposed a novel concept to leverage 2D SAM for a complex 3D segmentation task with very sparse labels. The results are a significant improvement over state of the art and have impact beyond the application considered in the manuscript.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely appreciate the reviewers for their meticulous reviews and constructive suggestions. Here we tried to address the comments:

(Q1, R#1) Details about the model design. The baseline model in the ablation study refers to the pure ViT-based 3D segmentation model without SAM guidance. Meanwhile, SAM is kept frozen, with all three components (image encoder, prompt encoder, and mask decoder) fixed.

(Q2, R#3) The labeled point selection. To obtain the single-point label for each tooth, we simply label one randomly selected point per tooth, which eases the practical annotation procedure.
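For illustration only (not the authors' code), such a single-point-per-tooth label set could be sampled from dense instance labels roughly as follows; the helper name and the -1 convention for unlabelled points are our assumptions.

```python
import numpy as np

def sample_one_label_per_tooth(instance_ids, rng=None):
    """Keep one randomly chosen labelled point per tooth; -1 marks unlabelled points."""
    rng = np.random.default_rng() if rng is None else rng
    sparse = np.full_like(instance_ids, -1)
    for tooth in np.unique(instance_ids):
        if tooth < 0:                      # skip gingiva/background if encoded as -1
            continue
        idx = rng.choice(np.flatnonzero(instance_ids == tooth))
        sparse[idx] = tooth
    return sparse
```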

(Q3, R#1, R#3, R#5) Reproducibility. We will release all the code and data to help readers implement the 2D-3D cross-modality learning.
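As a rough sketch of what the 2D-3D projection and re-projection step discussed above might look like (assuming a simple pinhole camera model and hypothetical helper names; this is not the released implementation):

```python
import numpy as np

def project_to_image(xyz, K, Rt):
    """Pinhole projection of 3D points (N, 3) to pixel coordinates,
    given intrinsics K (3, 3) and extrinsics Rt (3, 4)."""
    xyz_h = np.hstack([xyz, np.ones((len(xyz), 1))])   # homogeneous coordinates
    cam = Rt @ xyz_h.T                                  # points in camera frame, (3, N)
    uvw = K @ cam
    uv = (uvw[:2] / uvw[2:]).T                          # perspective divide -> (N, 2)
    return uv, cam[2]                                   # pixel coordinates and depth

def reproject_mask_to_points(uv, mask):
    """Sample a 2D SAM mask (H, W) at the projected pixel locations to obtain
    per-point pseudo-labels (occlusion handling omitted for brevity)."""
    h, w = mask.shape
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return mask[v, u]
```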




Meta-Review

Meta-review not available, early accepted paper.


