Abstract

Automatic tooth segmentation on 3D dental models is a fundamental task for computer-aided orthodontic treatment. Many deep learning methods aimed at precise tooth segmentation currently require meticulous point-wise annotations, which are extremely time-consuming and labor-intensive to produce. To address this issue, we propose a weakly supervised tooth instance segmentation network (WS-TIS), which requires only coarse class labels along with approximately 50% of point-wise tooth annotations. WS-TIS consists of two stages: tooth discriminative localization and tooth instance segmentation. Precise tooth localization is frequently pivotal in instance segmentation. However, annotating tooth centroids or bounding boxes is often challenging when only limited point-wise tooth annotations are available. Therefore, we design a proxy task to weakly supervise tooth localization. Specifically, we utilize a fine-grained multi-label classification task, equipped with a disentangled re-sampling strategy and a gated attention mechanism, which assists the network in learning discriminative tooth features. With discriminative features, certain feature visualization techniques can be easily employed to locate these discriminative regions, thereby accurately cropping out the teeth. In the second stage, a segmentation module is trained on limited annotated data (approximately 50% of all teeth) to accurately segment each tooth from the cropped regions. Experiments on Teeth3DS demonstrate that our method, with weakly supervised learning and weak annotations, achieves superior performance, comparable to that of state-of-the-art approaches trained with full annotations.
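As a rough illustration of the multi-label proxy task mentioned in the abstract, the sketch below turns a scan-level list of present tooth classes into a multi-hot target and scores class logits with a binary cross-entropy loss. The function names, the 1-based class encoding, and the choice of binary cross-entropy are assumptions made for illustration; the abstract does not state the loss or label encoding used in WS-TIS.

```python
import numpy as np

def multi_hot_target(present_classes, num_classes):
    """Multi-hot vector with a 1 for every tooth class present in the scan.

    Assumes tooth classes are numbered 1..num_classes (hypothetical encoding).
    """
    target = np.zeros(num_classes, dtype=np.float32)
    idx = np.asarray(list(present_classes), dtype=np.int64) - 1
    target[idx] = 1.0
    return target

def bce_with_logits(logits, target):
    """Numerically stable binary cross-entropy, averaged over classes.

    Assumed loss for the multi-label task; not taken from the paper.
    """
    logits = np.asarray(logits, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    per_class = (np.maximum(logits, 0.0) - logits * target
                 + np.log1p(np.exp(-np.abs(logits))))
    return float(per_class.mean())

# Example: a scan in which only tooth classes 1 and 3 (of 4) are present.
target = multi_hot_target([1, 3], num_classes=4)
loss = bce_with_logits(np.zeros(4), target)
```

Each class is treated as an independent present/absent decision, which is what distinguishes a multi-label target from a single-label (softmax) one.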

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1757_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ladderlab-xjtu/WS-TIS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wan_Weakly_MICCAI2024,
        author = {Wang, Haoyu and Li, Kehan and Zhu, Jihua and Wang, Fan and Lian, Chunfeng and Ma, Jianhua},
        title = {{Weakly Supervised Tooth Instance Segmentation on 3D Dental Models with Multi-Label Learning}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a novel weakly supervised tooth instance segmentation approach for 3D dental models, named WS-TIS. In detail, WS-TIS leverages a fine-grained multi-label classification network to detect the location of each tooth, and uses a gated attention mechanism to enhance the feature discrimination of each tooth. The segmentation results are obtained by cropping based on the refined discriminative features and the sampled input points.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of using weak annotations for training tooth instance segmentation is very interesting and meaningful.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In Sec. 2, why is the shape of the attention mask A N×C? If A is calculated through two views of f, the shape of A should be N×N.
    2. How are the discriminative features, with size N×1024, cropped to the dental model, which has a shape of N×6?
    3. The method section needs more details; many parameters are given without any further explanation. For example, M is introduced in Sec. 2.1, and no further discussion of M is given until Sec. 2.3. In Sec. 2.1, the authors should first define that M is used to obtain a subset of points from the original tooth model with size N×6, with M<N.
    4. More recent approaches should be compared. The authors claim that their approach achieves superior performance compared to state-of-the-art approaches under full annotations; however, the baselines compared in the paper are from four years ago. [1] follows a self-supervised learning approach and is also comparable. [1] Amani Almalki, Longin Jan Latecki, "Self-Supervised Learning With Masked Autoencoders for Teeth Segmentation From Intra-Oral 3D Scans", Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The written English should be improved, and more details need to be given in the method section.
    2. Some recent methods should be compared in the experimental section.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Recent methods on weakly supervised learning and tooth instance segmentation are not compared.
    2. The method section is very confusing, which makes it hard to follow.
  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks for the authors’ response. The authors provided additional explanations for the methodology section in the rebuttal, which I find sufficient and helpful for understanding the approach. My concerns have been addressed, and I believe this paper can be accepted.



Review #2

  • Please describe the contribution of the paper

    The paper proposes an approach for weakly supervised tooth identification, and segmentation from a 3D scan of the jaw and teeth.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper presents a solution to a clearly defined and medically relevant task
    • Compared to the state of the art it reduces the need for annotations.
    • Results support the claim of the paper.
    • Comparison experiments highlight the value of individual parts of the algorithm.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is not entirely clear if the annotations are needed during test time, i.e., if the teeth have to be indicated by the user, or if this is only the case during training time. I assume that it is the case only during training, and that the contribution of the paper is an algorithm that can learn from 3D data with weak annotations of teeth (i.e., only one identifier) instead of fully segmented teeth. Please clarify.
    • It would be helpful if the method explanation provided a common thread of variables and equations to follow through the processing for both training and inference. In its current state, large parts are narrative, making it harder to understand the algorithm and its specific novel contributions.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please also have a look at answers to question 6 (weaknesses)
    • Structure the paper to facilitate reading and following the main method description: first list the rationale, limitations of prior approaches, motivation of your approach. Then in the method section only describe the method in one go from training data to model, and from input to inference output. Please use variables for the important data here, to facilitate following the explanation.
    • Figure 3: why no labels of the teeth?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an interesting approach, and the results support the claims. However the writing could be improved to clarify the method description. This is currently hard to follow, as it is in part narrative, and mixes reasoning and comments with the actual method description.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    Based on the author’s claim, they have developed the first weakly supervised tooth instance segmentation network that can be trained with limited labeled data and achieve state-of-the-art performance. Extensive experiments demonstrate the advantages of this proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well organized and the figures are clear.
    2. The proposed method successfully achieves promising performance with weak supervision.
    3. The experiments and ablation studies are sufficient to show the improvement from proposed methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Overall, the paper is well-structured and the experiments are robust. However, I have a question regarding the dataset used. The author tests the proposed method on a public dataset, which I assume includes ground truth labels for each case. I am curious about how the author simulated the ‘limited label’ scenario. It would also be beneficial if the author could demonstrate the segmentation performance at various label ratios (For example, 5%, 20% and 50%).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weakness.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is good and the experiments are sufficient.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors answered my questions.




Author Feedback

We thank the reviewers for their in-depth reviews and for affirming our contributions. The main concerns are addressed below.

[R3]-Annotations during testing: Yes, annotations are only required during training. During testing, we only need the complete tooth model, without annotations, as input.

[R4]-Comparison with existing methods: The competing methods we selected are all open-sourced and highly representative in this field. Most teeth segmentation methods compare their results with these methods. We set up the comparison experiments with the same configuration to ensure a fair comparison. We have reviewed recent methods, but we did not include them in our comparative experiments because their code has not been released. Notably, according to their reported results on the same dataset (Teeth3DS), our method qualitatively remains superior. We will update the paper to discuss these points more clearly.

[R3]-Tooth labels in Fig. 3: Sorry for any confusion. To clearly show readers which tooth we have localized, we displayed only the corresponding tooth in Fig. 3 by masking out the other teeth.

[R3, R4]-More explanations of our methods: Sorry for the unclear description of our method. Assuming we have weakly annotated training data, we obtain sampled points, denoted x∈R^(N×6), after disentangled re-sampling (not required during testing). We use a classification task to assist in learning the discriminative features of each category. Specifically, we obtain a high-level representation f∈R^(N×1024) through the two-stream feature extractor. Then, an attention mask is calculated based on Eq. 1. The discriminative features f̂ of the different categories are obtained based on Eq. 2. For each category j, discriminative localization techniques (e.g., CAM and Grad-CAM) can be easily employed to highlight the discriminative points of that category, which can be used for cropping (please refer to Details of cropping).
The cropped region of each category, denoted x_j∈R^(M×6) (M<N), is fed into the segmentation network for point-wise prediction. The segmentation results of all categories are integrated to form the final result.

Attention mask A: To make the features more discriminative for different tooth categories, we want A to highlight the association of each point with its corresponding tooth category. Based on Eq. 2, f̂ learns more distinctive features for different teeth, which benefits our discriminative localization. A is indeed computed from two views of f, but ω is a mapping from N to C, thus A∈R^(N×C).

Details of cropping: Based on the gated attention mechanism, f̂∈R^(N×1024×C) (C stands for the teeth categories) learns discriminative features for different teeth. For each category j, we use global average pooling (GAP) to integrate spatial information (F_j∈R^(1×1024)) and a mapping (from 1024 to 1) to output the probability of this tooth. Similar to CAM, the weights of the mapping emphasize the important channels of each point in f̂_j. In this way, the discriminative points belonging to category j are highlighted, which can be used for cropping. We apologize for the unclear description and will update the paper to clarify the method description.

[R5]-Explanation of dataset: We simulated the 'limited label' scenario on a fully annotated dataset by randomly masking some teeth as the background class.

[R5]-Different label ratios: Before submitting, we conducted ablation experiments with different label ratios (i.e., 20%, 30% and 50%). We found that the segmentation performance is still acceptable with only 30% labeled data. In contrast, the performance is significantly affected when the label ratio is reduced further. Due to space limitations, we did not include this experiment in our ablation study. Instead, we presented only the best results with limited annotations (i.e., 50%).
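The 'limited label' simulation described in the rebuttal (randomly masking some teeth as background on a fully annotated scan) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the background label 0, the per-scan masking granularity, and the rounding of the kept-teeth count are all hypothetical choices, since the rebuttal does not specify them.

```python
import numpy as np

def mask_teeth_as_background(point_labels, keep_ratio=0.5, background=0, seed=0):
    """Keep point-wise labels for a random subset of teeth; relabel the rest
    as background.

    point_labels: (N,) integer label per point, with `background` marking
    non-tooth points (assumed encoding).
    Returns the masked labels and the sorted tooth classes that stay annotated.
    """
    rng = np.random.default_rng(seed)
    point_labels = np.asarray(point_labels)
    teeth = np.unique(point_labels[point_labels != background])
    n_keep = int(round(keep_ratio * len(teeth)))
    kept = rng.choice(teeth, size=n_keep, replace=False)
    # Points of teeth that were not kept are treated as background.
    masked = np.where(np.isin(point_labels, kept), point_labels, background)
    return masked, np.sort(kept)

# Example: four teeth in a toy scan, roughly half of them stay annotated.
labels = np.array([0, 1, 1, 2, 2, 3, 4, 0])
masked, kept = mask_teeth_as_background(labels, keep_ratio=0.5)
```

Changing `keep_ratio` to 0.2 or 0.3 would mimic the other label ratios mentioned in the rebuttal, subject to the same assumptions.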
[R3, R4, R5]-Reproducibility: We will release the code on GitHub after the review process for reproducibility.
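The shapes given in the rebuttal for the gated attention and CAM-style cropping (f∈R^(N×1024), A∈R^(N×C), f̂∈R^(N×1024×C), F_j∈R^(1×1024), x_j∈R^(M×6)) can be traced with a small NumPy sketch. Eq. 1 and Eq. 2 are not reproduced on this page, so the concrete forms below are assumptions: a tanh/sigmoid gate as the "two views" of f, a sigmoid on the projected gate, element-wise re-weighting for f̂, and selection of the top-M scoring points as the crop. The parameter names V, U, omega and W_cls and the hidden width H are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def localize_and_crop(x, f, V, U, omega, W_cls, M):
    """Trace tensor shapes from features to per-category crops.

    x: (N, 6) input points, f: (N, D) point features,
    V, U: (D, H) gate projections, omega: (H, C) projection to categories,
    W_cls: (D, C) one D->1 classifier per category, M: points per crop.
    """
    gate = np.tanh(f @ V) * sigmoid(f @ U)            # (N, H), two views of f
    A = sigmoid(gate @ omega)                         # (N, C) attention mask
    f_hat = f[:, :, None] * A[:, None, :]             # (N, D, C) per-category features
    F = f_hat.mean(axis=0)                            # (D, C) GAP over points
    logits = (F * W_cls).sum(axis=0)                  # (C,) per-category score
    # CAM-style map: contribution of each point to each category's logit.
    cam = np.einsum("ndc,dc->nc", f_hat, W_cls)       # (N, C)
    top = np.argsort(-cam, axis=0)[:M]                # (M, C) highest-scoring points
    crops = [x[top[:, j]] for j in range(A.shape[1])] # C arrays of shape (M, 6)
    return A, logits, cam, crops

# Toy sizes; the rebuttal uses D = 1024.
rng = np.random.default_rng(0)
N, D, H, C, M = 32, 8, 4, 3, 5
x, f = rng.normal(size=(N, 6)), rng.normal(size=(N, D))
V, U = rng.normal(size=(D, H)), rng.normal(size=(D, H))
omega, W_cls = rng.normal(size=(H, C)), rng.normal(size=(D, C))
A, logits, cam, crops = localize_and_crop(x, f, V, U, omega, W_cls, M)
```

Under these assumptions the per-category logit equals the mean of that category's per-point map, which is the usual reason CAM weights can be read back as point-level importance.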




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This meta-reviewer has two concerns about the paper: 1) the competing methods seem to perform extremely poorly, even in normal cases. The paper should compare with SOTA methods and include more challenging cases; 2) although this method has been evaluated on the public dataset, why has it not been compared with the state-of-the-art methods (i.e., the leaderboard)?

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    This meta-reviewer has two concerns about the paper: 1) the competing methods seem to perform extremely poorly, even in normal cases. The paper should compare with SOTA methods and include more challenging cases; 2) although this method has been evaluated on the public dataset, why has it not been compared with the state-of-the-art methods (i.e., the leaderboard)?



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has provided sufficient explanations for their methodological design and implementation details. The proposed method is interesting and inspiring, and the performance is promising. The final version should include more descriptions of the adopted dataset and the annotations required to conduct inference.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The rebuttal has provided sufficient explanations for their methodological design and implementation details. The proposed method is interesting and inspiring, and the performance is promising. The final version should include more descriptions of the adopted dataset and the annotations required to conduct inference.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    One reviewer upgraded his rating from R to WA, so all three ratings are now positive. The authors’ responses address a number of questions from the reviewers. I would follow all reviewers’ recommendations for this paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    One reviewer upgraded his rating from R to WA, so all three ratings are now positive. The authors’ responses address a number of questions from the reviewers. I would follow all reviewers’ recommendations for this paper.


