Abstract

Accurate detection of tooth landmarks is crucial for computer-aided orthodontic treatment. Previous methods often employ segmentation to isolate individual teeth, but they rely heavily on segmentation accuracy and require annotated segmentation data. In this paper, we introduce a two-stage mesh deep learning framework for tooth localization and landmark detection that eliminates the need for segmentation. First, we define fuzzy tooth regions based on landmark positions, and binary masks are generated for the tooth regions located on the original jaw mesh. Second, by combining local features of individual teeth with the global features of the jaw model, our method predicts multiple heatmaps and the corresponding probabilities of potential landmarks for each tooth. Finally, we design a bipartite matching loss for both tooth localization and landmark detection to align the prediction set with the ground truth, thereby facilitating end-to-end inference throughout the entire process. Experimental results on the Teeth3DS+ dataset demonstrate that our method effectively detects a variable number of landmarks and significantly outperforms existing baseline methods, exhibiting robust generalization.

(The code will be released at https://github.com/sikingbo/ToothLDNet.)
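To make the set-matching idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes a toy matching cost (mean squared heatmap difference minus the predicted probability) and uses exhaustive search in place of the Hungarian algorithm that such losses typically rely on. The function names `heatmap_distance` and `match_predictions` and all numbers are hypothetical.

```python
from itertools import permutations


def heatmap_distance(a, b):
    """Mean squared difference between two per-face heatmaps of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def match_predictions(pred_heatmaps, pred_probs, gt_heatmaps):
    """Find the one-to-one assignment of queries to ground-truth landmarks
    with minimum total cost. assignment[j] is the query matched to landmark j.
    Exhaustive search: only suitable for tiny illustrative inputs."""
    n_queries, n_gt = len(pred_heatmaps), len(gt_heatmaps)
    assert n_queries >= n_gt, "need at least one query per ground-truth landmark"
    best, best_cost = None, float("inf")
    for perm in permutations(range(n_queries), n_gt):
        # Assumed cost: heatmap error, reduced by the query's confidence.
        cost = sum(
            heatmap_distance(pred_heatmaps[q], gt_heatmaps[j]) - pred_probs[q]
            for j, q in enumerate(perm)
        )
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best), best_cost


# Toy example: 3 queries predicting heatmaps over 4 mesh faces, 2 true landmarks.
pred_heatmaps = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.8],
    [0.2, 0.2, 0.2, 0.2],
]
pred_probs = [0.9, 0.8, 0.1]
gt_heatmaps = [
    [0.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]

assignment, total_cost = match_predictions(pred_heatmaps, pred_probs, gt_heatmaps)
# Queries that match no ground-truth landmark are treated as background.
unmatched = [q for q in range(len(pred_heatmaps)) if q not in assignment]
print(assignment, unmatched)
```

In this toy case, query 1 is matched to the first landmark and query 0 to the second, while query 2 is left unmatched. Treating unmatched queries as background is what lets a fixed number of queries represent a variable number of landmarks.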

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3637_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/sikingbo/ToothLDNet

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ShiKai_EndtoEnd_MICCAI2025,
        author = {Shi, Kaibo and Jin, Hairong and Zheng, Youyi},
        title = {{End-to-End 3D Tooth Landmark Detection with Fuzzy Tooth Localization}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {172--181}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. This study proposes ToothLDNet, a novel two-stage framework for 3D tooth landmark detection that eliminates the need for explicit tooth segmentation.
    2. This study introduces a transformer-based network that extracts local geometric features and employs self-attention to capture the global sequence features of teeth along the dental arch curve.
    3. This study evaluates the proposed method on the Teeth3DS+ dataset, achieving robust performance across a range of challenging dental cases and demonstrating superior performance.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A key strength of the study is its segmentation-free design, which avoids the heavy annotation burden common in 3D dental analysis.
    2. The paper includes quantitative and qualitative evaluations, ablation studies, and comparisons against multiple baselines.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. ToothLDNet depends on landmark-derived bounding boxes for ground truth localization and detects landmarks based on landmark-derived regions. This approach can cause generalization problems in cases with missing or misaligned teeth, where those initial regions might be ambiguous. Also, this approach assumes that tooth clustering (via DBSCAN) is reliable, but clustering is sensitive to density and hyperparameters, which could limit robustness in less clean data.
    2. I agree with the authors’ observation that high-accuracy tooth segmentation does not significantly enhance landmark detection performance. However, tooth segmentation on 3D tooth models is crucial for orthodontic treatment and implant surgical planning in clinical practice.
    3. In the mesh simplification step, the number of faces in the original jaw model is reduced from approximately 100,000 to around 10,000. As a result, the geodesic distances from the face centroids to the ground truth landmarks, which are used for heatmap generation, could change after simplification. In addition, the heatmaps of the ground truth landmarks may not perfectly reflect the original landmark coordinates. Please check this paper [Wang, Yuan, et al. “Learning to detect 3D facial landmarks via heatmap regression with graph convolutional network.” Proceedings of the AAAI conference on artificial intelligence. Vol. 36. No. 3. 2022.].
    4. ToothLDNet should be evaluated and analyzed on a tooth dataset that includes missing and misaligned teeth.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While this paper presents a well-motivated and technically promising method, it falls short in several critical areas that must be addressed. These include details of preprocessing, generalization and robustness validation, and real-world clinical applicability.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thank you for your detailed and thoughtful rebuttal. I appreciate the authors’ efforts in addressing the concerns raised during the review process.



Review #2

  • Please describe the contribution of the paper

    This paper proposes an end-to-end framework for teeth landmark detection. To reduce the annotation workload, the authors replace the conventional intermediate step of tooth segmentation with tooth localization. The performance of the proposed framework is evaluated on a publicly available dataset and compared against other existing models.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation behind this work is clear and addresses an important challenge in dental landmark detection.
    2. The authors propose a novel method that combines global and local features to improve the accuracy of the final detection.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The clinical application of the proposed method is not well described. Providing more context on how the approach would be used in real-world clinical settings would strengthen the paper.
    2. The dataset used for training and evaluation is insufficiently described. Key details such as the number of samples, sample characteristics, and data format are missing and should be included to ensure reproducibility and proper evaluation of the results.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See Strengths and Weaknesses.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a two-step framework, ToothLDNet, for tooth localization and landmark detection. This framework does not require tooth segmentation as a pre-processing step and can localize a variable number of landmarks. The authors utilize the tooth landmark information (mesial, distal, inner, outer, and cusp points) to construct a rough bounding box around the tooth and create a binary mask to separate the tooth from the background. In the first step, global features are created for the entire jaw model. In the second step, the method extracts local features from each tooth to localize the landmarks. The authors design a bipartite matching loss to train the tooth localization and landmark detection networks. Both networks are based on the transformer architecture. The TeethGNN method is used to extract the geometric features, and the KeypointDETR method is used to extract the landmarks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of tooth landmark localization without the tooth segmentation step is interesting. The ablation study shows the impact of the different components of the proposed method. The idea of localizing a variable number of keypoints is also interesting.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The paper uses DBSCAN and KNN methods, and hence the runtime might be longer.

    The authors mention that they design a bipartite matching loss for training the network. However, KeypointDETR also employs bipartite matching, so is there a difference in the bipartite matching loss?

    Some minor comments:

    1. What does “reset” mean in Figure 1?
    2. In Fig. 2, the meanings of Mj and Nj are not clear.
    3. The paper mentions that “MJ is the preset number of queries for tooth localization”, but the authors claim to “demonstrate that our method effectively detects a variable number of landmarks”. It would be helpful to clarify this part.
    4. What is the value of T in the DBSCAN process?
    5. Any insight on why the number of queries is set to 50? (“The number of queries, MJ and ML, are both preset to 50.”)
    6. Could you share insights on why the proposed method performs worse than KeypointDETR for the distal keypoints?
    7. Since the DBSCAN method is used for clustering, it would be helpful to note the time required to complete an epoch. Since this method performs tooth localization first and then tooth landmark detection, can the tooth landmarks be detected from a single segmented tooth using this method, or are the required changes non-trivial?
    8. Although the method proposed in “Two-Stage Mesh Deep Learning for Automated Tooth Segmentation and Landmark Localization on 3D Intraoral Scans” requires tooth segmentation first, it would be nice to have a comparison with this method.

    Reproducibility: Since the authors did not mention publicly releasing the code, it will be necessary to provide the details of the network (e.g., dimensions) so that others can replicate it.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There is a limited number of deep learning-based methods for tooth landmark localization, and the method proposed by the authors is innovative.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I am satisfied with the authors’ response to my comments. I have gone through the reviews by the other reviewers as well. I would like to add that reviewer R1 is concerned about the dental mesh simplification and potential deterioration of the landmarks, but dental mesh simplification to 10,000 cells is a pretty standard process.




Author Feedback

Thanks for the reviewers’ valuable comments. We will release our code and weights. Abbreviations: R: Reviewer; MR: Meta Review; C: Comment.

R1C2, R3C1, MR4 (Clinical): Landmark detection holds significant clinical value in digital orthodontics, such as tooth axis estimation, occlusion analysis, and orthodontic assessment. Jaw model segmentation and landmark annotation are labor-intensive and costly. We propose the first end-to-end landmark detection framework that operates directly on jaw models, eliminating the reliance on segmentation annotations required by existing methods. This innovation addresses the challenge of missing segmentation labels in some landmark datasets and reduces the annotation cost for landmark data.

R1C4, R3C2 (Dataset): We use Teeth3DS+, the only public 3D tooth dataset for segmentation and landmarks, following its official split. Due to space limits, we skipped a detailed introduction in the paper. It covers many challenging cases (e.g., missing teeth, crowding, misalignment) and includes both child and adult models. More visualizations of challenging cases will be added in the final version.

R1C1, MR2 (Challenging cases): For missing teeth, predicted masks with low confidence scores are effectively filtered out to reduce the effect on landmark detection. For misaligned teeth, each box fully covers the target tooth and partially covers adjacent misaligned teeth. The network adaptively focuses on complete teeth within patches. As a result, such challenging cases have a limited effect on our performance. These advantages reflect a key innovation of our pipeline: replacing segmentation with fuzzy localization for better robustness and adaptability.

R1C1, R2, MR1 (DBSCAN): DBSCAN aims to extract global tooth features and provide semantic information for landmark detection. As shown in the ablation (Sec. 3.2.3), its clustering accuracy has limited impact. Training with/without DBSCAN takes 998 s/757 s per epoch; inference takes ~70 ms. It can also be used as a preprocessing step to speed up training the landmark network. We predict offsets from face centroids to tooth centroids. The shifted points form dense clusters, on which DBSCAN is applied (radius=1.05, min_pts=30), enabling reliable tooth localization as demonstrated in TeethGNN. T is the number of clustered teeth.

R1C3 (Heatmap): Heatmap-based methods remain widely used in 3D keypoint detection due to their superior performance (e.g., KeypointDETR). The mentioned paper uses Euclidean heatmaps on resampled point clouds, which requires post-processing to extract landmarks from local surfaces. We use an appropriate decay factor setting for geodesic heatmaps on the original mesh, ensuring sufficiently small and accurate local surfaces for unambiguous landmark localization.

R2, MR3 (Bipartite matching loss): KeypointDETR pioneered the heatmap-based bipartite matching loss for 3D keypoint detection; its direct application to jaw models remains challenging. We adapt this loss function to tooth patches and combine it with the bipartite matching loss of tooth localization, facilitating end-to-end landmark detection on complicated jaw models.

R2 (Clarifications): In Fig. 1, ‘reset’ means mapping the landmarks from tooth patches back to the jaw model. In Fig. 2, MJ and NJ are the preset number of queries and the number of input faces for tooth localization, which are distinct from ML and NL for landmark detection, as explained in Sec. 2.2. We set the number of queries in both stages to 50, which is sufficient to exceed the maximum number of teeth (≤16) and landmarks per tooth (≤10). We found that when the number of queries exceeds 30, the performance remains stable. We predict a fixed number of queries and filter out ‘background’ queries, enabling end-to-end inference with a variable number of tooth locations and landmarks, as shown in Fig. 1. Our landmark network can be directly applied to single segmented teeth. A comparison with TS-MDL is meaningful, and we are contacting the authors for source code and attempting to reproduce it.
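The clustering step described in the rebuttal (shift each face centroid by its predicted offset, then run DBSCAN with radius=1.05 and min_pts=30) can be sketched as follows. This is an illustrative, pure-Python DBSCAN on synthetic data, not the authors' code. The synthetic centroids, the offsets (which stand in for imperfect network predictions), and the names `region_query` and `dbscan` are assumptions made for the example.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
        cluster += 1
    return labels


# Synthetic stand-in for a jaw mesh: two "teeth" with 42 faces each, plus one stray face.
centers = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
face_centroids, offsets = [], []
for cx, cy, cz in centers:
    for i in range(7):
        for j in range(6):
            face_centroids.append((cx + 1.0 * i, cy + 1.0 * j, cz))
            # Hypothetical predicted offset pulling the face toward the tooth centroid.
            offsets.append((-0.9 * i, -0.9 * j, 0.0))
face_centroids.append((50.0, 50.0, 50.0))
offsets.append((0.0, 0.0, 0.0))

# Shift every face centroid by its offset, then cluster the shifted points.
shifted = [tuple(c + o for c, o in zip(p, d)) for p, d in zip(face_centroids, offsets)]
labels = dbscan(shifted, eps=1.05, min_pts=30)  # parameter values quoted in the rebuttal
T = len({label for label in labels if label != -1})  # number of clustered teeth
print(T, labels.count(-1))
```

On this synthetic input, the shifted points collapse into two dense groups, so the sketch finds T = 2 clusters and marks the stray face as noise. Without the offset shift, the faces are spaced 1.0 apart and no point would have 30 neighbours within the radius, which illustrates why clustering is applied to the shifted points.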




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This work has received mixed reviews, therefore authors are invited to provide a rebuttal to reviewer remarks. It is suggested to organize weaknesses according to their relevance and address them in a rebuttal.

    Major issues from the point of view of this meta reviewer are: (1) reliance on DBSCAN clustering; (2) discussion of potential robustness issues due to missing teeth and artifacts; (3) novelty of the bipartite matching loss, which was used in several previous works (as cited in the paper and used in comparisons); (4) the relationship to potential clinical application is unclear.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


