Abstract
Accurate detection of anatomic landmarks is essential for assessing alveolar bone and root conditions, thereby optimizing clinical outcomes in orthodontics, periodontics, and implant dentistry. Manual annotation of landmarks on cone-beam computed tomography (CBCT) by dentists is time-consuming, labor-intensive, and subject to inter-observer variability. Deep learning-based automated methods present a promising approach to streamline this process efficiently. However, the scarcity of training data and the high cost of expert annotations hinder the adoption of conventional deep learning techniques. To overcome these challenges, we introduce GeoSapiens, a novel few-shot learning framework designed for robust dental landmark detection using limited annotated CBCT of anterior teeth. Our GeoSapiens framework comprises two key components: (1) a robust baseline adapted from Sapiens, a foundational model that has achieved state-of-the-art performance in human-centric vision tasks, and (2) a novel geometric loss function that improves the model’s capacity to capture critical geometric relationships among anatomical structures. Experiments conducted on our collected dataset of anterior teeth landmarks revealed that GeoSapiens surpassed existing landmark detection methods, outperforming the leading approach by an 8.18% higher success detection rate at a strict 0.5 mm threshold, a standard widely recognized in dental diagnostics. Code is available at: https://github.com/xmed-lab/GeoSapiens.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0035_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/xmed-lab/GeoSapiens
Link to the Dataset(s)
N/A
BibTex
@InProceedings{WanAnb_GeometricGuided_MICCAI2025,
author = { Wang, Anbang and Elbatel, Marawan and Liu, Keyuan and Lin, Lizhuo and Lan, Meng and Yang, Yanqi and Li, Xiaomeng},
title = { { Geometric-Guided Few-Shot Dental Landmark Detection with Human-Centric Foundation Model } },
booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15964},
month = {September},
page = {197 -- 206}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper presents GeoSapiens, a few-shot learning framework for detecting dental landmarks from cone-beam computed tomography (CBCT) images. It utilizes Sapiens, a vision transformer-based foundational model originally designed for human-centric vision tasks, with parameter-efficient fine-tuning through LoRA to significantly reduce computational load. A custom geometric loss function enforces spatial relationships among landmarks to ensure anatomically consistent landmark detection.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed method achieves superior performance, especially at clinically relevant precision thresholds, which are meaningful for dental diagnostics.
- Geometric loss contributes meaningfully to improving model performance, which is demonstrated through ablation studies.
- The incorporation of LoRA reduces computational complexity (from 330M to 24M parameters) without substantial performance degradation.
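To make the parameter reduction concrete, the following is a minimal sketch of how LoRA shrinks the trainable parameter count of a single linear layer. The layer size and rank are hypothetical values chosen for illustration; the 330M and 24M totals quoted above depend on the model configuration, which is not given here.

```python
def full_params(d_in, d_out):
    """Weights in a dense d_out x d_in matrix (bias ignored)."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Weights in the two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer width and LoRA rank, for illustration only.
d_model = 1024
rank = 16

dense = full_params(d_model, d_model)
adapter = lora_trainable_params(d_model, d_model, rank)
reduction = dense / adapter

print(f"dense: {dense}, LoRA adapter: {adapter}, reduction: {reduction:.0f}x")
```

The frozen dense weights are still used in the forward pass; only the adapter factors receive gradients, which is where the saving in trainable parameters comes from.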
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Although geometric loss and dataset are new, the use of foundation models, transformers, and LoRA fine-tuning are relatively standard techniques in current deep-learning research. The overall methodological novelty remains moderate.
- The study lacks validation of model robustness under varying image quality (metal, motion artifacts), imaging conditions, or potential annotation variability, leaving unanswered questions regarding real-world practice.
- Definitions are missing for several symbols in the equations (p_hat in Equation 1, q in Equation 2, and j and k in Equation 3).
- I think that the λ is too small. Therefore, geometric loss may not affect network training. Additional experiments are needed for λ.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Although the study proposes a novel loss function to improve the performance of landmark detection, the overall methodological novelty remains moderate.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Thank you for your detailed and thoughtful rebuttal. I appreciate the authors’ efforts in addressing the concerns raised during the review process.
Review #2
- Please describe the contribution of the paper
The authors propose a method/framework for landmark detection from anterior CBCT images. Their proposed method is based on the pretrained foundation model Sapiens and uses LoRA to reduce the number of trainable parameters. The authors introduce a novel geometric loss function by taking into account the line connecting the apical point and the crown point, as well as other lines perpendicular to it. The authors adopt a differentiable method, soft-argmax, for landmark regression instead of argmax, which is not differentiable. The authors leverage dot products for the parallel and orthogonal straight lines formed by the landmarks in the geometric loss.
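As an illustration of the soft-argmax idea mentioned in this summary, here is a minimal sketch in plain Python. It assumes the common formulation in which coordinates are the expectation under softmax(heatmap / temperature); the function name, the toy heatmap, and the temperature convention are assumptions for illustration, not taken from the paper.

```python
import math


def soft_argmax_2d(heatmap, temperature=0.1):
    """Expected (x, y) position under softmax(heatmap / temperature)."""
    flat = [v / temperature for row in heatmap for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in flat]
    total = sum(weights)
    width = len(heatmap[0])
    x = sum(w * (i % width) for i, w in enumerate(weights)) / total
    y = sum(w * (i // width) for i, w in enumerate(weights)) / total
    return x, y


# Toy 5x5 heatmap with a single peak at column 3, row 1.
heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[1][3] = 1.0

sharp = soft_argmax_2d(heatmap, temperature=0.1)
smooth = soft_argmax_2d(heatmap, temperature=10.0)
print("T=0.1:", sharp, "T=10:", smooth)
```

Because the output is a weighted average rather than an index lookup, it is differentiable with respect to the heatmap values. With a large temperature the weights flatten and the estimate drifts toward the center of the grid.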
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The key contribution of the paper is a framework for tooth landmark detection with a smaller number of trainable parameters and a custom geometric loss aimed at better capturing tooth geometry. The authors:
- introduce a geometric loss for the task of tooth landmark detection from CBCT images.
- design a framework combining the pretrained Sapiens model (trained on human poses), the LoRA method, and the new geometric loss to achieve better performance than the state-of-the-art anatomical landmark detection methods GUNet and FM-OSD.
- perform an ablation study showing that the geometric loss improves the Successful Detection Rate for the Sapiens model both with and without LoRA.
- introduce a new dataset. It is not publicly available yet, though the authors mention releasing it in the future.
- show through the ablation that the proposed method, which uses LoRA and the Sapiens backbone with the geometric loss, achieves performance similar to the original Sapiens method (330M params) with just 24M params.
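For readers unfamiliar with the Successful Detection Rate mentioned above, a minimal sketch follows. It assumes the usual definition: the fraction of landmarks whose radial error, converted to millimetres, falls within a threshold. The pixel spacing and coordinates are hypothetical.

```python
import math


def success_detection_rate(pred, truth, spacing_mm, threshold_mm=0.5):
    """Fraction of landmarks with radial error <= threshold_mm."""
    hits = 0
    for (px, py), (tx, ty) in zip(pred, truth):
        error_mm = math.hypot(px - tx, py - ty) * spacing_mm
        if error_mm <= threshold_mm:
            hits += 1
    return hits / len(truth)


# Hypothetical pixel coordinates and spacing (0.25 mm per pixel).
truth = [(10, 10), (20, 20), (30, 30)]
pred = [(10, 10), (20, 21), (30, 35)]  # errors: 0 mm, 0.25 mm, 1.25 mm

sdr = success_detection_rate(pred, truth, spacing_mm=0.25)
print(f"SDR at 0.5 mm: {sdr:.3f}")
```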
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The title does not match the content of the paper. I did not understand why the authors mention few-shot landmark detection in their title when they are not using few-shot learning (or if they did, it was not clearly communicated in the paper). Similarly, the sentence “Nevertheless, all current approaches have shown limited success in few-shot dental landmark detection.” did not make much sense to me.
- How are the lines connecting the landmarks orthogonal to each other? Is that how those landmarks are chosen? If so, are they true landmarks (carrying actual physical/medical significance)?
- The authors mention that they established the first anterior teeth CBCT dataset, but the dataset is not yet public, though the authors claim that it will be released. So, I am not sure whether that can be counted as a contribution. Similarly, the second contribution is more about identifying the shortcomings of existing models and hence utilizing a strong, already existing foundation model (Sapiens) for the task of landmark detection. Only the third contribution seems to be a valid contribution.
- The authors mention that the doctors selected slices with a correct mid-sagittal plane section of the teeth. This seems to be an additional manual step.
- It was not clear why the lines at 1/2 and 1/3 of the AP-CP line were chosen. There are a total of 16 points per image, as mentioned. Are these all considered keypoints? If not, these annotations might be considered additional annotations required for this method to work.
- How does the normalization factor ensure that the first term in Equation (3) contributes equally to the total loss?
- From the abstract it was not clear that the proposed method achieves performance similar to the Sapiens method with far fewer parameters, or that the LoRA method was used.
Minor comments/questions: Fig. 1: What is AP-CP mentioned in the caption? Are all the red dots in Figure 1 considered keypoints?
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I think the introduction of the novel geometric loss and the framework consisting of Sapiens and LoRA is interesting. However, the abstract/title/writing did not clearly convey the strengths of the paper.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I am satisfied with the rebuttal provided by the authors.
Review #3
- Please describe the contribution of the paper
The paper introduces GeoSapiens, a few-shot learning framework for dental landmark detection on CBCT images. Its main contributions are:
Adaptation of Sapiens: Uses a human-centric foundation model with LoRA fine-tuning for efficiency.
Geometric Loss: Introduces a loss function enforcing perpendicularity and parallelism among anatomical lines to capture dental geometry.
LDTeeth Dataset: Provides a new dataset with 16 annotated landmarks per image for anterior teeth.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper introduces a geometric loss that explicitly encodes perpendicularity between the tooth axis and diagnostic lines, and parallelism among those lines. This is novel because most landmark detection models rely solely on point-wise distance losses (e.g., MSE), while this loss incorporates global anatomical structure, improving spatial coherence and interpretability. The authors adapt the Sapiens model, originally trained for human-centric vision, to CBCT images using Low-Rank Adaptation (LoRA). This is a transfer of pretrained knowledge to a new domain with minimal compute cost (reducing trainable parameters from 330M to 24M). They introduce LDTeeth, a dataset with 347 CBCT images and 16 detailed anatomical landmarks per image, annotated with expert guidance and clinical logic.
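The perpendicularity and parallelism constraints described here can be sketched with dot products of unit direction vectors. This is an illustrative formulation only: the exact terms, normalization, and weighting in the paper's Equation 3 are not reproduced on this page, so the penalty forms and all names below are assumptions.

```python
import math


def unit(p, q):
    """Unit direction vector from point p to point q (p and q must differ)."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    norm = math.hypot(dx, dy)
    return dx / norm, dy / norm


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def geometric_penalty(apex, crown, lines):
    """lines: list of (start, end) point pairs for the diagnostic lines."""
    axis = unit(apex, crown)
    dirs = [unit(s, e) for s, e in lines]
    # Perpendicularity: each line's dot product with the axis should be zero.
    perp = sum(abs(dot(axis, d)) for d in dirs) / len(dirs)
    # Parallelism: |dot| between any two line directions should be one.
    pairs = [(j, k) for j in range(len(dirs)) for k in range(j + 1, len(dirs))]
    para = sum(1.0 - abs(dot(dirs[j], dirs[k])) for j, k in pairs) / len(pairs) if pairs else 0.0
    return perp + para


# Hypothetical landmarks: a vertical tooth axis and three horizontal lines.
apex, crown = (0.0, 0.0), (0.0, 10.0)
ideal_lines = [((-2.0, 0.0), (2.0, 0.0)), ((-2.0, 3.0), (2.0, 3.0)), ((-2.0, 5.0), (2.0, 5.0))]
tilted_lines = ideal_lines[:2] + [((-2.0, 5.0), (2.0, 6.0))]

ideal = geometric_penalty(apex, crown, ideal_lines)
tilted = geometric_penalty(apex, crown, tilted_lines)
print(f"ideal: {ideal:.4f}, tilted: {tilted:.4f}")
```

The penalty is zero when every diagnostic line is perpendicular to the axis and parallel to the others, and grows as any predicted line tilts, which is the global structure a point-wise MSE cannot see.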
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Limited Clinical Validation: Including qualitative evaluations by practicing clinicians or retrospective analysis on treatment decisions would strengthen this aspect.
Focus Restricted to Anterior Teeth: Broader anatomical coverage or multi-region generalization experiments would increase the clinical impact.
Hyperparameter Sensitivity Not Explored: A brief study on how these values (the geometric loss weight λ and the temperature parameter T in soft-argmax) influence model stability and accuracy would help assess robustness.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper makes some contributions to medical image computing by:
Proposing a novel geometric loss that improves dental landmark accuracy.
Efficiently adapting the Sapiens foundation model using LoRA.
Introducing the LDTeeth dataset for anterior teeth CBCT.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I think they address some questions.
Author Feedback
Dear Area Chairs and Reviewers,
We would like to thank all reviewers for their constructive feedback. We acknowledge the need to clarify several points raised by the reviewers, which we address below.
1) Novelty and contribution (R1, R3): We are encouraged by acknowledgements of novelty of our geometric loss that “most landmark detection models rely solely on point-wise distance losses, while this loss incorporates global anatomical structure, improving spatial coherence and interpretability (R2)” and it “contributes meaningfully to improving model performance (R1)”. We confirm that our LDTeeth dataset, the first anterior teeth CBCT dataset for dental landmark detection, will be made publicly available upon acceptance. Besides, while Sapiens is an existing foundation model, its adaptation to the challenging domain of dental CBCT imaging constitutes a significant contribution by establishing a new, strong, and practical baseline.
2) Hyperparameters λ and T (R1, R2, AC): The MSE loss in our heatmap-based regression is computed over all pixels in the output heatmap, where the vast majority correspond to easily identifiable background regions (i.e., ground truth value 0). With only 16 landmarks present, the averaged MSE becomes naturally small (~5×10⁻⁴). In contrast, our geometric loss imposes a global structural constraint across landmarks, yielding values >0.5 (Fig. 3b). To balance their magnitudes and ensure stable gradient flow during optimization, we assign a small weight to λ. For T, larger values overly smooth predictions, collapsing coordinates toward the center. Empirically, T = 0.1 was effective with minimal sensitivity nearby. We appreciate your suggestion and will include this discussion in the final camera-ready version.
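The magnitude argument can be checked with a small sketch: when the squared error is averaged over every pixel of a mostly-background heatmap, even a prediction that misses the landmark entirely yields a small MSE. The heatmap size and Gaussian width below are hypothetical, and the final ratio only illustrates the order of magnitude implied by the two figures quoted above; the λ actually used is not stated here.

```python
import math


def gaussian_heatmap(size, center, sigma):
    """Square target heatmap with one Gaussian blob of peak value 1."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


def mse(pred, target):
    """Mean squared error averaged over every pixel."""
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / (len(target) * len(target[0]))


size = 64  # hypothetical heatmap resolution
target = gaussian_heatmap(size, center=(32, 32), sigma=2.0)
blank = [[0.0] * size for _ in range(size)]  # prediction that misses the landmark

worst_case_mse = mse(blank, target)

# Magnitudes quoted in the rebuttal: MSE ~5e-4, geometric loss > 0.5.
balancing_weight = 5e-4 / 0.5

print(f"MSE of a blank prediction: {worst_case_mse:.5f}")
print(f"weight that equalizes the quoted magnitudes: {balancing_weight:.0e}")
```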
3) Selection of CT slices (R3, AC): Using CBCT viewing software, clinicians efficiently select the most representative 2D sagittal slices with high precision, consistent with prior orthodontic studies. Subsequently, they undertake a time-consuming process to manually annotate 16 detailed anatomical landmarks on these 2D slices for accurate morphometric measurements. While automated 3D slice selection is a desirable preliminary step, this study intentionally focuses on the more labor-intensive task of 2D landmark annotation. In our LDTeeth dataset to be released, we provide the full 3D volume, selected sagittal slices, and the annotations for 16 anatomical landmarks per slice.
4) Medical significance of landmark annotations (R3): The 16 landmarks we used (shown in red) were not chosen arbitrarily or for geometric convenience. They follow well-established clinical protocols in orthodontics, periodontology, and implantology. Crucially, key clinical measurements that directly inform diagnosis and treatment planning are derived from these landmarks. The three horizontal levels (apex, apical third, and mid-root) are standard for assessing alveolar bone thickness. For instance, if bone thickness is insufficient at the apical third, bodily tooth movement (translation) can be risky. Therefore, our landmark selection is both anatomically grounded and clinically meaningful.
5) Clinical Validation (R1, R2): We appreciate your valuable suggestion and agree that retrospective analysis and the inclusion of more varied image qualities would further strengthen the paper. This study mainly focused on the anterior region, which is most commonly evaluated in clinical practice. We acknowledge those limitations and plan to expand both the dataset and the approach to include posterior teeth and broader clinical scenarios in future work.
6) Organization and Few-shot setting (R1, R3): We adopted a few-shot setup using only 3 patients for training, as shown in Table 1. We acknowledge the confusion and will revise the text accordingly in the camera-ready version upon acceptance.
We hope our clarifications will help readjust the ratings to accept GeoSapiens.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
This work is on anterior teeth landmark detection from CBCT images, introducing as contribution a geometric loss and a novel dataset.
Reviewers have stated a number of weaknesses. Authors are invited to organize weaknesses by relevance and formulate a rebuttal.
Some especially important aspects in my opinion are: (1) confusion (and lack of explanatory detail) about whether this task is a 2D or a 3D application. Since CBCT data is 3D, performing this landmark detection in 2D (as indicated in Table 1, input images are 2D, and the foundation model that is fine-tuned is also trained on 2D images) would first require a slice selection stage for each tooth, which is prone to mistakes. This is not discussed at all. Alternatively, the approach could be applied to all slices of a 3D dataset. However, in this case the dataset does not contain any images where no landmarks should be detected (e.g., neighboring slices), so we cannot assess whether there would be false positives on neighboring slices, or even landmarks of a single tooth detected in different slices of the CBCT due to oblique imaging. For practical use in clinical applications, this is important. Neither the algorithm nor the dataset reflects this problem. (2) the lack of hyperparameter studies, especially for the loss weighting term, which seems to be very small according to the MSE loss.
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
Overall, I would like to accept this paper. The proposed geometric loss is well designed for the target task (dental landmark detection). Leveraging the foundation model and LoRA is useful. However, since only one dataset is involved in the evaluation, the scope of the proposed framework might be limited.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A