Abstract

Orthodontic treatment usually requires regular face-to-face examinations to monitor the dental conditions of patients. When in-person diagnosis is not feasible, an alternative is to use five intra-oral photographs for remote dental monitoring. However, this approach lacks 3D information, and reconstructing 3D dental models from such sparse-view photographs is a challenging problem. In this study, we propose a 3D teeth reconstruction framework, named TeethDreamer, aiming to restore the shape and position of the upper and lower teeth. Given five intra-oral photographs, our approach first leverages a large diffusion model’s prior knowledge to generate novel multi-view images with known poses to address sparse inputs and then reconstructs high-quality 3D teeth models by neural surface reconstruction. To ensure 3D consistency across generated views, we integrate a 3D-aware feature attention mechanism in the reverse diffusion process. Moreover, a geometry-aware normal loss is incorporated into the teeth reconstruction process to enhance geometric accuracy. Extensive experiments demonstrate the superiority of our method over current state-of-the-art methods, showing its potential for monitoring orthodontic treatment remotely. Our code is available at https://github.com/ShanghaiTech-IMPACT/TeethDreamer.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1038_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ShanghaiTech-IMPACT/TeethDreamer

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xu_TeethDreamer_MICCAI2024,
        author = { Xu, Chenfan and Liu, Zhentao and Liu, Yuan and Dou, Yulong and Wu, Jiamin and Wang, Jiepeng and Wang, Minjiao and Shen, Dinggang and Cui, Zhiming},
        title = { { TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    A method is presented to generate 3D information of the teeth from five photographs. The suggested method is a combination of diffusion-based generative models to generate multiple views, 3D CNNs to extract color and normal features, and 3D reconstruction based thereon.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    -The paper presents a novel combination of processing steps for teeth reconstruction, not previously applied to this specific area.
    -The applicability to orthodontic treatment is well motivated.
    -The visual results (Fig. 3) are convincing in direct comparison with the other selected methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    -The method is compared only to methods that are not specific to teeth reconstruction.
    -Regarding the one relevant teeth reconstruction method cited [1]: it is not shown that the limitations mentioned in the Introduction are overcome by the presented method (e.g., in Fig. 3, individual details may be incorrect in “Ours” relative to “GT”, meaning the generative method does not correctly reconstruct but rather “guesses” the shape details where information from more angles is missing).
    -Data acquisition methods are not elaborated.
    -Quantitative results are difficult to interpret; it is not clearly described how they were evaluated.
    -There is no discussion section or analysis of the method's own limitations.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Since the dataset is neither available nor well described, and the processing steps are complex, reproducing the work in another group would take considerable effort; it is unclear whether it would be possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The work is well motivated and the results make a good impression. The clarity of the method description, and especially of the metrics evaluation, shows room for improvement. It is not clear how the results compare to state-of-the-art teeth reconstruction. There is no discussion section or analysis of the method's own limitations.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clarity of the method description, and especially of the metrics evaluation, shows room for improvement. It is not clear how the results compare to state-of-the-art teeth reconstruction. There is no discussion section or analysis of the method's own limitations. The method is demonstrated to work well visually and applies modern state-of-the-art methods to this specific application.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors answered most of the open questions and explained why they compare only to techniques that are not specific to teeth. However, this weak point remains, because comparing with non-specific methods is not the best way to assess performance in the planned application context. The weakness of potentially generating wrong features because of incomplete coverage was not addressed. In summary, no change to the rating.



Review #2

  • Please describe the contribution of the paper

    The article introduces a straightforward approach (TeethDreamer) that combines advanced techniques such as diffusion modeling, 3D-aware feature attention, and geometry-aware constraints to enable accurate and efficient 3D teeth reconstruction from minimal input data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    TeethDreamer utilizes a diffusion model to generate multi-view images, overcoming input data sparsity for accurate 3D models. It incorporates a 3D-aware feature attention mechanism for improved view consistency, resulting in high-quality color images and precise reconstructions. The introduction of a geometry-aware normal loss further enhances geometric accuracy, prioritizing clinically relevant outcomes. Authors evaluated image quality and reconstruction using Hausdorff Distance (HD), Chamfer Distance (CD), and IoU metrics against three baseline models.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors do not extensively address the variability in intra-oral photographs, such as differences in lighting conditions, occlusions, or patient-specific factors.
    2. Details regarding the dataset are lacking. How was it acquired? Was it subject to any quality criteria?
    3. Will the authors make their code available for reproducibility?
    4. There is no expert validation of the reconstructed views.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please define the abbreviation “GT” in Figures 3 and 4.
    2. Could you provide references for the Hausdorff Distance (HD), Chamfer Distance (CD), and Intersection over Union (IoU) metrics?
    3. What is the purpose of the square box in Zero123 in Figure 4?
    4. The scalability and computational efficiency of the proposed framework are not thoroughly addressed. Could you provide some information on this aspect?
    5. There is a lack of information regarding the dataset. How was it acquired, and was it subject to any quality criteria?
    6. Could the authors clarify why the proposed approach is not compared with recent related work, i.e., reference [1]?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Refer Point 6 and 10

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed most of my comments. This paper could be accepted considering the techniques they adopted to reconstruct the teeth model. However, the reconstruction is time-consuming, and I am not sure how beneficial it will be in real clinical settings.



Review #3

  • Please describe the contribution of the paper

    This work aims to improve the quality of 3D models of upper and lower teeth generated from a limited number of intra-oral photographs. The proposed approach first utilizes a diffusion model to generate new multi-view images. Subsequently, in the process of 3D tooth generation, a geometry-aware normal loss is integrated.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The combination of a diffusion model for generating new views and a geometry-aware normal loss for 3D tooth model creation represents a novel approach. Strong evaluation was performed, and advantages of the proposed method were clearly demonstrated in comparison with other 3D tooth reconstruction methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The text attention branch in Figure 2 is confusing. It is unclear what the term “text” specifically refers to in the proposed method. (minor)

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would be helpful to provide some clarification or explanation of what extra information the normal maps bring to 3D tooth reconstruction.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The presented method is technically novel and the application is clinically important.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers (R1, R3, R4) for their constructive comments. We are glad they appreciate the technical novelty (R1, R3), the clinically important problem (R1, R3), and the convincing results (R1, R3, R4). We answer the main concerns below.

Q1(R1,R4) Why not compare with task-specific methods, such as the cited work [1]? A: Currently, most teeth reconstruction methods rely on CBCT images. Although the work [1] aims to reconstruct 3D teeth models from multi-view images, it requires abundant paired data, i.e., real intra-oral photos and corresponding teeth models, to construct templates for its parametric model, and such data is hard to acquire. It is therefore difficult to apply widely in practice. In contrast, our method offers greater flexibility, as it does not require paired cases for training. We compare it with SOTA reconstruction techniques in 3D vision, such as SyncDreamer, Zero123, and NeuS. Our method achieves the best results, demonstrating its superiority in this task.

Q2(R1,R4) About the data acquisition pipeline. A: We collected intra-oral scanning models from 3200 patients at two hospitals, with 200 patients having paired real 2D intra-oral photos. Dentists manually checked the data to ensure integrity and suitability for clinical diagnosis. For the training input images, we rendered four intra-oral photos of the lower and upper teeth separately, simulating real intra-oral photos. Camera viewpoints were randomly set within specific ranges. To bridge the gap between real and rendered images, we added spot lights and modified the teeth mesh material to achieve a realistic highlight effect.

Q3(R1,R4) Metrics used in this paper. A: Our method involves a two-step pipeline: generating novel multi-view 2D images and reconstructing 3D tooth models from the generated images. Hence, we evaluate our method using both image-level and mesh-level metrics. For image-level evaluation, we compute the commonly used PSNR, SSIM, and LPIPS between the generated images and target images. For mesh-level evaluation, we employ CD, which measures the bidirectional average distance; HD, which captures the maximum distance; and IoU, which quantifies the ratio of overlapping volume between the prediction and ground truth. We will provide details and references in the final version to ensure clarity.
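
The mesh-level metrics named in this answer can be illustrated with a minimal sketch. It is not the authors' evaluation code: it assumes both surfaces have already been sampled into small point lists and voxelized into sets of occupied indices, it uses one common variant of CD (the sum of the two directional mean nearest-neighbor distances), and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def nearest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Bidirectional average nearest-neighbor distance (assumed CD variant)."""
    d_ab = nearest_distances(a, b)
    d_ba = nearest_distances(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest nearest-neighbor distance in either direction."""
    return max(max(nearest_distances(a, b)), max(nearest_distances(b, a)))


def voxel_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection over union of two sets of occupied voxel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data (made up for illustration, not from the paper).
pred_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
pred_voxels = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}
gt_voxels = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}

print(chamfer_distance(pred_points, gt_points))    # 1.5
print(hausdorff_distance(pred_points, gt_points))  # 2.0
print(voxel_iou(pred_voxels, gt_voxels))           # 0.5
```

The brute-force nearest-neighbor search is quadratic in the number of points; for densely sampled meshes a spatial index such as a k-d tree would normally replace it.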

Q4(R4) The scalability and computational efficiency. A: We employ various data augmentation techniques during the rendering of input images, and our work has been tested on intra-oral photos collected from real-world clinics. This underscores the scalability of our framework in accommodating the variability in intra-oral photographs. Regarding computational efficiency, our model is trained on a single A100 GPU for 4 days, and the reconstruction takes about 10 to 20 minutes.

Q5(R1,R4) Limitations of our method. A: One limitation is efficiency. The efficiency of neural surface reconstruction is a widespread concern in 3D reconstruction, and many researchers are working on speeding it up, e.g., Instant-NGP, which we will consider in the future. Another direction for future work is to extract high-frequency low-level features for more precise details, improving overall quality. We will include these potential improvements in the final version.

Q6(R3,R4) Some details about results and the framework. A: The square box in Fig. 4 highlights the inconsistency between different views in Zero123, as discussed in the experiments. In Fig. 2, the “text” attention branch encodes the condition using CLIP’s image encoder in the Stable Diffusion model. This branch was originally used for text-to-image tasks, and we retain the terminology for consistency; in the final version we will clarify that “text” refers to image conditions, not text prompts.

Q7(R4) The benefit of normal maps. A: Normal maps encode the normal direction of local surfaces, which provides more precise geometric detail for 3D teeth reconstruction.
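
To make concrete how a normal map can supervise geometry, the sketch below computes a generic cosine-based normal loss: the mean of one minus the cosine similarity between predicted and target per-pixel normals. This is an illustrative assumption and not the paper's geometry-aware normal loss, whose exact form is not given on this page; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Scale a vector to unit length; a zero vector has no direction."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length normal")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def cosine_normal_loss(pred: Sequence[Vec3], target: Sequence[Vec3]) -> float:
    """Mean of (1 - cos angle) over corresponding normals.

    0 means every predicted normal is aligned with its target,
    1 means perpendicular on average, 2 means opposite.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equally long")
    total = 0.0
    for p, t in zip(pred, target):
        p_hat, t_hat = normalize(p), normalize(t)
        total += 1.0 - sum(a * b for a, b in zip(p_hat, t_hat))
    return total / len(pred)


# Toy normals (made up): one aligned pair and one perpendicular pair.
pred_normals = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
target_normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
print(cosine_normal_loss(pred_normals, target_normals))  # 0.5
```

Because the loss depends only on direction, it penalizes errors in local surface orientation that a pure color loss may miss, which is the kind of extra geometric signal the answer above attributes to normal maps.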

We will release our code and the models pre-trained on our dataset.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I am inclined to reject this paper for the following reasons:

    -The motivation is not well-grounded. Remote orthodontic visits require more detailed 3D reconstructions than those generated in this study. Even if we overlook this issue, the evaluation of clinical usage should have been assessed by an orthodontist. The feasibility cannot be properly assessed in the current manner.

    -A baseline comparison with reference 1, which is the closest work in the literature, is missing.

    -There are data-related issues that need to be addressed. How are patient variances handled, such as missing teeth? Cross-validation should have been used. How was the test data (100 samples) selected?




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes a 3D reconstruction method that uses a diffusion model and a small number of intra-oral photographs. The rebuttal seems to have addressed most of the reviewers’ questions. Although some weak points remain, e.g., the missing performance comparison with similar SOTA methods, the proposed method seems novel and has merits for dental applications.



