Abstract

Generating high-precision 3D dental data is crucial for clinical practice, virtual simulation, and education. However, it is challenging to synthesize smooth and detailed tooth models. In this work, we introduce DuoDent, a dual-stream diffusion-based framework for the synthesis of accurate 3D tooth point clouds followed by a refined mesh generation. Our framework combines Transformer-based diffusion and CNN-based diffusion to capture both global dental structures and fine local features, thereby enhancing surface detail while reducing artifacts such as staircase and rough textures. The generated point clouds are optimized using normal consistency constraints for proper alignment of surface normals, which is key to high-quality mesh reconstruction. In addition, we apply a normal estimation with orientation consistency to the generated point clouds prior to converting them to output meshes, which enables the generation of smoother and anatomically precise tooth models. Extensive experiments validate that our method not only outperforms existing approaches in quantitative metrics but also delivers superior qualitative results, demonstrating its potential to significantly improve tooth modeling in dentistry. Our code is available at https://github.com/kdy-ku/DuoDent.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1137_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/kdy-ku/DuoDent

Link to the Dataset(s)

N/A

BibTex

@InProceedings{KwoDoe_DuoDent_MICCAI2025,
        author = { Kwon, Doeyoung and Kim, Seongjun and Song, In-Seok and Baek, Seung Jun},
        title = { { DuoDent: Tooth Generation using Dual-Stream Diffusion with Normal Consistency } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {183--193}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an approach to generate geometric tooth shapes in a mesh representation. To that end, the authors propose a dual-stream diffusion approach: a Transformer-based diffusion stream generates the global shape of the teeth, and a CNN-based diffusion stream generates local shape variations, both in the form of a point cloud. The training of these networks includes a novel Normal Consistency Constraint (NCC) loss. A final post-processing step is proposed to obtain a mesh from the predicted point cloud. The method is evaluated on private in-house data and outperforms the state of the art on the task of generating novel tooth geometries.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The dual-stream approach (transformers for global shape and CNN for local shape) is interesting
    • The NCC loss is sound and, as shown in the ablation, improves the quality of the result
    • The paper is in general well written and easy to follow
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The major weakness of the paper is the lack of a precise application. The authors write at the end of the experiments: “…3D dental models suitable for clinical and educational applications”. However, neither clinical nor educational applications are explicitly mentioned or evaluated, so it is unclear to the reader in which cases the proposed approach could be used. For example, teeth generation is required in implant design. In this context, the shape of the neighboring teeth needs to be considered to ensure good compatibility. Similarly, to obtain a good occlusion, the shape of the opposing teeth is key. While there are clear use cases, none is mentioned or evaluated. It is thus unclear how generating teeth conditioned solely on the tooth number could improve practice or education.

    • The second major drawback is the lack of evaluation on publicly available data (https://toothfairychallenges.github.io/) and of any reference to such data. The paper is therefore not reproducible. The authors mention neither “data sharing” nor “code sharing”.

    • While the evaluated predicted teeth improve state of the art on several metrics, the resulting teeth seem overly smooth: cusps are not visible. Only 3 examples are presented in Fig. 2 and Sup Mat space (images or videos) was not used.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • It seems the prediction is done in a normalized space [-1,1]^3. How is the metric accuracy then tested (with Chamfer?)

    • what are the units of the reported Chamfer distance? Tab 1 and 2 ?

    • In the evaluation metrics: although the same metrics as 16/17/18 are used, it would be useful to recall the tasks they are being evaluated on. “pointcloud generation studies [16-18]”

    • It would be good to mention the training and inference times, as diffusion models are usually slow.

    • For the evaluation it would be good to cross-validate (10-fold in this case). Is there a reason (computation time) not to do it?

    • What about including small non-normative details such as broken teeth?

    • For the evaluation of the tables, it would be good to follow statistical analysis and intervals as suggested by [A]

    Small improvement:

    • Fig 1: adding the notation of the method (Z_t, X_t, …) in the figure would help the reader to better follow the approach.

    • I would suggest moving the discussion on the benefits of each part (middle of Qualitative results) to the part after the Ablation. It is the ablation results that back up these claims.

    [A] Christodoulou, Evangelia, Annika Reinke, Rola Houhou, Piotr Kalinowski, Selen Erkan, Carole H. Sudre, Ninon Burgos et al. “Confidence intervals uncovered: Are we ready for real-world medical imaging AI?.” In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 124-132. Cham: Springer Nature Switzerland, 2024.
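The suggestion above to report intervals alongside the table values can be sketched as a percentile bootstrap over per-sample metric values. This is a minimal illustration only: the function name `bootstrap_ci`, the number of resamples, and the per-sample scores are assumptions made for the example and are not taken from the paper or from [A].

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the test cases with replacement and record the mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return statistics.fmean(values), means[lo_idx], means[hi_idx]


# Placeholder per-sample scores (hypothetical, NOT results from the paper).
per_sample_scores = [0.61, 0.66, 0.58, 0.70, 0.63, 0.65, 0.59, 0.68, 0.62, 0.64]
mean, lower, upper = bootstrap_ci(per_sample_scores)
print(f"mean={mean:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

Resampling at the level of test cases keeps the interval tied to the size of the test set, which matters when only a single 8:1:1 style split is available.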

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I’m very hesitant with this paper. While the approach is sound (transformers for global and cnn for local) and the normal loss is interesting and beneficial, the lack of a clear task in which the method shows its benefits clearly limits the contribution. Also the lack of reproducibility makes me lean towards rejection, as publicly available datasets exist. They should be used at least for the comparison of methods and enable future research to compare to this method.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    After reading the Authors’ rebuttal I lean towards acceptance. The rebuttal addressed my main concerns:

    • The evaluation on the public dataset was carried out in a short time, suggesting that the results could be even better if the model were trained or fine-tuned on the challenge dataset.
    • The publication of the code will also help the community make fair comparisons.
    • The “healthy/normal population” is mentioned.
    • A statistical significance study following [A] is going to be added.

    I encourage authors to implement everything they promised in the rebuttal.



Review #2

  • Please describe the contribution of the paper

    This paper introduced DuoDent, a dual-stream diffusion-based framework for the synthesis of accurate 3D tooth point clouds followed by a refined mesh generation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A method for 3D point cloud representation of teeth followed by a refined mesh generation.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The manuscript lacks a comprehensive review of related work (for example diffusion-based 3D point cloud representation method, mesh reconstruction), which makes it difficult to clearly identify the novelty and contribution of the proposed method.
    2. The Transformer-based diffusion branch is not clearly described, and no references are provided to clarify its implementation details.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The manuscript lacks a comprehensive review of related work (for example diffusion-based 3D point cloud representation method, mesh reconstruction), which makes it difficult to clearly identify the novelty and contribution of the proposed method.
    2. The Transformer-based diffusion branch is not clearly described, and no references are provided to clarify its implementation details.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Most of my comments were satisfactorily addressed. While there is still room for future work (practical application scenarios, interference between adjacent teeth, and algorithmic clarity), the paper meets the standards for acceptance.



Review #3

  • Please describe the contribution of the paper

    The paper introduces DuoDent, a 3D mesh generation method for teeth. This approach combines two types of diffusion models to learn both local and global features, which are essential for producing clean and smooth teeth meshes. The method operates in two stages: it first generates a point cloud (PC) that is then refined in the second stage to produce a mesh shape. The authors emphasize the necessity of using normal consistency constraints during training to achieve smooth and clean results. They also highlight the importance of orientation consistency for accurate normal estimation when converting the generated point cloud into a mesh. The authors validate DuoDent on a dataset containing ~2K tooth samples, comparing its performance to four previous methods and demonstrating superior performance in this specific task.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1 - The combination of transformer-based and CNN-based diffusion models to build a feature vector that encodes both global and local features is a good approach, supported by the results of the ablation study in Table 2.

    2 - Both quantitative and qualitative results demonstrate that DuoDent outperforms previous methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1- The methodological contribution is present but somewhat limited. DuoDent integrates multiple components from existing methods, which is not inherently negative. However, the paper suggests that the authors primarily assembled various techniques from the literature with minor modifications, such as adding normal consistency to the loss function and pre-processing the generated point cloud with orientation consistency before using Point2Mesh [8]. This leaves the reader uncertain about which parts of DuoDent were developed by the authors and how effective these parts are.

    2- The choice of the initial watertight mesh appears crucial for achieving good results, raising several questions: How did the authors select the initial watertight mesh? How robust is DuoDent to poor initializations? And how does the method minimize bias towards the chosen template?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Minor comment:

    • The sentence has an extra “the” in: “The NCC loss is applied to the estimated output at all the diffusion timesteps as follows”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is clear and solves an important problem (shape generation for teeth) that would benefit the community. The proposed method is deemed a sufficient contribution to MICCAI.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I will keep my recommendation as before the rebuttal. The authors replied in a somewhat satisfactory manner to my comments. The novelty aspect in my opinion is still limited but the application shown by the authors is an interesting one and could be beneficial to the community.




Author Feedback

We thank the reviewers for constructive feedback. Below are responses to weaknesses(W) and comments(C) from reviewers(R1,R2,R3).

(COMMON QUESTIONS: R1,R2) “Reproducibility” We will publish the full code and model checkpoints.

R1. (W1) “Lack of application” The following are potential downstream applications.
1. Dental simulators: DuoDent can support VR simulators with HMDs and haptic devices for dental treatment/surgery education (e.g., Virteasy or Simodont). It can generate anatomically accurate teeth in virtual patients (meta-human). We plan to extend DuoDent to generate diseased or complex surgical cases using LLMs.
2. Implant planning support: DuoDent currently uses tooth numbers as the condition for generation. We plan to add conditions like adjacent and opposing teeth morphology. This would support more anatomically accurate generations. We thank the reviewer for the suggestion.
3. Medical illustration: Instead of relying on non-experts to draw illustrations of oral structures, DuoDent can generate structurally precise 3D teeth. Its illustrations can be used in textbooks, presentations, and educational videos.

(W2) “Using public dataset” We agree with R1 on the importance of public datasets. We will report the comparison results using public datasets in the final paper. As R1 suggested, we examined MICCAI 2024 challenge dataset, and found our CBCT data is similar to MICCAI data, e.g., the same resolution of 0.3mm^3. Thus, we can evaluate the generated outputs from ours and baseline methods on the MICCAI test dataset. As shown below, DuoDent outperformed all the baselines: (NOTE BELOW IS A METRIC EVALUATION APPROVED BY AC)

Method | CD(↓) | EMD(↓) | N.C(↑) | F-Score(↑)
LION   | 0.885 | 0.829  | 0.816  | 0.594
SLIDE  | 0.697 | 0.621  | 0.894  | 0.882
DiT3D  | 0.711 | 0.629  | 0.914  | 0.881
PVD    | 0.637 | 0.631  | 0.913  | 0.823
Ours   | 0.629 | 0.597  | 0.919  | 0.891
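For readers unfamiliar with the F-Score column, the sketch below shows one common way such a score is computed for point clouds: the harmonic mean of precision and recall at a distance threshold. The threshold `tau`, the helper name `f_score`, and the toy points are assumptions for illustration; the rebuttal does not state the threshold or the exact definition used.

```python
import math


def f_score(pred, ref, tau):
    """Harmonic mean of precision and recall at distance threshold `tau`."""
    # Precision: fraction of predicted points lying within tau of the reference.
    precision = sum(
        1 for p in pred if min(math.dist(p, q) for q in ref) <= tau
    ) / len(pred)
    # Recall: fraction of reference points lying within tau of the prediction.
    recall = sum(
        1 for q in ref if min(math.dist(q, p) for p in pred) <= tau
    ) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy point clouds (hypothetical): one predicted point is a far outlier.
pred_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
ref_points = [(0.0, 0.0, 0.05), (1.0, 0.0, 0.0)]
score = f_score(pred_points, ref_points, tau=0.1)
print(f"F-score at tau=0.1: {score:.3f}")
```

In this toy case the outlier lowers precision to 2/3 while recall stays at 1, which is why a single stray point reduces the score without any reference point being missed.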

(W3) “Visual details” Our NCC and orientation consistency modules are designed to preserve fine anatomical structures like cusps. Thus, we will add 30-40 more cases to Supp. Mat. as suggested.
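To make the role of orientation consistency concrete, the sketch below flips estimated normals so that they point away from the cloud centroid and then scores agreement with reference normals by mean cosine similarity. Both steps are simplified stand-ins chosen for illustration: the centroid heuristic only suits roughly convex shapes, and neither function is the authors' NCC loss or their orientation procedure. All names and values are hypothetical.

```python
import math


def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def orient_outward(points, normals):
    """Flip each normal so that it points away from the point-cloud centroid."""
    centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(3))
    oriented = []
    for p, n in zip(points, normals):
        outward = tuple(pi - ci for pi, ci in zip(p, centroid))
        dot = sum(a * b for a, b in zip(n, outward))
        oriented.append(n if dot >= 0 else tuple(-x for x in n))
    return oriented


def normal_consistency(normals_a, normals_b):
    """Mean cosine similarity between paired normals (1.0 means identical)."""
    cosines = [
        sum(a * b for a, b in zip(unit(na), unit(nb)))
        for na, nb in zip(normals_a, normals_b)
    ]
    return sum(cosines) / len(cosines)


# Four points on a unit circle; two estimated normals have the wrong sign.
points = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
estimated = [(1, 0, 0), (1, 0, 0), (0, -1, 0), (0, -1, 0)]
reference = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

before = normal_consistency(estimated, reference)
after = normal_consistency(orient_outward(points, estimated), reference)
print(f"consistency before: {before:.2f}, after: {after:.2f}")
```

Sign flips like these do not change the local tangent plane, but they do confuse surface reconstruction, which is why normals are oriented before a point cloud is converted to a mesh.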

(C1-3) “Details on evaluation metrics” Metrics are computed in ‘mm’ after denormalization, and tables report the real distances. In the final version, we will clarify the purpose of each metric.
(C4) “Training/Inference time” Training(1700 epochs, 4×A100): ~5 days; Inference(1×A100): ~1 day
(C5) “Cross-validation” Due to high training cost, we used an 8:1:1 randomized split.
(C6) “Atypical cases” Currently trained on normal teeth; extension to atypical cases is planned.
(C7) “Statistical analysis” We will follow [A] by adding analysis and confidence intervals.
(C8-9) We will follow the suggestions, thank you.
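The statement that metrics are computed in mm after denormalization can be illustrated with a small Chamfer distance sketch. The normalization parameters (`center`, `scale`), the specific Chamfer variant (sum of the two mean nearest-neighbour distances, unsquared), and the toy points are assumptions for the example; the rebuttal does not specify them.

```python
import math


def denormalize(points, center, scale):
    """Map points from the normalized cube [-1, 1]^3 back to millimetres."""
    return [tuple(c + scale * p for p, c in zip(pt, center)) for pt in points]


def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance, both ways."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)


# Toy clouds in normalized coordinates (hypothetical values).
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]

# Hypothetical per-tooth normalization: centre in mm and half-extent in mm.
center_mm = (10.0, 20.0, 30.0)
scale_mm = 5.0

cd_normalized = chamfer_distance(generated, reference)
cd_mm = chamfer_distance(
    denormalize(generated, center_mm, scale_mm),
    denormalize(reference, center_mm, scale_mm),
)
print(f"normalized: {cd_normalized:.3f}, in mm: {cd_mm:.3f}")
```

With a uniform scale, the unsquared distance in mm is simply the normalized distance multiplied by the scale, so the choice of space changes the units of the reported number but not the ranking of methods.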

(COMMON QUESTIONS: R2.W1,R3.W1) “Comprehensive review of related work and novelty” 1. Related work: We will expand related work to highlight our novelty. For point cloud diffusion, we will add discussions on methods such as PointDif, PCCDiff; for mesh reconstruction, we will add discussions on methods such as MVSDF, SAL. 2. Novelty: DuoDent is an end-to-end dual-stream diffusion that jointly trains and integrates DiT-3D[17] for transformer-based with PVCNN[13] for CNN-based diffusion. For mesh reconstruction, we extend Point2Mesh[8] with novel timestep-wise NCC loss and orientation consistency. DuoDent improves upon methods like LION[25], SLIDE[16], PVD[26].

R2.W2 “Describe Transformer-based branch” Our transformer-diffusion branch uses the default implementation of DiT-3D [17], as stated in Sec.2.2 and Fig.1. DiT-3D consists of Transformer blocks with 3D patch and positional embeddings, using self-attention. We will add more architectural details to the paper.
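As a rough picture of what "3D patch and positional embeddings" operate on, the sketch below voxelizes a normalized point cloud into an occupancy grid and cuts it into non-overlapping cubic patches, each paired with its grid position. It is illustrative only: the grid size, patch size, and function names are assumptions, and the actual DiT-3D patch embedding is a learned projection whose configuration is not given here.

```python
def voxelize(points, grid=4):
    """Binary occupancy grid over the normalized cube [-1, 1]^3."""
    vox = [[[0] * grid for _ in range(grid)] for _ in range(grid)]
    for point in points:
        # Map each coordinate from [-1, 1] to a voxel index in [0, grid - 1].
        i, j, k = (
            min(grid - 1, max(0, int((c + 1.0) / 2.0 * grid))) for c in point
        )
        vox[i][j][k] = 1
    return vox


def patchify(vox, patch=2):
    """Split the grid into cubic patches; return flat tokens and positions."""
    grid = len(vox)
    tokens, positions = [], []
    for i in range(0, grid, patch):
        for j in range(0, grid, patch):
            for k in range(0, grid, patch):
                tokens.append([
                    vox[i + a][j + b][k + c]
                    for a in range(patch)
                    for b in range(patch)
                    for c in range(patch)
                ])
                # The patch index is what a positional embedding would encode.
                positions.append((i // patch, j // patch, k // patch))
    return tokens, positions


# Two hypothetical points near opposite corners of the normalized cube.
cloud = [(-0.9, -0.9, -0.9), (0.9, 0.9, 0.9)]
tokens, positions = patchify(voxelize(cloud, grid=4), patch=2)
print(len(tokens), "tokens of length", len(tokens[0]))
```

Each token would then be projected to the model dimension and combined with an embedding of its position before the self-attention layers process the sequence.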

R3.W2 “Initial mesh” We used a convex hull-based initial mesh iteratively deformed via a CNN self-prior. This optimization approach reduces initialization dependency and ensures watertightness. We will clarify this in the final version.

(C) “Wording” We will remove the redundant ‘the’.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal addressed most of the major concerns from reviewers. I am good to recommend this paper to be accepted at MICCAI’25.


