Abstract

Dental development assessment (DDA) is crucial for orthodontic diagnosis and treatment planning. Recent advances in deep learning have shown promising results in dental image analysis tasks. However, the study of dental development staging, particularly in pediatric dental development, remains underexplored. This is primarily attributed to the scarcity of publicly available datasets. In this paper, we present a pediatric Dental Development Staging Dataset (DentalDS). To the best of our knowledge, this is the first publicly available dataset for pediatric DDA. It comprises 2,583 orthopantomogram (OPG) images, with a total of 18,081 annotated teeth. Furthermore, we propose a dental development staging network (DDSNet) designed to address the classification of tooth development stages. In DDSNet, we propose a Region-Instance Cross-Attention (RICA) block and a Multi-Expert Collaborative Classification (MECC) block to enhance the fine-grained feature fusion and classification accuracy of dental development stages. To evaluate the effectiveness of the proposed DDSNet, we conducted experiments on DentalDS. Our proposed method achieves state-of-the-art accuracy of 76.3% and an F1-score of 77.1%, outperforming the existing approach by 1.9% in accuracy and 3.8% in F1-score. To facilitate further research in pediatric orthodontic treatment, code and dataset will be available at https://github.com/ybupengwang/DDSNet

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0115_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanPen_Towards_MICCAI2025,
        author = { Wang, Peng and He, Along and Wang, Anli and Zhou, Zhenhuan and Guan, Xiaohang and Li, Tao},
        title = { { Towards Automated Pediatric Dental Development Staging: A Dataset and Model } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15972},
        month = {September},
        page = {605 -- 615}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The contribution of this study is the introduction of DentalDS, the first publicly available dataset specifically curated for pediatric dental development assessment (DDA). It includes 2,583 orthopantomogram (OPG) images of patients aged 3 to 15 years and 18,081 annotated teeth. In addition to DentalDS, a novel deep learning model called DDSNet is proposed, an end-to-end architecture that includes two notable modules: the Region-Instance Cross-Attention (RICA) block and the Multi-Expert Collaborative Classification (MECC) block.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    One of the strongest aspects of the study is the clinical relevance of the problem. Pediatric dental development staging is essential for orthodontic treatment planning. The dataset is meticulously annotated by dental experts through a multistep quality-controlled process, ensuring high reliability. On the technical side, DDSNet effectively combines the strengths of CNNs (local detail) and Transformers (global context) through the RICA block and demonstrates performance gains via dynamic expert collaboration in the MECC module.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The performance of DDSNet may depend on the segmentation quality of DETR. Some patients have only permanent teeth, while others have both permanent and primary teeth. The model appears to use DETR to localize exactly seven permanent mandibular teeth (central incisor to second molar) per image. However, real-world pediatric patients often have missing, extra, or unerupted teeth, and the model’s flexibility to handle such variance is not discussed or tested.

    The results of DDSNet should be compared with previous works [1,2,3]. [1] Ong, Seung-Hwan, et al. “Fully automated deep learning approach to dental development assessment in panoramic radiographs.” BMC Oral Health 24.1 (2024): 426. [2] Kurt, Ayça, et al. “Evaluation of tooth development stages with deep learning-based artificial intelligence algorithm.” BMC Oral Health 24.1 (2024): 1034. [3] Savaştaer, Ertuğrul Furkan, Berrin Çelik, and Mahmut Emin Çelik. “Automatic detection of developmental stages of molar teeth with deep learning.” BMC Oral Health 25.1 (2025): 465.

    I recommend providing additional results for each development stage (A to H) in tooth types (incisors, canines, premolars, and molars).

    Additionally, some architectural components, such as the self-attention blocks and the gating network, need deeper analysis; visualizations or interpretability analysis (e.g., attention maps or Grad-CAM) would help.

    I recommend reporting inter-rater agreement (e.g., Cohen’s kappa) during dataset creation, which would help quantify the consistency and reliability of ground truth annotations.
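
As an illustration of the agreement metric requested above, the following is a minimal sketch of Cohen's kappa for two annotators. The function name `cohens_kappa` and the stage labels are hypothetical and invented for this example; they are not taken from DentalDS or from the authors' annotation pipeline.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical Demirjian-style stage labels for ten teeth (illustrative only).
expert = list("AABBCCDDEE")
senior = list("AABBCDDDEF")
kappa = cohens_kappa(expert, senior)
print(f"kappa = {kappa:.3f}")
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are described as substantial agreement, which is why reporting the number itself is more informative than the verbal label alone.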

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This study makes a valuable contribution to the field of pediatric dental assessment by releasing DentalDS, the first publicly available dataset focused specifically on pediatric dental development staging. This study also proposes DDSNet, a thoughtfully designed deep learning architecture that incorporates Region-Instance Cross-Attention (RICA) and Multi-Expert Collaborative Classification (MECC) blocks to enhance classification accuracy.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Thank you for your detailed and thoughtful rebuttal. I appreciate the authors’ efforts in addressing the concerns raised during the review process.



Review #2

  • Please describe the contribution of the paper
    1. The paper provides a dataset named DentalDS specifically designed for pediatric dental development assessment (DDA). The dataset comprises 2,583 orthopantomogram (OPG) images with a total of 18,081 annotated teeth.
    2. The proposed DDSNet model enhances the classification accuracy of tooth developmental stages in OPG images by effectively integrating local and global features, along with a dynamic expert selection mechanism.
    3. The study includes thorough evaluations against other dental development staging methods, along with ablation studies.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper provides a new pediatric dental dataset, DentalDS, which was collected from real clinical environments and meticulously screened and annotated by experienced dental experts, ensuring high data quality and clinical applicability.
    2. The proposed baseline model, DDSNet, introduces two key modules: RICA for extracting both local and global features, and MECC for dynamic expert selection. This combination enhances adaptability and classification accuracy in complex tasks.
    3. The evaluation is comprehensive, including comparisons with other methods across four metrics, along with ablation studies on key components.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Table 1 attempts to show how the new dataset addresses existing limitations, but the argument is insufficient. For instance, Ong et al.’s dataset has a larger scale and similar tasks. Simply citing “unavailability” as an advantage is inadequate.
    2. The rationale for selecting only seven tooth instances as analysis targets is inadequately justified. If these teeth are from the same quadrant, the authors should clarify: (1) why a specific quadrant was chosen over full-mouth analysis, and (2) whether there was bias in choosing the specific sample.
    3. Tables 2-5 in the experimental section show identical values for accuracy (Acc) and recall (Recall), which is unusual for classification tasks. In general these two metrics are not exactly equal, so the authors need to explain why they coincide.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    In the abstract, the phrase ‘we propose an dental development staging network’ contains a grammatical error due to the incorrect use of the indefinite article. It should be revised to ‘a dental development staging network’.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper makes a strong contribution by providing a clinically labeled dataset and by proposing a model that achieves accurate dental development staging through the integration of local and global features and a dynamic expert selection mechanism. However, the following issues need to be further addressed and clarified by the authors:

    1. A more detailed explanation is needed regarding the advantages of DentalDS compared to existing datasets.
    2. The identical values of Acc and Recall in multiple tables require a reasonable explanation.
    3. The justification for using only seven tooth instances as a classification basis needs further substantiation.
    4. Minor grammatical errors should be corrected, such as changing “an dental” to “a dental” in the abstract.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    My concerns have been addressed, and I recommend accepting this paper.




Author Feedback

We sincerely thank the reviewers and ACs for their time and constructive feedback. We have addressed major concerns and clarified misunderstandings as follows:

[R1 Q1] Model Response to Tooth Variability
While our dataset excludes cases with congenital absence of permanent teeth to ensure annotation consistency, the trained model retains generalization capability to clinically relevant variations. In cases where a target tooth is missing or unerupted, the model does not output a developmental stage for that tooth, without affecting predictions for others, supporting robustness in real-world pediatric scenarios.

[R1 Q2] Comparison with SOTA Methods
The three references (published in medical journals) focus on applying standard deep learning models (e.g., EfficientNet, generic CNNs, and DETR) for dental development staging without introducing methodological innovations. The baseline models were already evaluated in our work.

[R1 Q3-Q4] Evaluation by Tooth Types and Interpretability
We appreciate the reviewer’s insightful suggestions. Our evaluation reports the average performance across seven annotated tooth positions. Therefore, the ACC for each tooth type can be directly obtained: incisors (69.6, 72.3), canines (72.3), premolars (73.2, 78.6), and molars (83.9, 83.9). Due to rebuttal limits, additional metrics and interpretability will be provided in a revision.

[R1 Q5] Annotation Consistency
Each sample was independently annotated by one expert and one senior expert. Discrepancies were resolved via majority voting among three experts. Cohen’s kappa indicated substantial agreement. We will consider reporting agreement metrics on the dataset page to further quantify annotation reliability.
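
The per-position accuracies quoted in [R1 Q3-Q4] can be cross-checked against the headline accuracy in the abstract. The sketch below assumes that every tooth position contributes the same number of test samples, so that the unweighted mean of the seven values equals the overall accuracy; that assumption is not stated in the rebuttal, although it is consistent with the dataset having 18,081 teeth across 2,583 images. The variable names are illustrative.

```python
# Per-position accuracies (%) as quoted in the rebuttal, grouped by tooth type.
per_position_acc = {
    "incisors": [69.6, 72.3],
    "canines": [72.3],
    "premolars": [73.2, 78.6],
    "molars": [83.9, 83.9],
}

# Flatten to the seven annotated positions.
values = [acc for group in per_position_acc.values() for acc in group]

# Unweighted mean; equals overall accuracy only if every position has the
# same number of test samples (an assumption made for this check).
mean_acc = sum(values) / len(values)

# Annotated teeth per image, from the dataset totals given in the abstract.
teeth_per_image = 18081 / 2583

print(len(values), round(mean_acc, 1), teeth_per_image)
```

Under that assumption the mean rounds to the 76.3% reported in the abstract, and the dataset totals work out to exactly seven annotated teeth per image.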

[R2 Q1] Advantages of DentalDS over Existing Datasets
The advantages of our dataset over existing ones [27, 26, 4, 12, 30, 8] are summarized in Table 1. Additionally, Ong et al. present a relatively large but non-public dataset with stricter exclusion criteria (e.g., excluding cases with congenital tooth agenesis, orthodontic treatment, or apical lesions). In contrast, DentalDS includes such complex cases (e.g., prior orthodontic treatment, or dental lesions), better reflecting real-world clinical scenarios and enhancing clinical applicability for downstream tasks.

[R2 Q2-1] Selection of Seven Permanent Teeth
The selection of the seven left permanent mandibular teeth is based on the Demirjian method, which is a clinical gold standard for dental development staging. These teeth offer sufficient information for downstream tasks (dental age estimation and orthodontic treatment planning), avoiding the complexity and redundancy of full-mouth annotation. Our choice aligns with established clinical practice.

[R2 Q2-2] Assessment of Sampling Bias
To minimize bias, sample selection was based on objective criteria. We ensured representative distributions across age and sex where feasible.

[R2 Q3] Explanation of Acc and Recall Values
Given class imbalance in tooth development stages, we use average='weighted' in sklearn.metrics to weight classes by support, providing a more accurate reflection of model performance on real-world data. The equality between accuracy and recall is mathematically valid under the standard average='weighted' definition (as implemented in https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html).

Given the recall for class $i$, $\mathrm{Recall}_i = TP_i / (TP_i + FN_i)$, and the weight for class $i$, $\mathrm{Weight}_i = (TP_i + FN_i) / N$, the weighted average recall is

$$\text{Weighted Recall} = \sum_i \mathrm{Recall}_i \times \mathrm{Weight}_i = \sum_i \frac{TP_i}{TP_i + FN_i} \times \frac{TP_i + FN_i}{N} = \sum_i \frac{TP_i}{N} = \mathrm{Acc}.$$

Thus, under single-label multiclass settings with true-class weighting, Weighted Recall is equal to Acc. We appreciate the reviewer’s attention to this point and acknowledge that the equivalence can appear counterintuitive. We will clarify it in the revised version.

[R2 Q4] We thank the reviewer for pointing out the error (“an dental” → “a dental”).
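
The identity between support-weighted recall and accuracy argued in [R2 Q3] can be checked numerically. The following is a minimal sketch in plain Python that recomputes both quantities by hand rather than calling scikit-learn; the function names and the toy labels are hypothetical and are not the authors' evaluation code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Map each true class i to TP_i / (TP_i + FN_i)."""
    support = Counter(y_true)  # TP_i + FN_i for each class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: true_pos[label] / n_i for label, n_i in support.items()}


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights (TP_i + FN_i) / N."""
    support = Counter(y_true)
    n = len(y_true)
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls[label] * support[label] / n for label in support)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, shown for contrast."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical imbalanced labels: six of class A, three of B, one of C.
y_true = list("AAAAAABBBC")
y_pred = list("AAAAABBBAC")

acc = accuracy(y_true, y_pred)
w_rec = weighted_recall(y_true, y_pred)
m_rec = macro_recall(y_true, y_pred)
print(acc, w_rec, m_rec)
```

In this toy case the weighted recall matches accuracy up to floating-point rounding, while the macro-averaged recall differs, which is the distinction behind the reviewer's observation that the two columns are usually not identical.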




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The paper presents a valuable contribution with a new public dataset for pediatric DDA and a novel model (DDSNet). Both reviewers see potential and lean towards acceptance. However, the rebuttal is needed to address: 1) Clear rationale for the “seven teeth” selection strategy and a discussion/plan on how the model handles or could handle tooth variability. 2) Explanation for the identical Accuracy and Recall values. 3) Comparison with the SOTA methods cited by R1, or a strong reason if not possible. 4) Inclusion of inter-rater agreement metrics for the dataset annotations.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


