Abstract

Automated cephalometric landmark detection is crucial in real-world orthodontic diagnosis. Current studies focus mainly on adult subjects, neglecting the clinically crucial scenario presented by adolescents, whose landmarks often differ significantly in appearance from those of adults. Hence, an open question is how to develop a unified and effective detection algorithm across age groups, including adolescents and adults. In this paper, we propose CeLDA, the first work for Cephalometric Landmark Detection across Ages. Our method leverages a prototypical network that detects landmarks by comparing image features with landmark prototypes. To tackle the appearance discrepancy of landmarks between age groups, we design new strategies for CeLDA to improve prototype alignment and obtain a holistic estimation of landmark prototypes from a large set of training images. Moreover, a novel prototype relation mining paradigm is introduced to exploit the anatomical relations between the landmark prototypes. Extensive experiments validate the superiority of CeLDA in detecting cephalometric landmarks on both adult and adolescent subjects. To our knowledge, this is the first effort toward developing a unified solution and dataset for cephalometric landmark detection across age groups. Our code and dataset will be made public at https://github.com/ShanghaiTech-IMPACT/CeLDA.
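
The idea of detecting a landmark by comparing image features with a landmark prototype can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of cosine similarity with an argmax over positions, and the tiny 2-D feature map are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def locate_landmarks(feature_map, prototypes):
    """For each prototype, return the (row, col) of the most similar feature.

    feature_map: H x W grid of feature vectors (nested lists).
    prototypes:  dict mapping a landmark name to its prototype vector.
    """
    result = {}
    for name, proto in prototypes.items():
        best_score, best_pos = -2.0, None  # cosine similarity is >= -1
        for r, row in enumerate(feature_map):
            for c, feat in enumerate(row):
                score = cosine(feat, proto)
                if score > best_score:
                    best_score, best_pos = score, (r, c)
        result[name] = best_pos
    return result

# Hypothetical 2 x 3 feature map with 2-D features (toy values).
feature_map = [
    [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5]],
    [[0.1, 0.9], [0.0, 1.0], [0.7, 0.3]],
]
# Hypothetical prototypes; the landmark names are used only as labels.
prototypes = {"A": [0.0, 2.0], "UIA": [3.0, 0.0]}
predicted = locate_landmarks(feature_map, prototypes)
print(predicted)
```

In a real model the feature map would come from a trained backbone, and the prediction would more likely be a heatmap than a hard argmax.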

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0737_paper.pdf

SharedIt Link: https://rdcu.be/dV17e

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72086-4_15

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0737_supp.pdf

Link to the Code Repository

https://github.com/ShanghaiTech-IMPACT/CeLDA

Link to the Dataset(s)

https://github.com/ShanghaiTech-IMPACT/CeLDA

BibTex

@InProceedings{Wu_Cephalometric_MICCAI2024,
        author = { Wu, Han and Wang, Chong and Mei, Lanzhuju and Yang, Tong and Zhu, Min and Shen, Dinggang and Cui, Zhiming},
        title = { { Cephalometric Landmark Detection across Ages with Prototypical Network } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {155--165}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    First, this paper presents a unified landmark detection method for different age groups, using prototype-based projection for the first time in this subfield. Second, it introduces a new paradigm for prototype relation mining based on masked modeling. Finally, it proposes a new comprehensive benchmark dataset for landmark detection that consists of cephalometric images from both adolescent and adult subjects.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivation of this paper is clearly articulated and meaningful, starting from the characteristics of adolescents and attempting to extend cephalometric detection technology to different age groups, which also holds high value in practical applications.

    The technical process is simple, and I believe it has strong reproducibility.

    The experimental performance is good, demonstrating promising application potential.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The methods used in the paper are quite dated. Both prototype techniques and masked modeling have a very long research history. Therefore, the technical innovation in this paper is not strong.

    2. Regarding the EMA algorithm, it is known for its high sensitivity to new data. In this paper, the alpha value is set to 0.99, which means that prototypes derived from different samples are weighted differently based on their order. I believe such a design is unreasonable and counterintuitive.

    3. The application of masking techniques seems somewhat forced given the motivation presented. I am skeptical about whether it truly exploits the relationships between prototypes, and I hope the authors can provide more analysis.

    4. Many hyperparameters in the paper are directly assigned specific values; it would be beneficial to provide ablation studies for a more in-depth analysis.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivation of the paper is good, and the methods are simple. However, the technology is somewhat outdated, and some of the design choices seem forced. If the authors can provide satisfactory explanations, I would consider raising the score.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors provided a clear answer to my question. The proposed method is simple and effective, and the authors' rebuttal is well organized.



Review #2

  • Please describe the contribution of the paper

    This work proposed a cephalometric landmark detection model that addresses different age groups using the prototypical network. They collected their own data, which consists of cases involving both adolescents and adults.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method exhibits better performance than baselines. The collected dataset will be made publicly available.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Overall, the proposed method shows only a small performance improvement.
    • Lack of qualitative comparisons. Only two samples were used for qualitative assessment.
    • The comparison was made with only one dataset, and there was no comparison with a public dataset, limiting the effectiveness of the proposed method.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Notably, the performance improvement by L_mine appears weak, with an MRE difference from the ablated version less than the standard deviation. A significance test is required.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While there is a marginal performance improvement and a lack of novelty, the release of the public dataset is beneficial.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    There are still unresolved issues, such as insufficient reasons for not using other datasets for validation, demonstrating limited effectiveness, and the marginal improvement by L_mine in Table 2 (less than standard deviation). However, the decision to release the dataset is a significant contribution. I will maintain the current score.



Review #3

  • Please describe the contribution of the paper

    This study proposes the Cephalometric Landmark Detection across Ages (CeLDA) method to improve prototype alignment and obtain a holistic estimation of landmark prototypes in cephalograms acquired from adults and adolescents. The contributions are as follows: 1) the first prototype-based approach for age-inclusive cephalometric landmark detection, where holistic prototypes are obtained to improve the learning robustness and predictive performance on cephalograms from adults and adolescents; 2) the method is trained and evaluated using a private dataset of cephalograms acquired from adults and adolescents.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper is generally well written and structured, also easy to read and understand about the purpose and challenges (e.g., significant shifts of landmark by unerupted and baby teeth) of cephalometric landmark detection in cephalograms acquired from adults and adolescents. 2) The holistic estimation and relation mining of CeLDA are interesting and reasonable for learning the landmark-representative features between age groups. 3) Compared with other methods, the CeLDA shows superior detection performance for ten landmarks in both adult and adolescent cases, only adult cases and only adolescent cases.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The details of the data and annotations of CephAdoAdu are unclear. The ISBI 2015 challenge [1] provided 19 cephalometric landmarks for the classification of anatomical types by measuring eight clinical standards. Why do the authors annotate only ten cephalometric landmarks? [1] Wang, Ching-Wei, et al. “A benchmark for comparison of dental radiography analysis algorithms.” Medical image analysis 31 (2016): 63-76.

    2) What kind of cephalometric landmarks do you annotate? According to these papers [2,3], 80 landmarks can be annotated for AI-based landmark detection in a cephalogram, and they built a dataset including over 1400 cephalograms. [2] Park, Ji-Hoon, et al. “Automated identification of cephalometric landmarks: Part 1—Comparisons between the latest deep-learning methods YOLOV3 and SSD.” The Angle Orthodontist 89.6 (2019): 903-909. [3] Hwang, Hye-Won, et al. “Automated identification of cephalometric landmarks: Part 2-Might it be better than human?.” The Angle Orthodontist 90.1 (2020): 69-76.

    3) How can you guarantee the feasibility of CeLDA in real-world clinical practice by only using ten landmarks? Also, many papers [4,5,6,7] provided the results of clinical measurements for the classification of anatomical types (class I, II, and III) from automatically detected cephalometric landmarks. Please provide the results of the classification of anatomical types. [4] Yang, Su, et al. “Ceph-Net: automatic detection of cephalometric landmarks on scanned lateral cephalograms from children and adolescents using an attention-based stacked regression network.” BMC Oral Health 23.1 (2023): 803. [5] Juneja, Mamta, et al. “A review on cephalometric landmark detection techniques.” Biomedical Signal Processing and Control 66 (2021): 102486. [6] Schwendicke, Falk, et al. “Deep learning for cephalometric landmark detection: systematic review and meta-analysis.” Clinical oral investigations 25.7 (2021): 4299-4309. [7] Oh, Kanghan, Il-Seok Oh, and Dae-Woo Lee. “Deep anatomical context feature learning for cephalometric landmark detection.” IEEE Journal of Biomedical and Health Informatics 25.3 (2020): 806-817.

    4) The details of the mean age of each group are unclear. In adolescents, all their baby teeth are usually gone between 12-13. Therefore, tooth conditions of adolescents over 13 years could be similar to those of adults.

    5) How many malocclusion patients did you collect? Are there patients wearing braces (metallic materials) or undergoing orthodontic treatment? Please provide details of the CephAdoAdu dataset.

    6) In the Masked Prototype Relation Mining, it is unclear which instance prototype (p_m, p_n, or p^hol) is used. If the authors used p_m and p_n for this process, the total loss consists of L_reg + L_align + L_mine,n + L_mine,m. If the authors used p^hol, the input of Eq. (7) can be revised to hat(p)^hol_k.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The methodological details of this paper are sufficient to reproduce this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) The Reviewer agrees with the challenges posed by significant landmark shifts caused by unerupted and baby teeth in the cephalometric landmark detection task. According to the results of Ceph-Net [1], however, image contrast or quality affected the performance of cephalometric landmark detection more than the condition of the teeth. CephAdoAdu is collected from eight clinical centers, which can result in differences in image contrast or quality within the dataset. [1] Yang, Su, et al. “Ceph-Net: automatic detection of cephalometric landmarks on scanned lateral cephalograms from children and adolescents using an attention-based stacked regression network.” BMC Oral Health 23.1 (2023): 803.

    2) Please provide the number of adolescents having unerupted and baby teeth like Ceph-Net [1]. [1] Yang, Su, et al. “Ceph-Net: automatic detection of cephalometric landmarks on scanned lateral cephalograms from children and adolescents using an attention-based stacked regression network.” BMC Oral Health 23.1 (2023): 803.

    3) As mentioned above, many papers provided the results of clinical measurements for the classification of anatomical types (class I, II, and III) from automatically detected cephalometric landmarks. Please provide the results of the classification of anatomical types to verify the clinical feasibility of CeLDA.

    4) Learning spatial offsets of cephalometric landmarks is a good choice to improve the detection performance [2]. [2] Yao, Qingsong, et al. “Miss the point: targeted adversarial attack on multiple landmark detection.” Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part IV 23. Springer International Publishing, 2020.

    5) Learning graph structures of landmarks can enhance the representations of both local image features and global shape features [3]. [3] Li, Weijian, et al. “Structured landmark detection via topology-adapting deep graph learning.” Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX 16. Springer International Publishing, 2020.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the details of their dataset (CephAdoAdu) and the clinical feasibility are unclear, this paper provides a novel approach to improving the performance of landmark detection.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks for the response. This makes more sense to me now. The release of the CeLDA dataset will benefit other researchers.




Author Feedback

We thank the reviewers for their detailed feedback. We are glad all reviewers (R1, R3, R4) appreciated the importance and clinical value of the problem we introduced. Reviewers also appreciated the technical motivation (R1, R4), superior performance (R1, R3, R4) and the new benchmark dataset (R3, R4) of our work. We summarize the concerns as follows:

Q1 (R1) Technical innovation. A: Our method is simple but effective. The technical innovation naturally aligns with our motivation to address cross-age detection. The prototypical network can faithfully capture the anatomical properties of each landmark, and the masked modeling mechanism can further enhance the correlation between different landmarks. These techniques are applied here for the first time in the field of cephalometric landmark detection, achieving superior performance over the current SOTA.

Q2 (R1) Alpha setting in EMA. A: As in Eq. (4) of the main paper, alpha is the weight for the historical value of the EMA, and the current prototype is weighted by (1 - alpha). With alpha set to 0.99, our holistic prototypes are robust to outliers in the current prototype data, because the current data holds a weight of only 0.01. We tested a series of alpha values and found that our method is fairly robust to alpha larger than 0.8.
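
The EMA weighting described in Q2 can be illustrated with a minimal sketch. It assumes the update takes the standard form holistic = alpha * holistic + (1 - alpha) * current, as the answer states; the function names and toy vectors are hypothetical and not taken from the released code.

```python
def ema_update(holistic, current, alpha=0.99):
    """One EMA step: keep `alpha` of the history, take (1 - alpha) of the new value."""
    return [alpha * h + (1.0 - alpha) * c for h, c in zip(holistic, current)]

def sample_weight(steps_ago, alpha=0.99):
    """Effective weight in the running average of a prototype seen `steps_ago` updates ago."""
    return (1.0 - alpha) * alpha ** steps_ago

# A large outlier in the current prototype moves the holistic prototype only slightly.
holistic = [1.0, 1.0]
outlier = [101.0, -99.0]
updated = ema_update(holistic, outlier)
print(updated)  # each component shifts by about 1.0 despite an offset of 100

# The newest sample has weight 0.01; older samples decay geometrically.
print(sample_weight(0), sample_weight(100))
```

With alpha = 0.99 a sample seen 100 updates earlier still carries roughly a third of the weight of the newest one, so the order of samples changes their weights only gradually.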

Q3 (R1, R4) Details about masked mining. A: Masked mining is used on all instance prototypes. We computed the cosine similarity between all landmark prototypes and found that intrinsically related landmarks, like A and UIA, exhibit higher similarity with masked mining than without it. This indicates that our mining strategy enhances the correlation between landmarks. In addition, our ablation results in Table 2 demonstrate that the mining strategy significantly improves detection accuracy.
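
The similarity analysis mentioned in Q3 could be reproduced along the following lines. This is a sketch under assumptions: the prototype vectors are made-up toy values, "X" is a hypothetical unrelated landmark, and the helper names are not from the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def similarity_matrix(prototypes):
    """Return landmark names (sorted) and their pairwise cosine-similarity matrix."""
    names = sorted(prototypes)
    matrix = [[cosine(prototypes[a], prototypes[b]) for b in names] for a in names]
    return names, matrix

# Toy prototypes: "A" and "UIA" are deliberately close, "X" is nearly orthogonal.
prototypes = {
    "A": [1.0, 0.2, 0.0],
    "UIA": [0.9, 0.3, 0.1],
    "X": [0.0, 0.1, 1.0],
}
names, sim = similarity_matrix(prototypes)
for name, row in zip(names, sim):
    print(name, [round(s, 3) for s in row])
```

Comparing two such matrices, one from a model trained with the mining loss and one without, would show whether related landmark pairs become more similar.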

Q4 (R1) Hyperparameter analysis. A: We provided ablation analysis of many important hyperparameters in the initial submission. The analyses of the mask ratio R, lambda_1 and lambda_2 are provided in Fig. 3(b) of the main paper and in Fig. 1 and Fig. 2 of the supplementary materials, respectively.

Q5 (R3) Evaluated only on one dataset with limited improvement. A: To date, our collected CephAdoAdu dataset is the first and only one that includes cases of both adolescents and adults, allowing algorithm development and evaluation of cephalometric landmark detection across ages. From Table 1, our CeLDA achieves the best MRE of 1.05 mm on all cases, an improvement of 21.64% over the previous SOTA method with an MRE of 1.34 mm. Additionally, a one-sided paired t-test yielded a p-value below 0.05, showing significant superiority.
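
The 21.64% figure in Q5 follows from the two reported MRE values, as the sketch below checks. It assumes MRE denotes the mean radial error, i.e. the mean Euclidean distance between predicted and ground-truth landmark positions; the function names and the toy coordinates are hypothetical.

```python
import math

def mean_radial_error(predicted, ground_truth):
    """Mean Euclidean distance between paired (x, y) points, in the input units."""
    distances = [math.hypot(px - gx, py - gy)
                 for (px, py), (gx, gy) in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)

def relative_improvement(baseline_mre, new_mre):
    """Percentage reduction of the error relative to the baseline."""
    return (baseline_mre - new_mre) / baseline_mre * 100.0

# Toy example: one exact hit and one prediction 5 units away.
toy_mre = mean_radial_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])

# Values reported in the rebuttal: 1.34 mm (previous SOTA) and 1.05 mm (CeLDA).
pct = relative_improvement(1.34, 1.05)
print(toy_mre, round(pct, 2))
```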

Q6 (R4) Criteria for annotation of the CephAdoAdu dataset. A: In clinical practice, the templates used by different regions and hospitals vary significantly in the number and definition of landmarks (e.g. Vienna template, Huaxi template). For CephAdoAdu annotation, we consulted experienced dentists and selected landmarks based on two criteria: 1) they are defined in all templates, and 2) they are challenging to detect across ages. Ultimately, we selected 10 target landmarks. We will detail this information in our paper, and further enrich our CephAdoAdu dataset by including more landmarks in the future.

Q7 (R4) More descriptions of the dataset. A: Adolescents are defined by the presence of baby teeth, regardless of age, and adults by the absence of baby teeth. The percentages of malocclusion patients, patients wearing braces, and patients undergoing orthodontic treatment are about 70%, 10%, and 20%, respectively. We will provide this information in the released dataset.

Q8 (R4) Feasibility in clinical practice. A: CeLDA is evaluated on 10 challenging landmarks, showing its superiority over the current SOTA in cross-age landmark detection. It can generalize to any number of landmarks by adjusting the number of prototypes during training. This precise localization benefits further clinical evaluations, e.g. angle calculation and anatomical classification.

Code and dataset will be made public.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The proposed method further extends the landmark detection for different age groups and the promised data release will have a great value for the community.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper received a final rating of 3 x Weak Accepts. The main factor for accepting the paper is the promise by the authors that they will release the data publicly. I encourage the authors to release the data as soon as possible.



