Abstract

In the realm of orthognathic surgical planning, the precision of mandibular deformity diagnosis is paramount to ensure favorable treatment outcomes. Traditional methods, reliant on the meticulous identification of bony landmarks via radiographic imaging techniques such as cone beam computed tomography (CBCT), are both resource-intensive and costly. In this paper, we present a novel way to diagnose mandibular deformities in which we harness facial landmarks detectable by off-the-shelf generic models, thus eliminating the necessity for bony landmark identification. We propose the Diagnosis-Reconstruction Transformer (DiRecT), an advanced network that exploits the automatically detected 3D facial landmarks to assess mandibular deformities. DiRecT’s training is augmented with an auxiliary task of landmark reconstruction and is further enhanced by a teacher-student semi-supervised learning framework, enabling effective utilization of both labeled and unlabeled data to learn discriminative representations. Our study encompassed a comprehensive set of experiments utilizing an in-house clinical dataset of 101 subjects, alongside a public non-medical dataset of 1,519 subjects. The experimental results illustrate that our method markedly streamlines the mandibular deformity diagnostic workflow and exhibits promising diagnostic performance when compared with the baseline methods, which demonstrates DiRecT’s potential as an alternative to conventional diagnostic protocols in the field of orthognathic surgery. Source code is publicly available at https://github.com/RPIDIAL/DiRecT.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1074_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/RPIDIAL/DiRecT

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xu_DiRecT_MICCAI2024,
        author = { Xu, Xuanang and Lee, Jungwook and Lampen, Nathan and Kim, Daeseung and Kuang, Tianshu and Deng, Hannah H. and Liebschner, Michael A. K. and Gateno, Jaime and Yan, Pingkun},
        title = { { DiRecT: Diagnosis and Reconstruction Transformer for Mandibular Deformity Assessment } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The primary contribution of this paper lies in introducing a novel method for mandibular deformity diagnosis. This method leverages a pre-trained generic model to detect 3D facial soft tissue landmarks autonomously, without relying on bone landmarks. The proposed diagnoser network demonstrates promising results and mitigates the reliance on labeled medical datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    N/A

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of this paper are as follows: 1) The reliance on facial landmarks for mandibular deformity diagnosis may introduce inaccuracies due to the variability of facial soft tissue landmarks influenced by factors such as facial expressions, age, and weight fluctuations, while bone structure remains relatively stable. Additionally, errors in the back-projection process from 2D to 3D could lead to the loss of facial structural information. 2) The utilization of MSE constraints for reconstruction loss may not adequately capture the complexity and diversity of facial geometry information. 3) The insufficient quantity of sample data utilized in the model training phase, in comparison to other models within the medical domain, raises concerns regarding potential overfitting to a limited dataset.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is recommended that the authors offer a more comprehensive description of the network structure, including details such as the specific configuration of the Embedding layer and the dimensions of the Linear layer weight matrix. Furthermore, the authors are encouraged to provide access to code or links to pre-trained models to facilitate replication by other researchers.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) It is recommended that bone landmarks or a combination of bone and facial landmarks be utilized as foundational elements for mandibular deformity diagnosis, thus facilitating more effective intraoperative planning. 2) To effectively capture relevant facial geometry features, it is advisable to explore loss functions specifically tailored to facial geometry information, such as Chamfer distance or Hausdorff distance. 3) Further optimization of the cosine similarity loss is advised, along with the incorporation of additional supervisory information or feature alignment strategies. This will bolster the consistency of feature learning across the teacher-student network. 4) Expanding the size of the training dataset is encouraged to ensure that the model’s performance is accurately reflected. Additionally, it is important to account for variations in data distribution among different types and severities of deformities during model training. This approach will enable the model to comprehensively capture distinctive characteristics of each deformity type, facilitating precise clinical diagnosis.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend against accepting this paper for publication. Firstly, relying on facial soft tissue landmarks for mandibular deformity diagnosis lacks reliability compared to utilizing stable bone structures. Additionally, the back-projection process may lead to the loss of substantial facial information, thereby compromising the accuracy of facial landmarks. Secondly, training the model on a small dataset increases the risk of overfitting, necessitating further validation of its performance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The relationship between the facial surface (including facial landmarks) and the facial bones is actually unpredictable, as the facial soft tissue changes with age, weight, and expression while the bone remains unchanged. Currently, facial deformity diagnosis relying on bony landmarks is the standard in clinical practice; replacing bony landmarks with facial landmarks is interesting but risky.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a Diagnosis-Reconstruction Transformer (DiRecT) network, which uses automatically detected 3D facial landmarks to assess mandibular deformities.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper proposed a method that simplifies the diagnosis of mandibular deformities by using the facial soft tissue landmarks rather than the bony anatomical landmarks.

    2. This paper proposed an innovative DiRecT network to address the task of mandibular deformity diagnosis using automatically detected facial landmarks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In Table 1, the accuracy of the proposed method on the prognathic class is slightly lower than that of previously reported methods. This suggests that the proposed method may struggle with prognathic cases.

    2. Accuracy is the only evaluation metric used in this paper, which makes it difficult to comprehensively evaluate the superiority of the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is recommended that the used datasets and code be open-sourced to improve the reproducibility of the proposed method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. In Table 1, the accuracy of the proposed method on the prognathic class is slightly lower than that of previously reported methods. This suggests that the proposed method may struggle with prognathic cases.
    2. It is recommended that the paper add other datasets to verify the accuracy of the proposed method on prognathic cases.
    3. Accuracy is the only evaluation metric used in this paper, which makes it difficult to comprehensively evaluate the superiority of the proposed method.
    4. The paper lacks result figures, making it difficult to compare the results visually.
    5. How did the gold standard for datasets come about?
    6. It is recommended that the used datasets and code be open-sourced to improve the reproducibility of the proposed method.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The accuracy of the proposed method on the prognathic class is slightly lower than that of previously reported methods, and the evaluation metrics are not comprehensive.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Although the evaluation metrics in this article are relatively limited, the proposed detection method based on facial soft-tissue landmarks is indeed innovative compared to capturing landmarks from bone tissue.



Review #3

  • Please describe the contribution of the paper

    The study presents the use of the off-the-shelf generic facial landmark detection model for mandibular deformation assessment required for orthognathic surgery. Network training is complemented by landmark reconstruction, which allows the use of both labeled and unlabeled data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The large clinical dataset of 101 subjects and the public dataset of 1519 subjects are used to validate the method. The method is an alternative to resource-intensive and costly conventional diagnostic protocols, significantly streamlining the diagnostic process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not clear how the detection of facial landmarks instead of bony landmarks affects the accuracy of predicting mandibular deformity, as there is no direct correspondence between facial and bony landmarks and shapes.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The only original contribution is the transformation of the 3D problem into a 2D problem and the augmentation of the clinical dataset with a large amount of publicly available non-clinical data. Since the performance of the proposed method is not significantly higher than the performance of alternative methods, the benefit of omitting the imaging component is questionable, considering that the direct correspondence between facial and bony shape is not addressed, which is crucial for orthognathic surgery.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Originality is moderate. A significant factor for successful clinical translation, namely the direct correspondence between facial and bony shape, is not addressed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ recognition of our work as “innovative” and their emphasis that it is “significantly streamlining the diagnostic process”. We address the reviewers’ concerns below:

  • Advantages of Using Facial Soft-Tissue Landmarks [R1, R5] We appreciate the reviewer’s recognition of our novel approach using facial soft-tissue landmarks for facial deformity diagnosis. We would like to emphasize the following advantages:
    1. No Radiation Risk: The detection of facial soft-tissue landmarks from 3D camera images eliminates the need for cone beam CT (CBCT) scans, reducing radiation exposure risks and costs.
    2. Pre-trained Models: The use of off-the-shelf models like Google MediaPipe, pre-trained on large public datasets, removes the need for extensive labeled data and annotations specific to our application.
    3. Comprehensive Representation: Facial soft-tissue landmarks are denser and more uniformly distributed, providing a comprehensive representation of the face’s overall geometry, which is beneficial for assessing deformities.
  • Adequacy of Using MSE Constraint for Facial Reconstruction Loss [R5] The MSE loss is computed over a dense set of 328 facial soft-tissue landmarks, which are sufficient to capture the complexity and diversity of facial geometry. This dense coverage ensures that the MSE loss adequately captures facial structural variations.

  • Risk of Overfitting [R5] To mitigate the risk of overfitting, we employed several strategies:
    1. Large Public Dataset: We utilized a large public dataset of 1,519 subjects without facial deformity labels in a semi-supervised learning framework, significantly reducing the risk of overfitting.
    2. Cross-Validation: A 4-fold cross-validation strategy was implemented to ensure a robust evaluation of the model’s performance, further mitigating overfitting concerns.
  • Performance on Prognathic Subjects [R3] We acknowledge that the performance on prognathic subjects is slightly lower compared to methods using bony landmarks. However, the extraction of bony landmarks requires CBCT imaging, which poses radiation risks and is costly. Our method, leveraging facial soft-tissue landmarks from 3D facial images that can be captured by non-invasive 3D cameras, is safer and more cost-effective. Furthermore, our approach adapts an off-the-shelf 2D facial landmark detection model to a 3D scenario without needing CBCT images or extensive landmark annotations for training, significantly reducing the burden of data collection and annotation.

  • More Metrics and Datasets for Further Evaluation [R3, R5] In the final version of the paper, we will include additional metrics such as F1 score, precision, and recall to provide a more comprehensive evaluation. We also plan to conduct further experiments on additional datasets to validate our method’s performance in future work.
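
To make concrete why accuracy alone can hide weak performance on one class, the following is a minimal sketch of per-class precision, recall, and F1 for the three diagnostic labels named in this rebuttal (normal, retrognathic, prognathic). The function names and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper or its code repository, and the one-vs-rest counting shown here is one common convention rather than necessarily the one the authors will report.

```python
from collections import Counter

# The three anteroposterior mandibular classes mentioned in the rebuttal.
CLASSES = ("normal", "retrognathic", "prognathic")


def accuracy(y_true, y_pred):
    """Fraction of subjects whose predicted class equals the reference class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} from one-vs-rest counts."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the reference was another class
            fn[t] += 1  # reference class t was missed
    metrics = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


# Toy labels invented for illustration; these are NOT results from the paper.
y_true = ["normal", "normal", "retrognathic", "retrognathic", "prognathic", "prognathic"]
y_pred = ["normal", "normal", "retrognathic", "normal", "prognathic", "retrognathic"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
for name, (p, r, f) in per_class_metrics(y_true, y_pred).items():
    print(f"{name:>13}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

On these toy labels the overall accuracy is about 0.667, while the recall for the prognathic class is only 0.5. This is the kind of per-class gap that a single accuracy figure does not reveal.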

  • Gold Standard of Deformity Diagnosis [R3] A senior oral and maxillofacial surgeon with over 30 years of clinical experience classified the subjects’ anteroposterior mandibular positions as normal, retrognathic, or prognathic. These classifications were used as the ground truth (gold standard) for our experiments.

  • Open-Source Code and Data [R3, R5] We commit to making the code publicly available upon acceptance of the paper. However, due to data privacy concerns regarding patient facial structures, the dataset cannot be publicly released.

  • Other Comments [R1, R5]
    1. Bone Landmarks: We agree that incorporating bone landmarks could enhance diagnosis accuracy. Future work will explore the combination of bone and facial landmarks.
    2. Advanced Loss Functions: We will investigate loss functions such as Chamfer distance or Hausdorff distance to better capture facial geometry in future work.
    3. Cosine Similarity Loss: We will further optimize the cosine similarity loss and incorporate additional supervisory information to enhance feature consistency across the teacher-student network.
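
For readers weighing the MSE reconstruction loss against the Chamfer distance raised in the reviews, the sketch below contrasts the two on small 3D point sets. It is an illustration under stated assumptions, not the loss used in DiRecT: the function names are hypothetical, the MSE is averaged over all coordinates, and the Chamfer variant is the symmetric sum of mean squared nearest-neighbour distances (other normalizations exist). The point it shows is that MSE compares landmark i with landmark i, whereas Chamfer distance ignores the ordering of the points.

```python
def _sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mse(pred, target):
    """Mean squared error over corresponding landmarks (index i vs index i)."""
    if not pred or len(pred) != len(target):
        raise ValueError("landmark lists must be non-empty and of equal length")
    n_coords = len(pred) * len(pred[0])
    return sum(_sq_dist(p, t) for p, t in zip(pred, target)) / n_coords


def chamfer(pred, target):
    """Symmetric Chamfer distance (assumed variant: sum of the two directed
    means of squared nearest-neighbour distances)."""
    if not pred or not target:
        raise ValueError("point sets must be non-empty")

    def one_way(src, dst):
        return sum(min(_sq_dist(s, d) for d in dst) for s in src) / len(src)

    return one_way(pred, target) + one_way(target, pred)


# Toy points invented for illustration; real inputs would be the detected
# facial landmarks and their reconstructions.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
permuted = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]  # same points, reordered

print("MSE on reordered points:    ", mse(permuted, target))
print("Chamfer on reordered points:", chamfer(permuted, target))
```

Because the reordered set contains exactly the same points, the Chamfer distance is zero while the MSE is not. When every landmark has a fixed index, as with the output of a landmark detector, MSE makes use of that correspondence; Chamfer distance is the more natural choice when no such correspondence is available.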




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


