Abstract

Accurate detection of bone fenestration and dehiscence (FD) is of utmost importance for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis of FD. This paper presents a novel and clinically significant application of FD detection solely from intraoral images, eliminating the need for CBCT. To achieve this, we propose FD-SOS, a novel open-set object detector for FD detection from intraoral images. FD-SOS has two novel components: conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA). These modules enable FD-SOS to effectively leverage external dental semantics. Experimental results showed that our method outperformed existing detection methods and surpassed dental professionals by 35% recall under the same level of precision. Code is available at https://github.com/xmed-lab/FD-SOS.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0234_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0234_supp.pdf

Link to the Code Repository

https://github.com/xmed-lab/FD-SOS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Elb_FDSOS_MICCAI2024,
        author = { Elbatel, Marawan and Liu, Keyuan and Yang, Yanqi and Li, Xiaomeng},
        title = { { FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper considers the problem of detecting diseased anterior teeth from frontal pictures of the teeth. The motivation is that perhaps if it works, it can replace cone beam computed tomography which is the gold standard for the diagnosis. The paper explores several object detection/classification models and presents a slightly modified method which outperforms some baseline models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Curated a dataset with gold standard labels obtained from CBCT.
    2. Proposed model outperforms baseline models
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The FDTOOTH dataset is not publicly available – reduces the value and impact of the paper
    2. Limited novelty. As far as I understand it, “conditional contrastive denoising” simply ensures that anterior teeth which do not have a label are not considered negative samples in the contrastive learning framework. In my mind, this can hardly be called novel. Also equation 1 has a typo: the second line should also be p/2 (otherwise the sum of probabilities will not be 1). The teeth masking step is also rather trivial.
    3. Missing baselines: Based on the authors’ supplementary information, detection of teeth is essentially a solved problem, with the open set detectors. Then I wonder how a very simple approach compares: first detect the individual teeth, and in a second stage, train a simple classifier to distinguish between FD vs no FD. This is the obvious approach which is missing. Also, there is no mention of DINOv2, which is a much stronger model.
    4. Missing details: how is p (equation 1) set? There is no mention of how the dentists evaluated the cases (how many raters, what about inter-rater reliability, experience levels of the raters, size and resolution of images, time taken).
    5. No statistical analysis to back up the claims of substantial improvements
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It would have been great if the method were compared to simple and obvious baselines (see weaknesses for an example). Also, statistical analysis is very important, particularly when the test dataset is small, and variations in performance could result from chance.
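
    As an illustration of the kind of statistical analysis requested here, the sketch below runs an exact McNemar test on paired per-tooth outcomes from two detectors evaluated on the same test set. It is a minimal example under stated assumptions, not the analysis used in the paper: the function name `mcnemar_exact` and the boolean "prediction was correct" inputs are hypothetical, and it assumes each tooth has already been reduced to one correct/incorrect flag per model.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired correctness flags.

    correct_a[i] and correct_b[i] say whether model A and model B
    classified the same test tooth i correctly (hypothetical inputs).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have equal length")
    # Only discordant pairs (one model right, the other wrong) carry information.
    a_only = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    b_only = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the two models never disagree
    k = min(a_only, b_only)
    # Under the null, discordant pairs split like fair coin flips: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

    With a small test set, a test like this makes explicit how many disagreements between two models are needed before a difference is unlikely to be due to chance.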

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The weaknesses in the paper (the missing baselines, small test sets, the test set not being publicly available, etc.) outweigh the positives. But it is an interesting application and may hold value in practice.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    One of my comments was rooted in my misunderstanding of two DINOs.



Review #2

  • Please describe the contribution of the paper

    This paper introduces FD-SOS, a pioneering method for detecting bone fenestration and dehiscence (FD) solely from intraoral images, eliminating the need for cone-beam computed tomography (CBCT). The proposed FD-SOS framework incorporates two innovative components: conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA), allowing it to effectively leverage dental semantics. Experimental results demonstrate that FD-SOS outperforms existing detection methods and even surpasses the performance of dental professionals by achieving a 35% improvement in recall while maintaining the same level of precision. This breakthrough not only addresses the limitations of CBCT, such as radiation exposure and cost, but also enhances the accuracy of FD detection, thereby significantly impacting treatment planning in dentistry.
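
    The "recall at the same level of precision" comparison can be made concrete with a small sketch. The code below sweeps score thresholds and reports the best recall among thresholds whose precision meets a target. It is an assumption-laden illustration, not the paper's evaluation code: `recall_at_precision` is a hypothetical helper, and it assumes each tooth already has one confidence score and one binary ground-truth label (1 = FD present), with any box matching done beforehand.

```python
def recall_at_precision(scores, labels, target_precision):
    """Best recall over all score thresholds with precision >= target_precision.

    scores: per-tooth confidence values; labels: 1 if FD is present, else 0.
    Returns 0.0 if no threshold reaches the target precision.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Items with an identical score fall on the same side of any threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        precision = tp / (tp + fp)
        if precision >= target_precision:
            best_recall = max(best_recall, tp / total_pos)
        i = j
    return best_recall
```

    Setting `target_precision` to the precision achieved by the human raters gives the model's recall at a matched operating point, which is what makes the two recall figures comparable.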

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper mainly addresses the dataset limitation by adding bone fenestration and dehiscence annotations.
    2. It claims a vision-language model that uses multiple modalities.
    3. Code will be released after publication.

    If the model produced a language-model-based description, which is not currently shown anywhere in the paper, that would be an interesting contribution.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper presents a vision-language model, which generally needs a large amount of data to train and to give confident results, but it uses only a few hundred images and labels. This is the main weakness of the study.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper should describe the dataset format used for the vision-language model and state the minimum amount of data needed to overfit and train the model.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The paper offers a good solution for the dentistry domain, but the presentation needs to be clearer; for example, the abstract does not state which limitation the authors solve.
    2. The paper says that a vision-language model is used, but there is no explicit use of it and the detected output is not described. FD annotations could be used with a simple object detector, so why is a vision-language model necessary?
    3. There are some typos, such as “Finally, we leverage a postional” after Figure 2. Please check the whole manuscript and revise it carefully.
    4. What limitations of the previous methods used in the paper, such as DINO and GLIP, does the proposed method overcome?
    5. You propose a teeth-specific matching assignment. Is it applicable to other applications? If not, it should be made a more general method.
    6. Since vision-language models and DETR models need substantial computation and memory, I would like you to provide a low-cost solution, given that you have very limited data for training these large models.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A dataset and a detection model are proposed, but the dataset collected is very limited for training vision-language and transformer models. I would also like the authors to provide a lightweight approach suited to their limited data, to give more confident results. The overall score of the paper is average because the abstract does not state the limitation that the proposed model addresses technically.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    The provided answer addresses some points raised in the question but not all. It acknowledges the challenges faced with different types of detectors and highlights the benefits of using OSOD pre-trained on multimodal data. However, it does not explicitly address the need for a lightweight experiment based on the limited dataset size.



Review #3

  • Please describe the contribution of the paper

    The paper introduces FD-SOS, an approach for detecting bone fenestration and dehiscence (FD) solely from intraoral images, eliminating the need for cone-beam computed tomography (CBCT). The key contributions lie in the development of FD-SOS, which incorporates conditional contrastive denoising (CCDN) and teeth-specific matching assignment (TMA) modules to effectively leverage dental semantics. Experimental results demonstrate superior performance compared to existing methods, with a 35% increase in recall over dental professionals at the same precision level. This innovative approach has the potential to significantly impact clinical practice by offering a non-invasive and cost-effective alternative for FD detection in dentistry.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper proposes using intraoral images for detecting bone fenestration and dehiscence (FD), a clinically significant issue in dentistry. This application addresses the limitations of existing methods, such as reliance on cone-beam computed tomography (CBCT) scans, by providing a non-invasive and cost-effective alternative.
    2. FD-SOS introduces an interesting framework based on open-set object detection (OSOD), incorporating innovative components such as Conditional Contrastive Denoising (CCDN) and Teeth-Specific Matching Assignment (TMA). These components enhance the model’s ability to accurately detect FD from intraoral images by leveraging dental semantics and addressing challenges such as missing annotations.
    3. The paper demonstrates the clinical feasibility of FD-SOS by showing results comparable to existing detection methods and even surpassing the performance of dental professionals. This highlights the potential of FD-SOS to be used in practical dental settings, contributing to improved treatment planning and patient care.
    4. The evaluation of FD-SOS includes enough experiments and comparisons with various baseline methods, including traditional object detectors, multi-label baselines, and open-set object detection models. The ablation study further validates the effectiveness of the proposed components, CCDN and TMA, in improving FD-SOS performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper mentions that the ground truth for FD detection is obtained through cone-beam computed tomography (CBCT) scans. While CBCT is considered the gold standard for evaluating FD, its use as the sole ground truth source may introduce biases or inaccuracies, especially if CBCT scans are not available for all cases or if there are discrepancies between CBCT findings and intraoral images.
    2. While the paper demonstrates promising results in terms of FD detection accuracy, there may be challenges in translating the proposed method into clinical practice. Factors such as integration with existing dental imaging systems, workflow compatibility with dental clinics, and acceptance by dental professionals need to be addressed for successful clinical implementation.
    3. While the paper compares the performance of FD-SOS with existing detection methods and dental professionals, it lacks a comparative analysis with established clinical protocols or guidelines for FD detection. Such an analysis would provide insights into the real-world clinical utility and effectiveness of FD-SOS in comparison to standard diagnostic approaches used by dental professionals.
    4. The paper acknowledges the challenge of limited dataset size, which poses difficulties in training robust models for FD detection. While the authors attempt to mitigate this limitation by leveraging publicly available dental datasets, the effectiveness of the approach may be constrained by the small size of the collected dataset (150 intraoral images). This could lead to potential overfitting and limited generalizability of the proposed method.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    While the submission does not explicitly mention open access to source code or data, it does provide a clear and detailed description of the algorithm, which is crucial for reproducibility. Therefore, the paper ensures that interested parties can understand the methodology and replicate the results based on the provided information.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The “FD-SOS” paper presents a clinical application in the field of dentistry. Detailed and constructive comments on various aspects of the paper are as follows:

    1. The title effectively communicates the focus of the paper, which is important for attracting readers in the dental and medical imaging communities.
    2. The abstract provides a clear overview of the paper, highlighting the significance of accurate FD detection and summarizing the proposed method succinctly. Consider mentioning the key performance metrics achieved by the method in the abstract to attract more attention from readers.
    3. The introduction effectively sets the stage by highlighting the importance of accurate FD detection in dentistry and the limitations of current diagnostic methods.
    4. The paper provides a clear rationale for developing an FD detection approach using intraoral images, which addresses the accessibility and cost concerns associated with CBCT scans.
    5. The paper could provide more context on the prevalence and impact of FD in dentistry to further emphasize the clinical significance of the work.
    6. It would be beneficial to elaborate on the specific challenges faced by dentists in accurately diagnosing FD from intraoral images, providing examples or case studies if possible.
    7. The authors should link their contributions to the weaknesses of the state of the art to motivate their strengths compared to existing methods.
    8. The description of datasets is comprehensive, and the comparison between the dataset used in the paper (FDTOOTH) and existing public datasets adds value to the work.
    9. The paper’s use of a vision-language framework for FD detection is innovative. However, consider providing more details on why this approach was chosen over traditional image-based methods and deep learning models and how it leverages both visual and textual information effectively.
    10. The overview of the overall framework is clear, but it would be helpful to provide a visual representation or flowchart to enhance understanding of the methodology.
    11. Detailed explanations of the proposed components (CCDN and TMA) are required to help understand the novelty of the approach. Consider providing examples or illustrations to further clarify the implementation of these components.
    12. It’s important to address potential limitations or challenges associated with the proposed methodology, such as computational complexity or potential biases introduced by the dataset or algorithm design.
    13. The implementation details are well-described, including the choice of backbone models and optimization techniques. This provides transparency and facilitates reproducibility. Also, I would like it if the authors were able to visualize some layers of the proposed model to interpret it.
    14. The comparison with various baseline methods, including traditional object detectors, is good for the evaluation. However, consider providing more context on why these specific baselines were chosen and how they are related to existing methods in the literature.
    15. The results tables provide a complete overview of the performance of the proposed method compared to baselines, which is essential for evaluating the efficacy of the approach. However, I would like the authors to show the complexity of the compared models, for instance in terms of the number of trained parameters or FLOPs.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The paper addresses a significant clinical issue in dentistry by introducing FD-SOS, an innovative open-set object detector for bone fenestration and dehiscence detection from intraoral images. However, I have concerns regarding the method’s robustness, particularly about posterior teeth, as they are affected by many factors, which are considered one of the main challenges of this problem.
    2. Experimental results demonstrate the superiority of the proposed method over existing methods. Improvements could be made, such as providing further details on the choice of specific methodologies and their advantages over alternative approaches. Additionally, acknowledging any potential limitations or challenges in the proposed approach and discussing factors that could affect the generalizability or applicability of the findings in real-world clinical settings is essential.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Dear Area Chairs and Reviewers,

We would like to thank you all for your positive feedback and for the opportunity to clarify the points raised about our vision-language open-set detector for bone fenestration and dehiscence, FD-SOS, which solves “a clinically significant issue in dentistry” (R3). FD-SOS is a “pioneering method” (R1), “an interested framework”, “incorporating innovative components” “with potential to be used in practical dental settings” (R3). The “proposed model outperforms baseline models” (R4) and the “evaluation of FD-SOS includes enough experiments and comparisons” (R3). Yet there are reservations:

i) Limited dataset size, biases, and statistical significance (R1, R3, R4): Regarding the dataset size, unlike image-based classification, our dataset contains dense bounding boxes for 1,800 teeth. Teeth-level annotations are costly since dentists need to adjust each CBCT scan to diagnose each individual tooth. In the dataset, the inter-rater agreement over 40 CBCT volumes (480 teeth) is reliable (κ > 0.9) for the gold standard. The held-out test set, used to evaluate the 14 models listed in Table 2 as well as the dentists, includes 480 annotated objects (e.g., teeth), providing reliable detection metrics that are statistically significant when compared to the baseline (p<0.0004), with attention maps on the gingiva.

ii) Dataset availability and missing details (R4): The dataset with details will be made public along with the code. Two dentists annotated the data, with disagreements resolved through consensus. The performance comparison with dentists involves different human evaluators (all with over 5 years of clinical experience in orthodontics).

iii) Use of obvious baselines and technical contribution (R4): We address R4’s potential confusion of DINO (Caron et al.) and DINOv2 (Oquab et al.) with DINO (Zhang et al.) and G-DINO (Liu et al.): ‘Di’stillation ‘no’ labels (DINOv1&2) are self-supervised models for classification and are not obvious baselines. DETR with Improved Denoising (DINO), and its SOTA, G-DINO, are baselines alongside eight other SOTAs (Table 2). We agree that both are confusingly named “DINO”, but they differ significantly, with DINOv2 proposing self-supervised (SSL) models. Object detection is standardly initialized with supervised object detection, as it is more competitive than SSL. We initially tried simple teeth-level classifier approaches (e.g., on top of DINOv2). Yet OSOD offered a more competitive baseline. Though simple, conditional contrastive denoising (CCDN) is novel and effectively avoids VLM catastrophic forgetting by leveraging the separability of the shared dental semantics. The “conditional” probability does not necessarily sum to 1, especially where conditions represent a subset of events. The first two lines should sum to 1 as they encompass the entire label space, 𝑦. Yet the third line considers only the anterior teeth, half of the overall label space of 𝑦. We follow G-DINO with p=0.5.

iv) Why OSOD? (R1, R3): We experimented with lightweight object detectors, sparse detectors, and multi-label baselines; all exhibited limited performance, even when warmed up, mainly due to the task complexity (Table 2). Conversely, OSODs pre-trained on vast amounts of multimodal data performed better even without warmup, due to their inherent generalizable pre-trained knowledge. As R1 and R3 highlighted, these OSODs suffer from catastrophic forgetting. To solve this, we integrated publicly available datasets, but the shared dental semantics restricted optimization. Thus, we propose conditional contrastive denoising (CCDN), which allows the OSOD to benefit from the shared separability of the dental semantics between the datasets, significantly boosting performance. FD-SOS shares the same complexity as the baseline G-DINO (64.02M), higher than DINO (48.04M), and significantly lower than GLIP (122.8M). We will follow your advice to expand the table with complexity.

We hope our clarifications will help readjust the ratings to accept FD-SOS.
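
The inter-rater agreement figure quoted above (κ > 0.9) is a chance-corrected agreement statistic. Assuming it refers to Cohen's kappa between two raters, which the rebuttal does not state explicitly, a minimal sketch of the computation looks like this. The function name `cohen_kappa` and the per-tooth label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items (e.g., teeth)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labelled independently at their own rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa discounts the agreement two raters would reach by chance given how often each uses every label, which is why it is preferred over raw percent agreement when one class (such as healthy teeth) dominates.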




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Satisfactory rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Satisfactory rebuttal.


