Abstract

Large pretrained vision-language models (VLMs) have demonstrated remarkable data efficiency when transferred to the medical domain. However, successful transfer hinges on the development of effective prompting strategies. Despite progress in this area, the application of VLMs to dentistry, a field characterized by complex, multi-level dental abnormalities and subtle features associated with minor dental issues, remains uncharted territory. To address this, we propose a novel approach for detecting dental abnormalities by prompting VLMs, leveraging the symmetrical structure of the oral cavity and guided by the dental notation system. Our framework consists of two main components: dental notation-aware tooth identification and multi-level dental abnormality detection. Initially, we prompt VLMs with tooth notations for enumerating each tooth to aid subsequent detection. We then initiate a multi-level detection of dental abnormalities with quadrant and tooth codes, prompting global abnormalities across the entire image and local abnormalities on the matched teeth. Our method harmonizes subtle features with global information for local-level abnormality detection. Extensive experiments on the re-annotated DENTEX dataset demonstrate that our proposed framework significantly improves performance by at least 4.3% mAP and 10.8% AP50 compared to state-of-the-art methods. Code and annotations will be released at https://github.com/CDchenlin/DentalVLM.
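
The quadrant and tooth codes mentioned in the abstract can be illustrated with a minimal sketch. It assumes the FDI two-digit notation for permanent teeth (first digit is the quadrant, second digit is the tooth position counted from the midline). The helper names and the prompt wording are hypothetical and are not taken from the paper or its repository.

```python
# Sketch of FDI two-digit tooth codes and their left/right mirror counterparts.
# Assumption: FDI notation for permanent teeth; all names here are illustrative.

QUADRANTS = {1: "upper right", 2: "upper left", 3: "lower left", 4: "lower right"}
TOOTH_NAMES = {
    1: "central incisor", 2: "lateral incisor", 3: "canine",
    4: "first premolar", 5: "second premolar",
    6: "first molar", 7: "second molar", 8: "third molar",
}
# Left/right symmetry swaps quadrants 1<->2 and 3<->4; the tooth digit is unchanged.
MIRROR_QUADRANT = {1: 2, 2: 1, 3: 4, 4: 3}


def fdi_code(quadrant: int, tooth: int) -> str:
    """Build a two-digit FDI code such as '18' from quadrant and tooth position."""
    if quadrant not in QUADRANTS or tooth not in TOOTH_NAMES:
        raise ValueError(f"invalid quadrant/tooth: {quadrant}/{tooth}")
    return f"{quadrant}{tooth}"


def mirror_code(code: str) -> str:
    """Return the code of the symmetric tooth on the opposite side of the midline."""
    quadrant, tooth = int(code[0]), int(code[1])
    return fdi_code(MIRROR_QUADRANT[quadrant], tooth)


def tooth_prompt(code: str) -> str:
    """Compose a hypothetical text prompt for one tooth code."""
    quadrant, tooth = int(code[0]), int(code[1])
    return f"tooth {code}: {QUADRANTS[quadrant]} {TOOTH_NAMES[tooth]}"


def all_codes() -> list[str]:
    """List all 32 permanent-tooth codes, quadrant by quadrant."""
    return [fdi_code(q, t) for q in QUADRANTS for t in TOOTH_NAMES]


if __name__ == "__main__":
    for code in ("18", "36"):
        print(tooth_prompt(code), "| mirror:", mirror_code(code))
```

Keeping the quadrant and tooth digits separate is what makes the symmetry explicit: two teeth that share a tooth digit in mirrored quadrants are anatomical counterparts.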

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2968_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2968_supp.pdf

Link to the Code Repository

https://github.com/CDchenlin/DentalVLM

Link to the Dataset(s)

https://github.com/ibrahimethemhamamci/HierarchicalDet

BibTex

@InProceedings{Du_Prompting_MICCAI2024,
        author = { Du, Chenlin and Chen, Xiaoxuan and Wang, Jingyi and Wang, Junjie and Li, Zhongsen and Zhang, Zongjiu and Lao, Qicheng},
        title = { { Prompting Vision-Language Models for Dental Notation Aware Abnormality Detection } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a well-designed method for detecting dental abnormalities using a VLM guided by text prompts related to dental abnormalities, with two significant contributions: 1) harnessing the symmetry of quadrants within the dental structure to enhance the robustness and accuracy of abnormality detection; 2) extracting image features not only from tooth-level local regions but also from the panoramic image at a global level.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The two main components of the approach, namely dental notation-aware tooth identification and multi-level dental abnormality detection, are well-designed to leverage prior dental knowledge. The incorporation of domain knowledge renders the algorithm suitable for automated dental applications. The experiments in the paper are comprehensive, encompassing comparison with state-of-the-art approaches as well as three main ablation studies.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper’s first weakness lies in the necessity of using a VLM, in particular the language components. While the text prompts serve as classification labels, they fail to capitalize on the language interpretation capability of text features and the language-guided query selection from the original Grounding DINO paper. This raises the question of whether image-only detection methods like Faster R-CNN, YOLO, or DETR could yield similar or even better results with dental notation-aware tooth identification and multi-level abnormality detection. The second weakness of the paper is the insufficient data for model training and evaluation. With only 645 images in total and 127 images for testing, the limited data inevitably leads to result fluctuations. However, the paper does not provide measures of variability such as confidence intervals or standard deviations.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    See the second weakness in Section 6 (main weaknesses).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors may want to consider offering explanations and evidence to justify their preference for VLMs over traditional computer vision models. Furthermore, employing n-fold cross-validation with such a small dataset could enhance the reproducibility of the results. Although the paper compares its approach to a few state-of-the-art approaches, it is not clear to readers whether these models were evaluated with or without fine-tuning using the same data split as the approach in the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The recommendation primarily stems from the factors outlined in the main weakness section. The novelty of the paper is limited only to dental applications, not applicable to other domains. Meanwhile, the relatively small data set used for training and evaluation limits its value as an application paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors addressed the following questions adequately.

    1. Why VLMs over pure vision models: the authors provided the results of vision-only models and justified the use of a VLM; however, a brief description of the experimental setup should be provided in the final version.
    2. Fine-tuning of other approaches: convincing feedback.
    3. Limited application beyond the dental area: convincing feedback.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a dental notation-aware abnormality detection framework by leveraging the dental notation system and incorporating multilevel abnormality prompting. The effectiveness of the proposed framework is validated on the re-annotated DENTEX dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors propose the use of the dental notation system to label teeth and enhance semantic information and symmetrical relationships. The idea is novel and experiments have demonstrated its effectiveness.
    2. The authors considered different dental anomalies to select global and local features for detection, an idea that is consistent with clinical practice.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. I have some concerns about the task setting of this paper. In the proposed method, the set of possible candidate anomalies must be determined as prompts in advance, before detection, which limits its application scope.
    2. This paper proposes a two-stage detection framework, which first enumerates each tooth and then performs dental abnormality detection. How much influence does the tooth recognition error in the first stage have on the subsequent detection? The authors need to discuss this in detail.
    3. In this paper, experiments are conducted on only one dataset. I suggest that the authors validate the effectiveness of the proposed method on multiple datasets.
    4. The performance of the proposed method is excellent for most dental abnormalities, but it is weaker than G-DINO in the detection of impacted teeth, and the authors need to analyze the reasons.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    If possible, it is hoped that the authors will open source the code to promote the community.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please refer to “the main weaknesses of the paper” part.
    2. The authors did not cover the related work sufficiently. In the comparison experiments, most of the baselines are general methods for natural scenes, and comparisons with models designed specifically for dental anomaly detection are lacking.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In this paper, the authors present a dental notation-aware abnormality detection framework by leveraging the dental notation system and incorporating multilevel abnormality prompting. Overall, the method is novel and the results are effective. Therefore, I am more inclined to accept if my concerns can be effectively addressed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a framework that uses a VLM to extract features from tooth X-ray images and combines them with feature representations of the medical notes. By replacing the one-hot vector with a quadrant notation system that exploits tooth symmetry information, it implements automatic detection of abnormalities in tooth X-ray images, and it won first place at the MICCAI Challenge 2023.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of the paper mainly reside in the writing and methodology. The writing and illustrations are organized in a clear manner and are easy for readers to understand. The methodology aggregates information from images on a global and local scale, and from the medical notes with diagnosis traits.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses of the paper have yet to be found.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper proposes a framework that uses a VLM to extract features from tooth X-ray images and combines them with feature representations of the medical notes. By replacing the one-hot vector with a quadrant notation system that exploits tooth symmetry information, it implements automatic detection of abnormalities in tooth X-ray images, and it won first place at the MICCAI Challenge 2023.

    The strengths of the paper mainly reside in the writing and methodology. The writing and illustrations are organized in a clear manner and are easy for readers to understand. The methodology aggregates information from images on a global and local scale, and from the medical notes with diagnosis traits.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A recommendation of accept is due to the novelty of the methodology, in how image and text information is aggregated through the VLM, and in how algorithm details, including the tooth notation system, are described for easier understanding. In addition, this algorithm won first place in the MICCAI Challenge 2023.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    In this rebuttal, the benefit of using text prompts during the training and inference stages has been highlighted, and comparisons with state-of-the-art results have been summarized and stated.




Author Feedback

We thank all reviewers for the constructive comments and for recognizing the novelty and effectiveness of our method. In this rebuttal, we mainly address the raised concerns on the preference for VLMs over pure vision models such as YOLO, the stability analysis of our method, and the comparisons with models designed specifically for dentistry, which show that our method is preferable for fine-grained dental abnormality detection (ours 66.3% vs best baseline 46.4% in AP50). We thank all reviewers for the valuable advice and will include more details in the revised version. For reproducibility, code and the re-annotated dataset will be released upon acceptance.

➤ R1

  1. Why VLMs over pure vision models? We thank the reviewer for the insightful comments. We would like to clarify that, although text prompts serve as classification labels in both GLIP and G-DINO, in our method we further use the dental notation system to empower the language interpretation capability on accurate tooth locations. As shown in Table 4, our prompt design is effective for improved tooth enumeration. In addition, our method also adopts language-guided query selection based on G-DINO. Finally, the overall performance of our method (AP50 66.3%) drops when all VLMs are replaced with pure vision models, e.g., Faster R-CNN (51.6%), YOLO (33.9%) and DINO (34.2%), which also validates the necessity of the VLM.

  2. Result fluctuations with limited data size (645 images): We report the mAP with a standard deviation of 35.8±1.1%, indicating the stability of our method. Regarding the limited data, because of data scarcity and expensive labeling in the field of dentistry, data size is typically small, e.g., DENTEX, the dental enumeration and diagnosis challenge, only provides 645 tooth-labelled panoramic X-ray images, which we further enriched by annotating six fine-grained abnormalities.

  3. Fine-tuning of other approaches: All baseline models are evaluated with fine-tuning using the same data split as our proposed approach.

  4. Limited application beyond the dental area: Although in this work we focus on dental abnormality detection, we would like to clarify that our work may be extended to other areas where notation or numbering systems can be leveraged, e.g., vertebrae and ribs.

➤ R3

  1. Task setting: prompt modification before detection? We thank the reviewer for the enlightening questions. We would like to clarify that all prompts used in our framework are fixed during inference and do not require determination or modification before detection.

  2. Influence of tooth recognition error: Our framework can achieve precise tooth detection, with AP50 97.1% and mAP 67.4%. To further minimize error accumulation in the next stage, we also crop each tooth with more pixels on the crown, root, left, and right sides to guarantee tooth recognition accuracy.

  3. Validate on multiple datasets: We appreciate the reviewer’s valuable suggestions. Because few open-access datasets currently align with our task formulation, we will address this in future work by collecting an in-house dataset.

  4. Weaker performance on impacted tooth detection: Our results on impacted tooth detection (mAP 71.2%) are only slightly lower than those of G-DINO (mAP 71.7%). Since the design of our framework focuses on detecting more fine-grained local-level abnormalities, the improvement in detecting global-level issues such as impacted teeth is not as significant as that for local abnormalities.

  5. Comparison with dental models: In Table 1, we have compared our method (AP50 66.3%) with the dental model SegAndDet (37.7%). Comparisons with the other dental models PDCNN (4.0%) and HierarchicalDet (46.4%) also indicate the superiority of our proposed method.
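
The padded tooth cropping mentioned in item 2 above (extra pixels on the crown, root, left, and right sides) can be sketched as follows. This is only an illustration under stated assumptions: the rebuttal does not say how much padding is used, so the `margin` fraction and the function name `expand_box` are hypothetical.

```python
# Sketch: enlarge a detected tooth box on all four sides before cropping,
# so that a slightly misplaced box still contains the whole tooth.
# Assumption: boxes are (x1, y1, x2, y2) in pixels; margin is a fraction of box size.

def expand_box(box, image_size, margin=0.1):
    """Pad a box by `margin` of its width/height and clip it to the image."""
    x1, y1, x2, y2 = box
    width, height = image_size
    dx = (x2 - x1) * margin  # horizontal padding (left and right sides)
    dy = (y2 - y1) * margin  # vertical padding (crown and root sides)
    return (
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(float(width), x2 + dx),
        min(float(height), y2 + dy),
    )


if __name__ == "__main__":
    # A 100x200 box inside a 1000x500 image, padded by 25% on each side.
    print(expand_box((100, 200, 200, 400), (1000, 500), margin=0.25))
```

Clipping to the image bounds keeps the crop valid for teeth near the edge of the panoramic image.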

➤ R4 We appreciate the reviewer for acknowledging the novelty of our methodology and the organized presentation. We would like to clarify that our method achieves better performance compared with the first-place winner of DENTEX (MICCAI 2023 challenge).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


