Abstract

The surgical guide plate is an important tool for dental implant surgery. However, its design process relies heavily on the dentist manually simulating the implant angle and depth. While deep neural networks have been applied to help dentists quickly locate the implant position, most of them cannot determine the implant depth. Inspired by the video grounding task, which localizes the starting and ending time of a target video segment, in this paper we simplify implant depth prediction as video grounding and develop a Texture Perceiver Implant Depth Prediction Network (TPNet), which directly outputs the implant depth without complex measurements of oral bone. TPNet consists of an implant region detector (IRD) and an implant depth prediction network (IDPNet). IRD is an object detector designed to crop the candidate implant volume from the CBCT, which greatly reduces the computation required. IDPNet takes the cropped CBCT data and predicts the implant depth. A Texture Perceive Loss (TPL) is devised to enable the encoder of IDPNet to perceive the texture variation among slices. Extensive experiments on a large dental implant dataset demonstrate that the proposed TPNet outperforms existing methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2491_paper.pdf

SharedIt Link: https://rdcu.be/dV180

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72086-4_57

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_Simplify_MICCAI2024,
        author = { Yang, Xinquan and Li, Xuguang and Luo, Xiaoling and Zeng, Leilei and Zhang, Yudi and Shen, Linlin and Deng, Yongqiang},
        title = { { Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        pages = {606 -- 615}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper formulates implant depth prediction as a video grounding task, taking the slices of CBCT images as a video to localize the starting and ending time of the target video segment. A novel TPNet is developed to predict implant depth without measuring oral bone, including an implant region detector and an implant depth prediction network.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Formulating implant depth prediction as a video grounding task is inspiring, which also simplifies the measurement.
    • Propose a two-step structure - locate the implant position and predict the implant depth, which provides an easy-to-use solution for measurement.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The major concern is the accuracy, given that the proposed method is not measuring with geometry awareness. However, there’s no evaluation metric in terms of distance error (mm), and the spacing of CBCT images is not reported (before and after resampling). Acc may be a good metric for video grounding, but is not as meaningful for depth prediction. It’s unclear how Acc(R@1, m=0.8) reflects the measurement accuracy. A gold standard is needed for evaluation.
    • The method takes the sagittal slices as videos to predict depth, which implies a strong assumption that depth is always vertical. However, depth should be measured along the implant/tooth axis, which cannot be perfectly vertical.
    • In Tab. 3, even with m=0.6, the highest Acc(R@1) of all video grounding methods is only 34.5, which may indicate that video grounding is not a good solution for implant depth estimation. The proposed method only surpasses other methods with m=0.8, and the performance gap with m=0.6 and 0.7 is not discussed. In Tab. 2, the performance is improved by TPL when m=0.8, but drops dramatically when m=0.6 and 0.7.
    • The comparison only includes video grounding methods and lacks other implant depth estimation methods. The proposed method should simplify this task, but no comparison of memory and time efficiency against other implant depth estimation methods is provided.
    • There’s only one case for visual comparison of ablation study, no other qualitative results for comparisons.
    • In Fig. 2, the definition of texture variation is not clear. How figures in (b) are obtained is not clear. Fig. 2 should be put in Page 5 after Fig. 3.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See above

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an interesting solution for implant depth prediction, by formulating it as a video grounding task. However, the weaknesses in accuracy and experimental setup seem concerning.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thank you for the rebuttal and for addressing the concerns regarding the use of the sagittal axis and the accuracy criterion. I encourage the authors to explain in the manuscript the use of the sagittal axis and that the goal is to “ensure the center of the implant root maintains a minimum safety distance of 1.5mm with the mandibular nerve canal”. Regarding the performance shown in the table, it’s unclear why the accuracy at m=0.6 and 0.7 is included if it is meaningless for real clinical requirements. I recommend only reporting accuracy at larger m values (and explaining in detail what these results signify in terms of distance), and adding metrics of signed distance error (overestimation of depth could be more critical in this application) and the percentage of errors exceeding 1.5mm or 1 mm. While this method may not provide the most precise measurements, its significant advantages lie in the inspiring and compact problem formulation, so I also encourage the authors to highlight these strengths by comparing the efficiency against previous methods.

    Minor: a typo in Fig 3, “TDL” should be “TPL” in the subtitle of (d).
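
    The distance-based metrics recommended above could be computed along the following lines. This is only a minimal sketch under stated assumptions: the depth error is measured at the predicted end slice, a positive value means overestimated depth, and the 0.2mm slice spacing is the value the authors give in their feedback. The function names are hypothetical and are not taken from the paper.

```python
def signed_errors_mm(pred_ends, gt_ends, spacing_mm=0.2):
    """Signed depth error per case in mm (assumed: positive = predicted too deep)."""
    return [(p - g) * spacing_mm for p, g in zip(pred_ends, gt_ends)]


def fraction_exceeding(errors_mm, limit_mm):
    """Share of cases whose absolute error is larger than limit_mm."""
    return sum(abs(e) > limit_mm for e in errors_mm) / len(errors_mm)


# Illustrative end-slice indices only, not data from the paper.
errors = signed_errors_mm([100, 110, 95], [100, 100, 100])
share_over_limit = fraction_exceeding(errors, 1.5)
```

    Reporting the signed errors separately from the absolute ones keeps overestimation, which the review flags as the more critical case, visible in the results.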



Review #2

  • Please describe the contribution of the paper

    This paper proposes a two-stage pipeline for dental implant depth estimation from CBCT data. The pipeline consists of an implant region detector (IRD), which crops the region of interest from the CBCT to reduce subsequent computation cost, and an implant depth prediction network (IDPNet), which performs regression on the cropped volume to predict the start and end slices of the implant. The experiments show reasonable performance of the proposed pipeline.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well organized. Fig. 3 clearly presents the proposed architecture of the pipeline; Ablation study is well done and results presented in Table 1 and Table 2 show the tradeoff of the designs.
    • The formulation of implant depth prediction problem into video grounding task is novel; Table 3 also compares video grounding models with the proposed pipeline and showed its advantage on accuracy of high IoU threshold.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. IRD is just a simplified version of an existing work, a text guided implant position prediction network - TCEIP [12], which reduces the main innovative contribution of this paper to adding an IDPNet to explicitly regress the implant position by start slice and end slice.
    2. The design logic of the IDPNet architecture is a bit confusing: a regression network cares about global information, so intuitively an encoder network + regression head should suffice. Adding a decoder increases the computational complexity.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The innovation and contribution of this paper could be better consolidated if there is more detailed discussion on why the proposed pipeline (TPNet) outperforms the existing text guided implant position prediction network - TCEIP, especially from the clinical perspective. Does TPNet provide more useful information to dentists than TCEIP? Or is it easier to use, more computation-efficient etc?

    2. The design of the network would be more convincing if the paper provided detailed design considerations on why it uses an encoder+decoder architecture, and even better if there were a corresponding ablation study to show the necessity and advantage of having the decoder part.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has a pretty clear presentation of the network architecture and ablation experiments, and reasonably better performance than other models. Yet the two key weaknesses mentioned above need to be addressed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a method to estimate depth of dental implants from 3D CBCT images based on “video grounding” techniques (capture start and end image slices with the implant).

    • first, a region of interest from 2D slices of tooth crown images is detected integrating
    • the cropping is guided with text embedding using CLIP (left, middle, right..)
    • the 3D volume is then cropped using this ROI for increased computational efficiency
    • a set of texture features are computed and the depth-predicting network is trained by minimizing loss measuring the predicted slice overlap with the ground-truth
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • introducing a technique not commonly used for 3D medical images
    • validation on a large dataset
    • the presented ablation study helps to better identify the more important components of the proposed solution
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • limited comparison with alternative methods (e.g. a 3D resnet predicting the depth directly on the extracted ROI / unet segmenting the slices and measuring the bounding box,…)
    • the benefits of the Canny edge detector are not clear, wouldn’t it be sufficient to let the network learn the appropriate filters instead?
    • a minor weakness is the validation on data from the same CBCT scanner
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    the benefit of CLIP and the text embedding is unclear, if it is needed, cannot the limited vocabulary “left”/”middle”/”right” be encoded as a 1D vector, for example [1, 0, 0] for “left”?

    “temporal iou” - define IoU when first used, and maybe “temporal” does not make sense as there is no temporal element and it does not translate well from the domain of video grounding to the 3D images. Consider calling it slice IoU or something similar.
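
    For reference, the metric under discussion can be sketched as below: an IoU between the predicted and ground-truth slice ranges, and Acc(R@1, m) as it is usually defined in video grounding, i.e. the fraction of top-1 predictions whose IoU reaches the threshold m. The function names, the inclusive slice-index convention, and the inclusive threshold are assumptions made for illustration, not the authors' implementation.

```python
def slice_iou(pred, gt):
    """IoU of two inclusive slice ranges (start, end) along one axis."""
    p0, p1 = pred
    g0, g1 = gt
    # Number of slices shared by both ranges (0 if they do not overlap).
    inter = max(0, min(p1, g1) - max(p0, g0) + 1)
    union = (p1 - p0 + 1) + (g1 - g0 + 1) - inter
    return inter / union


def acc_r1(preds, gts, m):
    """Fraction of top-1 predictions whose slice IoU is at least m."""
    hits = sum(slice_iou(p, g) >= m for p, g in zip(preds, gts))
    return hits / len(gts)


# Illustrative ranges only, not data from the paper.
score = acc_r1([(10, 19), (0, 9)], [(10, 19), (5, 14)], m=0.8)
```

    Because the score depends only on overlap ratios, converting it to a distance in mm additionally requires the slice spacing and the length of the ground-truth range.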

    Consider comparison of other non-video grounding techniques.

    typo: “we chooses”

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an interesting technique to detect slices where a dental implant can be found and then infer the depth of the implant.

    However, the validation seems to be limited only to video-grounding techniques.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank all of the reviewers for their comments and their acknowledgment of the novelty and methodology of our work. We address the key concerns below and will further improve our paper.

R1-TPNet vs. TCEIP. Unlike TCEIP, our TPNet is designed to predict the implant depth. IRD is the first stage of TPNet and locates the implant region in order to crop a sub-volume from the CBCT data, which greatly reduces the computation cost. Moreover, IRD is an improved version of TCEIP that is more suitable for clinical applications because it balances accuracy and speed.

R1,R3-Encoder+Decoder. Intuitively, implant depth regression only requires global information. However, when we used only an encoder and a regression head, the network performed poorly. Through discussions with dentists and visualization of network features, we found that the texture of the neighboring teeth in the implant area is a strong reference for determining the implant depth, which means that local features are also key information. Therefore, we designed a decoder to encourage the network to learn fine-grained local features, which greatly improves the accuracy. We will clarify this in the final version.

R3-Canny Operator. Letting the network learn a filter is a good idea; however, a randomly initialized filter is prone to gradient explosion during initial network training because it cannot extract good edge features. In contrast, the Canny operator is simple and effective. We compared multiple operators, and Canny achieved the best accuracy.

R3-Vocabulary vs. Vector. Using vocabulary to extract a text embedding from CLIP has been shown in TCEIP to be more effective than using a 1D vector, and our results are consistent with this conclusion.

R4-Acc Criterion. The spacing of the CBCT is 0.2mm. In our task, the kernel of the implant should not invade the mandibular nerve canal and should maintain a minimum safety distance of 1.5mm. Therefore, as long as the center point of the implant root conforms to this rule, it is a good prediction. In this paper, we treat the sagittal axis of the CBCT as the timeline of the video, so we can directly use IoU to measure the accuracy of implant depth prediction while ensuring that the implant roots meet the standard (>1.5mm). We will clarify this definition in the final version.

R4-Using Sagittal Axis. When predicting implant depth along the sagittal axis, the posture of the implant does not need to be considered. As discussed under the evaluation criterion above, we aim to ensure that the center of the implant root maintains a minimum safety distance of 1.5mm from the mandibular nerve canal.

R4-Performance in Table 2,3. Our annotation sets the maximum value of implant depth under the above rules to ensure the accuracy of network prediction. Therefore, the larger the m, the greater the accuracy of the prediction. When TPL is introduced, it brings a large improvement at m=0.8, which is very meaningful for clinical applications, as the accuracy at m=0.6 and 0.7 is far from real clinical requirements. Moreover, we found that a powerful backbone brings a large improvement.

R3,R4-More Comparison. The commonly used method to predict implant depth is based on bone measurements. We compared TPNet with a measurement-based method, e.g., dental-yolo; TPNet has higher accuracy and nearly three times faster inference speed, demonstrating the potential of TPNet in clinical applications.

R4-Visual Comparison. Limited by the paper length, we selected only the most representative visualization results to validate the effectiveness of the TPL loss. They show that TPL enables the network to make more accurate predictions at both the start and end slices.

R4-Texture Variation. The texture variation refers to the pixel variation across all 2D slices in Fig. 2(a) and is obtained by calculating the standard deviation over all 2D slices. The brighter the region in Fig. 2(b), the greater the texture variation. We will clarify it in the final version.
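
The texture-variation map described in the last response can be illustrated with a short Python sketch that takes the per-pixel standard deviation across a stack of 2D slices. It assumes the slices are equally sized nested lists of intensities and uses the population standard deviation; the function name is hypothetical, and the authors' actual computation may differ.

```python
from statistics import pstdev


def texture_variation(slices):
    """Per-pixel standard deviation across a stack of equally sized 2D slices."""
    rows, cols = len(slices[0]), len(slices[0][0])
    return [
        [pstdev([s[r][c] for s in slices]) for c in range(cols)]
        for r in range(rows)
    ]


# Two tiny 2x2 slices with made-up intensities, for illustration only.
stack = [
    [[0, 5], [2, 2]],
    [[2, 5], [2, 4]],
]
variation_map = texture_variation(stack)
```

Pixels whose intensity changes from slice to slice receive larger values, which corresponds to the brighter regions the authors describe for Fig. 2(b).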




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers agree that the paper is well-written and the idea of proposing depth prediction as a video grounding task is an intriguing solution. However, they have expressed concerns about weaknesses in accuracy and the experimental setup. Since the authors have addressed most of these concerns in the rebuttal, I am inclined to recommend accepting the paper.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Given the current state of the paper and the mixed reviews, a weak accept is recommended, contingent on the authors providing additional comparisons and clarifications as suggested by the reviewers.



