Abstract

In bronchoscopic navigation, depth estimation has emerged as a promising method with higher robustness for localizing camera and obtaining scene geometry. While many supervised approaches have shown success for natural images, the scarcity of depth annotations limits their deployment in bronchoscopic scenarios. To address the issue of lacking depth labels, a common approach for unsupervised domain adaptation (UDA) includes one-shot mapping through generative adversarial networks. However, conventional adversarial models that directly recover the image distribution can suffer from reduced sample fidelity and learning biases. In this study, we propose a novel adversarial diffusion model for domain-adaptive depth estimation on bronchoscopic images. Our two-stage approach sequentially trains a supervised network on labeled virtual images, and an unsupervised adversarial network that aligns domain-invariant representations for cross-domain adaptation. This model reformulates depth estimation at each stage as an iterative diffusion-denoising process within the latent space for mitigating mapping biases and enhancing model performance. The experiments on clinical sequences show the superiority of our method on depth estimation as well as geometry reconstruction for bronchoscopic navigation.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1749_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1749_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_Adversarial_MICCAI2024,
        author = { Yang, Yiguang and Ning, Guochen and Zhong, Changhao and Liao, Hongen},
        title = { { Adversarial Diffusion Model for Domain-Adaptive Depth Estimation in Bronchoscopic Navigation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    (1) This paper proposes a novel adversarial diffusion model for domain-adaptive depth estimation on bronchoscopic images. A two-stage training pipeline is designed for domain adaptation. (2) Some interesting results are shown. For example, in Fig. 4, the depth maps are projected to 3D coordinates and aligned with the CT data for quantitative evaluation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) This paper proposes a novel adversarial diffusion model for domain-adaptive depth estimation on bronchoscopic images. A two-stage training pipeline is designed for domain adaptation. (2) Some interesting results are shown. For example, in Fig. 4, the depth maps are projected to 3D coordinates and aligned with the CT data for quantitative evaluation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Considering the size of the lung airway, the accuracy of the proposed depth estimation method is not sufficient. The diameter of the bronchoscope is around 9~14mm and the size of the airway is about 10~25mm, but the RMSE and MAE of the proposed method are 6.0882mm and 4.1155mm. Therefore, I wonder whether this level of accuracy is enough for bronchoscopic navigation. (2) Real-time depth estimation is vital in endoscopic navigation, but the authors did not report any results on the speed of the depth estimation. (3) The ablation study is done on the test data instead of the validation data.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is suggested that the author should publish the code for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Conclusion: This paper proposed a diffusion-based method for depth estimation during the navigation. Some interesting 3D geometry reconstructions are displayed. However, the accuracy of the depth estimation results is limited. Besides, some important results about the inference speed are not reported.

    Strengths: (1) This paper proposes a novel adversarial diffusion model for domain-adaptive depth estimation on bronchoscopic images. A two-stage training pipeline is designed for domain adaptation. (2) Some interesting results are shown. For example, in Fig. 4, the depth maps are projected to 3D coordinates and aligned with the CT data for quantitative evaluation.

    Weaknesses: (1) Considering the size of the lung airway, the accuracy of the proposed depth estimation method is not sufficient. The diameter of the bronchoscope is around 9~14mm and the size of the airway is about 10~25mm, but the RMSE and MAE of the proposed method are 6.0882mm and 4.1155mm. Therefore, I wonder whether this level of accuracy is enough for bronchoscopic navigation. (2) Real-time depth estimation is vital in endoscopic navigation, but the authors did not report any results on the speed of the depth estimation. (3) The ablation study is done on the test data instead of the validation data.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The accuracy of the depth estimation results is limited. Besides, some important results about the inference speed are not reported.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors clarified that the proposed method targets depth estimation in scenes with nodules up to the 5th airway generation. The authors also reported that the method in [1] achieved an average MAE/RMSE of 7.8mm/10.6mm for depth estimation. However, the method in [1] was evaluated on 454 frames, whereas the proposed method was evaluated on only 142 pairs. It is therefore suggested that the authors state explicitly that the proposed method focuses on 5th-generation nodule scenes and create more validation data for evaluation.



Review #2

  • Please describe the contribution of the paper

    The authors proposed a new adversarial model integrated with a diffusion model to estimate depth in bronchoscopic navigation. The approach is an unsupervised domain adaptation model that is trained with supervised learning on virtual images and further adapted to the target data using domain-invariant representations. Their performance on one particular dataset and ablation studies support the effectiveness of their model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Creating virtual data for depth estimation
    2. Integrating diffusion into training GAN to gradually recover the depth estimations
    3. Ablation studies showing the impact of different modules in the proposed model
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There are several inconsistencies in the paper, especially in the method section, which make the paper difficult to read. For example, in Sec. 2.1 the authors refer to the “following depth decoder”, which is not mentioned. The terms in Eq. 1 also do not seem to be properly and correctly defined, and E_S is referred to before being defined.
    2. A simple baseline trained only on the virtual images and tested on the target could be included to show the effect of the domain gap.
    3. The dataset seems to be an internal dataset. How the virtual images are created, and whether the dataset will be made public for future development, is important to the field. Besides, it is not completely clear how the source and target datasets were created; for instance, how were the virtual image-depth pairs handpicked, and why 142?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be nice if the code were made publicly available for further research.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    – The comparison with other SOTA methods (or baselines) usually comes before the ablations.

    – Similarly for the quantitative results

    – Metrics should be defined or at least cited for the readers

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivations behind the paper and the application to medicine are very well defined, and it is necessary to investigate the usefulness of AI. However, there are certain points that the authors need to address, for example the creation of the datasets, the evaluations, and access to the code, which could change the score.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors’ responses clarified my questions. However, the paper needs to be polished for the camera-ready version.



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors present an adversarial diffusion model for domain-adaptive depth estimation on bronchoscopic images. The network is first trained in a supervised manner using simulated data. In the second stage, adversarial training is performed on real data, conditioned on the first-stage features, to learn domain-invariant representations. For both training stages, the authors propose a condition-guided denoising module.

    Qualitative and quantitative evaluations are performed to compare results to other methods. Ablation studies are also performed. 3D reconstructions are also visualized.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written. The literature review and the explanation of the contributions are very clear. The proposed framework is a novel technique that borrows ideas from related work and builds on them. The results show improvement through ablations and direct comparison to other methods. The 3D reconstruction results are a nice addition.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some clarifications are needed, as detailed below. The methods the authors compare to are not mentioned in the related work within the introduction. It is unclear why they were chosen and why the authors did not compare with the state of the art discussed in the introduction. It is advised that the authors add these methods to the literature review section and explain why they compare to them.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is advised that the authors make their code public upon publication for ease of reproducibility

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Comments:

    1. “Alternative” is used in the abstract. However, it is unclear what is being referred to; in other words, alternative to what?
    2. Abstract needs better flow when jumping from one idea to another. For example the authors start discussing UDA without a clear transition.
    3. EM-based methods are mentioned; however, it is unclear what they are, how EM is used, and in which procedure.
    4. In explaining the contributions, the authors justify the use of a strategy similar to “latent diffusion” by saying it is better for high resolution. However, the data used by the authors is low resolution. Please clarify.
    5. The authors compare with several methods. However, these methods are not present in the introduction when talking about the literature, and thus it is unclear why these methods were chosen. It is also unclear why they didn’t choose methods they discussed in the literature review to compare to.
    6. Fig 3: the authors include [1] and [18] but don’t include discussion about them in the qualitative results section. Fig 3: Why isn’t the ground truth shown? Also, the red and yellow arrows are not mentioned in the discussion, and it is unclear why they are used and what the difference in colour signifies.
    7. In the quantitative results discussion the authors use the word “decline” without saying decline as compared to what.
    8. The explanation “This decline can be explained by the fact that style transfer models assume a similar spatial arrangement across domains, a condition impractical in this context due to the absence of standardized rendering configurations.” is unclear. Please explain this in more detail.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has great merit with respect to novelty, writing and experiments. However, since it is unclear how the authors chose the methods to compare to, I would recommend a rebuttal; once they explain or fix that, the paper could be accepted.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I recommend acceptance. The authors clarified most of the comments. However, I recommend the authors include the methods they compare to (in the experiments) in the introduction and clearly state why they compare to these and not others.




Author Feedback

We thank all the reviewers for their valuable comments and advice. We appreciate the encouraging remarks such as “motivation and application are very well defined” (R1), “interesting reconstructions are displayed” (R3), and “great merit on novelty, writing and experiments” (R4). Our responses to the comments are as follows:

1. Clinical feasibility: (R3) Accuracy: We understand your concern regarding accuracy. It is important to clarify that the bronchoscope and airway sizes given are measured radially (diameter-wise), whereas the depth errors presented are measured axially (depth-wise). Previous studies by Banach et al. [1] (Med. Image Anal.) reported an average MAE/RMSE of 7.8mm/10.6mm for depth estimation and a tracking error of 6.2mm for target localization, achieving success rates of 95%/86% in main/lobar bronchi. Our method was integrated into navigation systems, aiding pulmonologists in reaching up to 5th-generation nodules. The presented errors were found to be accurate enough for navigation while ensuring localization points stayed within radial boundaries, confirming our method’s feasibility. (R3) Speed: Acceptable speeds range from 20-30fps in bronchoscopic systems (Chang, SPIE, 2023; Zang, IMIP, 2023). Our method meets real-time demands for navigation, with inference/systematic speeds (for reference only: 57.2fps/26.3fps) feasible through clinical evaluation. Further speed results will be provided in our works on systematic integration.

2. Description of datasets: (R1) Dataset creation: In the supplementary material, we visualized a three-step pipeline for generating image-depth pairs from virtual airways. We will enhance the details and release the code to support development in the field. (R1) Choice of testing number: The 142 pairs were carefully registered with real images for quantitative evaluation. This number balances label quality and quantity, covering various in-airway locations with different shapes and scales.

3. Experiments: (R4) Selection of comparison methods: We compared [6,15,33] (non-transfer models on natural scenes) to show the need for domain adaptation, and [1,18] (SOTA on endoscopic images) to show our superiority over clinical applications; these methods are more representative. Literature [3,20,27] was not chosen primarily due to their closed source. (R1) Baseline confirming domain gap: Previous studies have highlighted the cross-domain gap [1,30]. In Tab. 4, we compared three non-transfer baselines [6,15,33] trained on virtual images and tested on real images, proving the necessity of domain adaptation. (R3) Ablation study: The validation set includes virtual images, while the test set comprises only real images. The ablation study on the test set is designed to ensure the method’s robustness in real scenarios.

4. Clarity: (R4) Discussions of ‘decline’: Compared to feature-level adaptive methods, the vanilla shows a 24.8mm decline in accuracy (Tab. 1). Image-level adaptation is sensitive to domain differences, leading to incorrect translations [30]. In our case, perfectly real-like virtual images are challenging to generate due to limited rendering techniques in the field. Thus, our proposed model aims to learn domain-invariant features that are less affected by such visual disparities. (R4) Latent diffusion: Latent diffusion encodes images into an information-dense space [24], recovering final images with a higher level of detail. We leveraged its condensed nature for refined depth generation at our resolution. (R1, R4) Other details: Eq. 1 represents a standard RMSE between GT and prediction for pixel-wise loss. We cited the metrics [6,15], including accuracies and errors, for readers to follow. We clarified the discussion of comparison against [1,18] and GT; the arrows highlighted these differences as described in the Fig. 3 caption. All the unclarities in the Abs. and Intro. were carefully repolished to improve the paper’s flow and clarity.

Once again, we express our gratitude to the reviewers and Area Chairs for their time and effort.
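
For readers following the accuracy discussion, the two error metrics quoted throughout the reviews and the rebuttal (MAE and RMSE between ground-truth and predicted depth) can be sketched as below. This is only an illustrative sketch and not the authors' evaluation code: the function name `depth_errors`, the millimetre unit of the toy arrays, and the convention that non-positive ground-truth depth marks an unlabeled pixel are all assumptions made here.

```python
import numpy as np


def depth_errors(pred, gt, valid_mask=None):
    """Return (MAE, RMSE) over valid pixels, in the unit of the inputs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    if valid_mask is None:
        # Assumption: a non-positive ground-truth depth means "no label here".
        valid_mask = gt > 0
    diff = pred[valid_mask] - gt[valid_mask]
    if diff.size == 0:
        raise ValueError("no valid pixels to evaluate")
    mae = float(np.mean(np.abs(diff)))         # mean absolute error
    rmse = float(np.sqrt(np.mean(diff ** 2)))  # root mean squared error
    return mae, rmse


# Toy 2x2 depth maps in millimetres; the zero in gt is treated as unlabeled.
gt = np.array([[10.0, 20.0], [0.0, 40.0]])
pred = np.array([[13.0, 16.0], [5.0, 40.0]])
mae, rmse = depth_errors(pred, gt)
print(f"MAE = {mae:.4f} mm, RMSE = {rmse:.4f} mm")
```

Because RMSE squares each per-pixel difference before averaging, a few large outliers raise it more than they raise MAE, which is consistent with the reported RMSE being larger than the reported MAE.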




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


