Abstract

Accurate bronchoscope localization is essential for pulmonary interventions, by providing six degrees of freedom (DOF) in airway navigation. However, the robustness of current vision-based methods is often compromised in clinical practice, and they struggle to perform in real-time and to generalize across cases unseen during training. To overcome these challenges, we propose a novel Probabilistic Airway Navigation System (PANS), leveraging Monte-Carlo method with pose hypotheses and likelihoods to achieve robust and real-time bronchoscope localization. Specifically, our PANS incorporates diverse visual representations (e.g., odometry and landmarks) by leveraging two key modules, including the Depth-based Motion Inference (DMI) and the Bronchial Semantic Analysis (BSA). To generate the pose hypotheses of bronchoscope for PANS, we devise the DMI to accurately propagate the estimation of pose hypotheses over time. Moreover, to estimate the accurate pose likelihood, we devise the BSA module by effectively distinguishing between similar bronchial regions in endoscopic images, along with a novel metric to assess the congruence between estimated depth maps and the segmented airway structure. Under this probabilistic formulation, our PANS is capable of achieving the 6-DOF bronchoscope localization with superior accuracy and robustness. Extensive experiments on the collected pulmonary intervention dataset comprising 10 clinical cases confirm the advantage of our PANS over state-of-the-arts, in terms of both robustness and generalization in localizing deeper airway branches and the efficiency of real-time inference. The proposed PANS reveals its potential to be a reliable tool in the operating room, promising to enhance the quality and safety of pulmonary interventions.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0928_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0928_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Tia_PANS_MICCAI2024,
        author = { Tian, Qingyao and Chen, Zhen and Liao, Huai and Huang, Xinyan and Yang, Bingyu and Li, Lujie and Liu, Hongbin},
        title = { { PANS: Probabilistic Airway Navigation System for Real-time Robust Bronchoscope Localization } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposed a probabilistic approach to estimate camera pose during bronchoscopic navigation by using a depth-based pose estimation and point cloud matching. The method is validated on clinical cases and quantitative and qualitative results have shown improved pose accuracy and efficiency compared to baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The number of clinical cases used for validation in the paper is adequate. Manually labelling 6DoF pose GT on 31 bronchoscopic intervention cases is a great amount of work. The results are well presented in the figures and table. The proposed method combines multiple modules for pose estimation, including a Monte-Carlo method, CycleGAN-based depth estimation, pose estimation using FlowNetC and an MLP, YOLOv7 for lumen detection, and lumen tracking based on ResNet-50.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No discussion of the limitations of the proposed method is provided in the paper. The clinical feasibility of the proposed method can be restricted by image artifacts in real clinical scenarios. Centerline driving is a strong assumption which may not be valid in all clinical cases. The paper does not clearly state how the landmark point cloud matching is better than a direct similarity comparison between depth maps, or than a similarity comparison between the point cloud converted from the video depth map and the point cloud generated from a virtual projection of the airway model.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Some implementation details are missing in the paper, for example the parameters for initializing the particle set S0 around the initial pose. How large is the search space? Building the large clinical dataset with GT 6DoF poses for training the ML modules in the framework is a great effort. However, without access to the dataset, the results based on supervised training cannot easily be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. How does landmark detection make use of both X_{t-1} and X_t? How does landmark tracking deal with new and disappearing branches?
    2. How does landmark point cloud matching deal with partially visible branches in the video? How much improvement can be achieved by excluding the point cloud of non-airway regions in the point cloud similarity comparison?
    3. Please provide more details about the airway association module.
    4. In Algorithm 1, how many times are the particles resampled at each time stamp/frame?
    5. The introduced centerline constraint has two hyperparameters. Are those parameters set based on any literature or data?
    6. Does the weighted sum of particles give a better pose estimate than the pose with the highest likelihood?
    7. There is no cross-validation of the proposed method.
    8. Please provide some discussion around the limitations of the proposed method. The testing dataset includes images with blur or bubbles, etc. Is the proposed method sensitive to the image artifacts?
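
    Question 6 above contrasts two ways of reading a pose out of a particle set. The following is a minimal sketch of that contrast, not the authors' implementation: it assumes particles carry only a 3D position in mm, and the function names `weighted_mean_position` and `max_weight_position` and the sample values are hypothetical. It shows how a weighted mean can land between two separated clusters of hypotheses, while the highest-weight particle stays on one of them.

```python
import numpy as np


def weighted_mean_position(positions, weights):
    """Weighted average of particle positions (weights are normalised first)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(positions, dtype=float)


def max_weight_position(positions, weights):
    """Position of the single particle with the highest weight."""
    return np.asarray(positions, dtype=float)[int(np.argmax(weights))]


# Hypothetical particle set: two hypotheses near z = 0 mm, one outlier at z = 10 mm.
positions = np.array([[0.0, 0.0, 0.0],
                      [0.0, 0.0, 10.0],
                      [0.0, 0.0, 1.0]])
weights = np.array([0.5, 0.2, 0.3])

mean_estimate = weighted_mean_position(positions, weights)  # z = 2.3 mm
map_estimate = max_weight_position(positions, weights)      # z = 0.0 mm
```

    In this toy case the low-weight outlier pulls the weighted mean 2.3 mm away from the dominant hypothesis, which matches the kind of dilution the question asks about.
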
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although good testing results are presented, no discussion of the limitations of the proposed method is included in the paper. More explanation of the landmark point cloud similarity measure needs to be included.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper “PANS: Probabilistic Airway Navigation System for Real-time Robust Bronchoscope Localization” presents a method of navigating a bronchoscope through the airways of the lung using only the built-in camera and a segmented CT of the lung as a roadmap.

    The method is built on a particle filter approach, where machine learning-based depth estimation is used to model the state change at each time step (generate state hypotheses), and vision is used to weigh the likelihood of each state hypothesis with reference to visual landmarks and other heuristics.
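
    The predict / weight / resample cycle described above can be sketched generically as follows. This is an illustrative particle filter step under simplifying assumptions, not the paper's algorithm: particles hold only a 3D position rather than a full 6-DoF pose, the `motion` argument stands in for whatever the depth-based motion estimate provides, and the `likelihood` callable stands in for the landmark-based scoring. All names and numeric values are hypothetical.

```python
import numpy as np


def particle_filter_step(particles, weights, motion, likelihood, rng, noise_std=0.1):
    """One predict / weight / resample cycle over N position hypotheses.

    particles : (N, 3) array of hypothesised positions (mm)
    weights   : (N,) array of current particle weights
    motion    : (3,) displacement estimate for this frame
    likelihood: callable mapping (N, 3) positions to (N,) non-negative scores
    """
    n = len(particles)
    # Predict: move every hypothesis by the motion estimate plus diffusion noise.
    proposed = particles + motion + rng.normal(0.0, noise_std, size=particles.shape)
    # Weight: score each hypothesis against the current observation.
    new_weights = weights * likelihood(proposed)
    total = new_weights.sum()
    if total <= 0.0:
        # Degenerate case: no hypothesis explains the observation; use uniform weights.
        new_weights = np.full(n, 1.0 / n)
    else:
        new_weights = new_weights / total
    # Resample once (multinomial) so that likely hypotheses are duplicated.
    idx = rng.choice(n, size=n, p=new_weights)
    return proposed[idx], np.full(n, 1.0 / n)


# Hypothetical usage: 500 particles and a Gaussian score centred on (0, 0, 1) mm.
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 2.0, size=(500, 3))
weights = np.full(500, 1.0 / 500)
target = np.array([0.0, 0.0, 1.0])
gaussian = lambda p: np.exp(-0.5 * np.sum((p - target) ** 2, axis=1))
particles, weights = particle_filter_step(
    particles, weights, np.array([0.0, 0.0, 1.0]), gaussian, rng
)
```

    Multinomial resampling is used here only because it is the shortest to write; other resampling schemes would fit the same interface.
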

    Experiments are conducted on a dataset consisting of 31 recorded bronchoscopies: 20 for training, 1 for validation, and 10 for testing. The results are promising compared with existing vision-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths of this paper include:

    1. A novel particle filter-based approach to navigating a bronchoscope through the airways of the lungs using only vision and a segmented CT. These are readily available in most standard bronchoscopies so this approach obviates the need for additional sensors, which would add cost and complexity to a system.

    2. Clear explanation of the approach, with some components leveraging previously developed techniques such as depth estimation from vision. Clear clinical rationale motivating and guiding the work.

    3. Ability to develop and test on clinical data, which helps reveal many challenges that would be difficult to envision in non-clinical settings and smooths the path toward clinical translation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses of this paper include:

    1. Incremental innovation - While the combined use of particle filters and depth estimation for bronchoscopy navigation is novel, each component of the method is a preexisting one, as is the topic of vision-based bronchoscopy navigation.

    2. Few details are provided about the dataset which prevents the reader from understanding the distribution of the 31 datasets, e.g., whether they correspond to unique patients. This makes it hard to draw conclusions about generalization.

    3. Because the method is run offline on previously saved bronchoscopy navigations, it is hard to judge whether the method would have performed well enough in real time to guide the clinician down the correct branches of the airways. Apparently the navigation to collect the data was performed without the aid of navigation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall the paper presents a novel vision-based approach to helping a clinician navigate a bronchoscope through the airways of a patient. The most critical tasks may include selecting the correct airway to traverse at each branch, and localizing lesions or targets of interest. Such an approach can help clinicians accomplish these tasks more efficiently and effectively. Additionally, development and testing are performed on in vivo clinical data, which is an additional step towards clinical use.

    On the other hand the innovation presented is mild, as many similar methods already exist to achieve the same goals. Assuming that the ultimate goal is clinical impact, it is not a lack of methods that is standing in the way. Thus without real world use the community will not know whether or not the speed and accuracy improvements achieved in this work are necessary to achieve good clinical results.

    The paper is generally well written and clear overall, but one unclear aspect appears at the beginning. The paper highlights the need for 6-DoF localization, but the 6-DoF state variables are neither defined in the methods section nor evaluated in the results section. The acronym “ATE” is used but not defined; it nevertheless suggests that error is measured as a 3D distance quantity.

    Reiterating a point made previously, the fact that 31 datasets are used is a positive; however the paper does not provide further insights as to the nature of these datasets: the number of unique patients, the characteristics of the patients (demographics, disease conditions, etc.), and so on. This prevents the reader from understanding the extent to which the method can generalize beyond the training data.

    Another positive aspect of the paper is mentioning some of the challenging scenarios that can appear in in vivo studies. Some examples listed in the paper include poor visibility, motion blur, and bubbles. The paper can go further and list possible limitations of the approach, such as how the Depth-based Motion Inference is affected by extended symmetrical, texture-less lumens and/or a bronchoscope that stays stationary for a long time. This would allow the paper to suggest and possibly implement mitigations to these limitations. One mitigation offered is the use of landmarks to identify anatomical branches; the paper can further elaborate on how branches can be visually distinguished from one another, which would help the reader understand the strengths and limitations of the work better.

    The paper uses expert labelers to generate ground truth poses by registering virtual and real bronchoscopic views. While this is presumably fairly reliable, the reader could appreciate the ground truth better if the nature of the experts were described, as well as how faithfully the virtual views match the real views.

    Page 2, “pose hypotheses overtime” should be “pose hypotheses over time”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well presented overall and supported by an in vivo dataset, while the innovation is fairly light as are the details of the dataset.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My rating falls exactly between those of my co-reviewers’:

    • R4’s higher rating is primarily based on the strength that the method is interesting. I agree with this, but not enough to warrant an increase to my previous rating towards R4’s.
    • R7’s lower rating is primarily based on the weaknesses that the limitations of the method and point cloud similarity are not well described. I agree with this as well, but not enough to warrant a decrease to my previous rating towards R7’s.

    Authors emphasize in the rebuttal that the primary contribution is not the methodology as such but its speed. The bronchoscope acquisition frame rate is ~15 fps, and authors demonstrate an ability to process frames to meet this requirement.

    There remain additional steps to demonstrating true real-time performance. Frame rate is in part a function of the number of particles used in the particle filter, and different scenarios may require different numbers of particles to maintain an expected level of performance. Furthermore, the number of required particles can vary within a procedure, so the algorithm should be able to specify a certain level of performance within a specified time frame to be considered truly real time.

    Another consideration is duty cycle, wherein the computation time per frame should leave enough “free” time between frames to account for frame acquisition variability and computational variability between frames.
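
    The frame-budget and duty-cycle argument above can be made concrete with a little arithmetic. The sketch below takes the ~15 fps acquisition rate quoted in this review; the helper names, the 80% duty-cycle ceiling, and the per-frame timings are hypothetical values chosen only for illustration.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given acquisition rate."""
    return 1000.0 / fps


def duty_cycle(compute_ms, fps):
    """Fraction of the per-frame budget consumed by computation."""
    return compute_ms / frame_budget_ms(fps)


def meets_real_time(compute_times_ms, fps, max_duty=0.8):
    """True if even the slowest frame leaves the required slack in the budget.

    max_duty is an assumed ceiling (here 80%) that reserves the remaining time
    for acquisition jitter and frame-to-frame computational variability.
    """
    return max(compute_times_ms) <= max_duty * frame_budget_ms(fps)


budget = frame_budget_ms(15)                    # about 66.7 ms per frame
load = duty_cycle(50.0, 15)                     # about 0.75 of the budget
ok = meets_real_time([40.0, 50.0, 53.0], 15)    # worst frame fits within 80%
too_slow = meets_real_time([40.0, 60.0], 15)    # one frame exceeds the ceiling
```

    Checking the worst-case frame rather than the mean reflects the reviewer's point that the number of particles, and hence the compute time, can vary within a procedure.
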

    Finally, it is another step to demonstrate real-time performance open loop, i.e., as the procedure is performed so as to assist in navigation, vs. evaluation on offline video in which navigation has already been performed through other means.

    Given these considerations I maintain my prior evaluation and rating.



Review #3

  • Please describe the contribution of the paper

    This paper presents a novel approach for the 6-DOF estimation of a bronchoscope pose within lung airways. Compared to previous works that use registration to 3D surface models of the airway and depth estimation networks, this work poses the problem with a Bayesian formulation where depth estimation provides possible poses, and semantic segmentation of bronchial airways “assesses” the likelihood of such poses. The key novelty is the use of airway semantics to condition poses obtained by depth estimation in a particle filter formulation. Validation is performed on data from 31 clinical cases, where 10 are left as unseen data. Tracking error is assessed and compared with three other relevant methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well motivated and addresses a challenging task of interest to the community.
    • The paper assembles an interesting Bayesian formulation with previously reported Depth estimation (DMI) and semantic localisation methods (BSA).
    • Comparisons with relevant methods are provided, using common metrics.
    • Supplementary material provides thorough information on previous methods, including tested data.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The two main previously used modules include multiple components that could be better described.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I believe the paper proposes a very interesting formulation for the bronchoscope localisation problem. However, some aspects could be clearer, mostly in the modules that were drawn from the literature. Details below:

    • The paper is very method heavy, and the description of the BSA and DMI could be clearer. The DMI for example is based on the DD-VNB (if I understood correctly) – authors should state upfront the whole module is based on previous work (since the losses used were the same as in the DD-VNB paper).
    • The BSA module comes from another paper (BronchoTrack) that also does bronchoscope localisation – could this method have been used for comparison as well? Also, authors should be clear on whether all the steps from this framework were used or not.
    • In equation (5), should w_{t,l} be w_{t,k}, as k is the landmark index and not l? This is a minor notation detail.
    • How is h(otk) estimated for a specific label? By extracting the depth estimation map on the segmentation region obtained by BSA? This could be clearer.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe the paper proposes an interesting (although incremental) contribution to bronchoscope localization that shows improvements towards existing state-of-the-art methods.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have properly addressed all of my minor comments in their rebuttal (regarding method description and choice of comparisons), so I have maintained my acceptance decision.




Author Feedback

We thank the reviewers for their detailed feedback and positive comments regarding motivation (R4&R6), formulation (R4) and results (R4, R6&R7). We address the major concerns point-by-point.

  • Novelty of the Paper (R6): PANS stands out in visual bronchoscopy navigation as the first method to address speed, generalization, and robustness, which are crucial for clinical use. Inspired by surgeons’ strategies, it integrates motion monitoring and landmark identification into a cohesive and high-performance Bayesian framework, which is our primary contribution. PANS offers modularity with off-the-shelf networks, ensuring adaptability and reproducibility. Validated on extensive clinical data, PANS demonstrates substantial improvements over existing methods, underscoring its potential for clinical applications.
  • Method Details: PANS integrates several components. DMI uses depth and motion networks from DD-VNB, while introducing a superior framework. BSA adapts BronchoTrack as landmark detection, excluding loop closure and localization steps. PANS achieves finer granularity (coordinate-level vs. branch-level), clarifying why BronchoTrack is not directly compared (R4). In BSA, landmark tracking uses the entire video sequence, with motion patterns modeled and a Re-ID network tackling temporary target disappearance. Airway association assigns branch labels by creating a lumen hierarchy, using CT segmentation for lumen recognition and tracking for label propagation (R7). Lumen depth is derived from the estimated depth within detected bounding boxes (R4). Partial point clouds are not treated specially. The improvement is shown in the ablation study “PANS w/o BSA,” improving ATE by 6.4 mm compared to registering the entire frame depth. PANS initializes the particle set around the ground truth with a defined variance, resampling particles once at each time step. The highest-weight particle is used for localization, as lower-weight particles dilute the estimate. Hyperparameters are empirically tuned on the validation set (R7). Module descriptions will be clarified in §2.
  • Dataset Details: Each case is from a different patient, ensuring evaluation on untrained patients. The dataset includes various conditions such as pulmonary nodules, pneumonia, and lung cancer. Expert surgeons provided labels, validated through inter-observer variability, showing an average discrepancy of 0.58 mm. (R6) With 31 cases, a single train-validation split should suffice, and considering the computation cost, we did not apply cross-validation (R7). Efforts are underway to enable public access pending ethical approval (R7). Those will be included in §S.2.
  • Real-Time Feasibility (R6): PANS’s inference speeds reach the endoscopic frame rate, feasible for clinical operation. While tested offline, we are actively optimizing the method for real-time validation in ongoing animal studies.
  • 6-DoF State (R6): The 6-DoF refers to the position and orientation. Evaluation used metrics like Average Trajectory Error (ATE), SR5, and SR10, following existing works (ref. [19,8,22] in manuscript). While ATE primarily indicates positional errors, it indirectly incorporates rotational errors due to their impact on trajectory. Average errors for look direction are (°): EDM: 54.85, Depth-Reg: 69.08, DD-VNB: 53.64, PANS: 29.93. Average errors for roll angle are (°): EDM: 71.68, Depth-Reg: 65.15, DD-VNB: 57.34, PANS: 42.54. Handheld bronchoscope data with wide roll range (approx. -180° to 180°) tend to exhibit larger roll errors across methods.
  • Limitation Discussion: PANS exhibits superior resilience to minor artifacts, as depicted in Fig. S3 (R7). The centerline driving hypothesis is informed by clinical practice and literature (ref. [13] in manuscript) (R7). Additional limitations will be elaborated in future journal versions due to space constraints (R6).
  • Other issues: We thank R4 and R6 for pointing out the notation and expression mistakes, which will be corrected in the final manuscript.
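
The positional and angular metrics quoted in the 6-DoF State item can be sketched as follows. The definitions used here are assumptions, since the page does not spell them out: ATE is taken as the mean Euclidean position error per frame, SR5 and SR10 as the fraction of frames whose position error is below 5 mm and 10 mm, and look-direction error as the angle between estimated and ground-truth viewing vectors. The function names and the sample trajectory are hypothetical.

```python
import numpy as np


def average_trajectory_error(est, gt):
    """Mean Euclidean distance (mm) between estimated and true positions (assumed ATE)."""
    return float(np.mean(np.linalg.norm(est - gt, axis=1)))


def success_rate(est, gt, threshold_mm):
    """Fraction of frames with position error below threshold_mm (assumed SR-k)."""
    return float(np.mean(np.linalg.norm(est - gt, axis=1) < threshold_mm))


def look_direction_error_deg(d_est, d_gt):
    """Angle in degrees between two viewing-direction vectors."""
    d_est = d_est / np.linalg.norm(d_est)
    d_gt = d_gt / np.linalg.norm(d_gt)
    # Clip to guard against rounding slightly outside [-1, 1] before arccos.
    cos_angle = np.clip(np.dot(d_est, d_gt), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


# Hypothetical three-frame trajectory with position errors of 0, 3 and 12 mm.
gt = np.zeros((3, 3))
est = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 3.0], [0.0, 0.0, 12.0]])

ate = average_trajectory_error(est, gt)   # 5.0 mm
sr5 = success_rate(est, gt, 5.0)          # 2 of 3 frames
sr10 = success_rate(est, gt, 10.0)        # 2 of 3 frames
angle = look_direction_error_deg(np.array([1.0, 0.0, 0.0]),
                                 np.array([0.0, 1.0, 0.0]))  # 90 degrees
```

Under these assumed definitions, ATE alone says nothing about orientation, which is why separate look-direction and roll errors are reported alongside it.
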




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has addressed the major questions from the reviewers. I recommend acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The rebuttal has addressed the major questions from the reviewers. I recommend acceptance.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


