Abstract

Monocular Metric Depth Estimation (MDE) in endoscopic images is a crucial step to improve navigation during medical procedures, as it enables the estimation of dense, real-scale 3D maps of the organs. For instance, in monocular flexible ureteroscopy (fURS), accurate navigation and real-scale information are essential for locating and removing kidney stones efficiently. Currently, the most promising approach to inferring depth from a single passive camera is the supervised training of large neural networks, so-called foundation models for MDE. However, the depth output of these models is biased when the training data domain does not match the target domain (both camera and scene). At the same time, one of the greatest challenges in medical imaging is the lack of annotated datasets, as obtaining real ground truth (e.g., depth data) is difficult. To overcome this, simulation has become a valuable tool in ureteroscopic imaging research. In this study, we introduce KidneyDepth, a synthetic dataset designed to reduce the gap between simulated and real-world 3D imaging. It includes a variety of shapes (e.g., a mesh from a CT scan and geometric primitives) along with different textures and lighting conditions, generated with BlenderProc2 (Denninger et al., 2023). To assess the effectiveness of KidneyDepth, we fine-tune two state-of-the-art MDE models (Depth Anything V2 and ZoeDepth) and test their performance on both simulated and real ureteroscopic images. Additionally, we evaluate the validity of their output by using the inferred depths in an RGB-D SLAM system. Our results show that training models on a synthetic dataset with diverse structures and lighting conditions improves depth estimation in real endoscopic images, and our simulations show that these RGB-D images enhance overall SLAM accuracy. The KidneyDepth dataset can be found at https://zenodo.org/records/14893421.
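
As a rough illustration of how a metric depth map yields real-scale 3D points, the sketch below back-projects each pixel with a pinhole camera model. This is not the authors' pipeline: the function name `backproject_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the assumptions that the image is already undistorted and that depth is measured along the optical axis are all chosen here for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project a metric depth map into camera-frame 3D points.

    Assumes an undistorted pinhole camera (hypothetical intrinsics) and
    depth values measured along the optical axis, in the same unit as
    the returned coordinates. Pixels with non-positive depth are skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0.0:
                continue                   # treat as invalid or missing depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    toy_depth = [[2.0, 0.0], [2.0, 4.0]]
    print(backproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

Because every coordinate scales linearly with the depth value, any scale bias in the predicted depth carries over directly to the reconstructed geometry, which is why metric rather than relative depth matters for real-scale maps.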

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2047_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

KidneyDepth: https://zenodo.org/records/14893421

BibTex

@InProceedings{OliLau_KidneyDepth_MICCAI2025,
        author = { Oliva-Maza, Laura and Steidle, Florian and Klodmann, Julian and Strobl, Klaus and Miernik, Arkadiusz and Triebel, Rudolph},
        title = { { KidneyDepth: A Synthetic Kidney Dataset for Metric Depth Estimation in Ureteroscopy } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {331--341}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This study presented a synthetic dataset for metric depth estimation in ureteroscopy. It evaluated the proposed dataset by fine-tuning two models, i.e., Depth Anything V2 and ZoeDepth. Evaluation results verified the effectiveness of the proposed dataset.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A synthetic dataset for metric depth estimation in ureteroscopy was created and evaluated.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. This paper mainly introduces a dataset; however, key details of the dataset, such as the generation process and the number of subjects and images, are missing, which undersells the value of the dataset.
    2. The contributions of the study are limited. The method used for data generation is based on an open-source method. Moreover, details of the generation process are missing, so it is hard to understand the know-how that went into developing the dataset.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. Figure 1(a) shows the CT scan of the kidney and the camera trajectories. How did you turn the CT scan (slice images) into a model with rendering information? This is key information for the study.
    2. What were the success rates of Depth Anything V2 and ZoeDepth?
    3. While Table 1 shows that Depth Anything V2 outperforms ZoeDepth, it seems that Depth Anything V2 performs worse than ZoeDepth. Can you explain more about the results of DA and Z?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The authors did not describe the details of the dataset development.
    2. Some of the evaluation results are confusing.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces KidneyDepth, a synthetic dataset specifically designed for monocular metric depth estimation in flexible ureteroscopy (fURS). Using BlenderProc2, the dataset simulates a wide range of kidney shapes, materials, and lighting conditions to closely mimic real endoscopic scenarios. The authors fine-tune two leading depth estimation models, Depth Anything V2 and ZoeDepth, with this synthetic data and evaluate their performance on both simulated and real ureteroscopic images. Additionally, they integrate the depth predictions into an RGB-D SLAM system to enhance navigation and mapping capabilities during surgery. The results demonstrate that models trained on diverse synthetic data can generalize better and yield more accurate depth estimation in real-world medical images.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • KidneyDepth is the first synthetic dataset tailored for ureteroscopic images with metric depth maps, addressing a significant need in the medical imaging field.
    • The dataset’s diversity in shapes, materials, and lighting conditions enables robust model training and improves the models’ ability to generalize.
    • The study provides a thorough evaluation using both simulated and real data, as well as two state-of-the-art depth estimation models.
    • Incorporating the depth predictions into an RGB-D SLAM system shows clear benefits for surgical navigation, including improved real-scale pose tracking and denser mapping.
    • The dataset is publicly available, supporting further research and development in this domain.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • There remains a noticeable gap between model performance on synthetic versus real images, indicating that the simulation-to-reality challenge is not fully resolved.
    • The evaluation on real images is limited to a relatively small dataset from a single institution, which may not represent the full range of clinical variability.
    • Depth Anything V2, although often more accurate, is also prone to more frequent failures on real data compared to ZoeDepth.
    • The approach requires significant computational resources, which may limit accessibility and reproducibility for some researchers.
    • The focus of the work is exclusively on depth estimation and does not address other important aspects of endoscopic navigation, such as segmentation or detection of anatomical structures.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper makes a valuable contribution by introducing a synthetic dataset for depth estimation in ureteroscopic images and demonstrating its practical benefits, but limitations remain—particularly the sim-to-real performance gap, limited real-world validation, and a narrow focus on depth estimation. Despite these issues, the work is solid and the dataset will benefit the community, justifying a weak accept.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The proposed KidneyDepth dataset addresses a meaningful and underexplored problem in the context of flexible ureteroscopy: the lack of accessible and annotated data for monocular metric depth estimation. One of its key strengths lies in its well-designed structure and domain-aware synthetic design using BlenderProc2, incorporating diverse anatomical shapes, materials, lighting conditions, and camera poses to mimic real intra-renal endoscopy scenarios. The dataset is structured to test model generalization across unseen shapes and textures, which is valuable for assessing robustness in real clinical applications. Moreover, the authors fine-tune two foundation depth models (ZoeDepth and DAV2) and demonstrate improved performance on real-world endoscopic sequences, both in pixel-level depth accuracy and downstream RGB-D SLAM tracking, which provides practical relevance. The public release of the dataset (via Zenodo) also encourages reproducibility and may serve as a strong benchmark for future research in surgical depth perception and robotic navigation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work introduced KidneyDepth, a dataset designed for metric depth estimation consisting of various shapes (ranging from real CT scans of the kidney to primitive forms like cylinders and tori), lighting conditions (static lights and lights mimicking the endoscope’s illumination, i.e., attached to the moving camera), and diverse materials (with different reflection properties).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While the proposed KidneyDepth dataset introduces a valuable synthetic resource for monocular depth estimation in fURS, the current experimental design reveals limitations in both methodological rigor and clinical relevance. The use of abstract geometries (e.g., cylinder and torus) lacks anatomical plausibility, which may be reflected in the degraded performance observed in Table 2. Furthermore, the paper does not report evaluation results on other datasets using DAV2 or ZoeDepth, nor does it include any qualitative assessment from clinicians, leaving the practical utility of the predicted depth maps uncertain.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces KidneyDepth, a synthetic dataset specifically designed for monocular metric depth estimation in flexible ureteroscopy (fURS), addressing a meaningful and underexplored challenge in surgical navigation. The dataset is well-constructed, with careful control over geometry, material, illumination, and camera motion, and it is structured to evaluate model generalization across diverse conditions. The authors demonstrate the utility of KidneyDepth by fine-tuning two recent foundation models (ZoeDepth and DAV2), showing improved performance in both depth estimation and downstream SLAM tracking on real-world data. However, the synthetic geometries used (e.g., torus, cylinder) may not reflect real anatomical structures and could negatively influence real-world generalization, as hinted by performance drops in Table 2. Fig. 2 visualizes the depth prediction results, but there is no quantitative comparison between models with and without fine-tuning. Moreover, there is no evaluation on external datasets or qualitative evaluation from clinicians, which weakens the claim of clinical applicability.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Thank you for your valuable time and effort in reviewing our work. Please find below our responses to your comments and suggestions.

Reviewer #1: Thank you for your feedback. We will include more detailed information about the dataset on the website https://zenodo.org/records/14893421 once the full version is uploaded. Additionally, we have now specified in the paper that the mesh was generated from the CT scan using ImFusion Suite.

As you pointed out, Table 1 shows that Depth Anything outperforms ZoeDepth. However, as noted at the end of page 5, while Depth Anything performs better in simulation, it is more prone to failure on real images (see Figure 2). This may be due to lower generalization capability compared to ZoeDepth, though further investigation is needed.

Regarding the success rate, in the paper we have focused on the accuracy in simulation, but we agree that measuring success rate would be valuable, especially for real ureteroscopic images. Unfortunately, due to the lack of ground truth in real data, we were unable to compute this metric.
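
To make concrete why such metrics depend on ground truth, the following sketch computes three error measures that are widely used in monocular depth estimation: absolute relative error, RMSE, and the fraction of pixels within a ratio threshold. It is illustrative only; the page does not state which metrics the paper reports, and the function name `depth_metrics` and the 1.25 threshold are assumptions.

```python
import math
from typing import Dict, Sequence


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    threshold: float = 1.25,
) -> Dict[str, float]:
    """Compare predicted and ground-truth depths over valid pixels.

    A pixel counts as valid only if both values are positive, so every
    term below requires a ground-truth depth for that pixel.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0.0 and g > 0.0]
    if not pairs:
        raise ValueError("no valid pixels with ground-truth depth")
    n = len(pairs)
    # Mean absolute error relative to the true depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in the depth unit.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Share of pixels whose prediction is within the ratio threshold.
    within = sum(1 for p, g in pairs if max(p / g, g / p) < threshold) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta": within}


if __name__ == "__main__":
    print(depth_metrics(pred=[1.0, 2.0, 3.0], gt=[1.0, 4.0, 0.0]))
```

In simulation, every rendered pixel comes with a ground-truth depth, so all three quantities are defined; on real ureteroscopic frames without depth annotations, the list of valid pairs is empty and none of them can be evaluated.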

Reviewer #2: Thank you for your positive feedback. We are aware of the sim-to-real gap and will continue working on reducing it. However, due to the lack of real ureteroscopic datasets with depth ground truth and the challenges of acquiring such data in real medical environments, we decided to create the first simulated ureteroscopic environment, which we will improve over time. This paper and dataset focus mainly on depth estimation, which is one milestone of a larger project.

Although the evaluation on real images is limited to a single institution, it includes images from various kidney stone removal procedures, providing some degree of variability.

Reviewer #3: Thank you for your thoughtful comments. The inclusion of synthetic geometric shapes was intended to assess whether mesh structure influences the training of Depth Anything and ZoeDepth. We trained the models using (i) only cylinders and tori, (ii) only the kidney mesh, and (iii) all shapes combined. As shown in Table 1, the differences in accuracy between training with only the kidney mesh and training with all shapes are small, meaning that shape is important but also that adding primitive geometries does not significantly degrade performance.

Regarding the quantitative comparison between models with and without fine-tuning, we did not include it because monocular metric depth estimation in ureteroscopic images is highly domain-specific (as mentioned at the end of section 2.1). Models trained on general indoor or outdoor datasets will not perform well on this domain (as mentioned in the abstract). Additionally, due to the scarcity of publicly available datasets in this domain, we could not conduct evaluations on external datasets.

We appreciate your suggestion about clinical qualitative evaluation and will consider it in future work.

Once again, thank you for your time and insightful feedback. We hope our responses address your concerns.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


