Abstract

Depth information is essential for 3D reconstruction in surgical scenes. Depth-pose-based self-supervised monocular depth estimation has advanced significantly but faces two challenges in laparoscopic scenes. First, smooth organ surfaces produce similar textures, leading to unreliable pixel matching during training. This also results in depth maps failing to preserve geometric structure when back-projected into 3D space. Second, limited movement space causes laparoscopic motion to involve complex pure rotations, which further complicates relative pose estimation between adjacent views. To address these issues, we propose a novel self-supervised monocular depth estimation method guided by geometric constraints. We incorporate surface normal estimation with depth-normal consistency to establish a geometric constraint for predicted depth maps. Furthermore, we propose an uncertainty measure based on the distance from 3D points to a synthesized plane, reducing conversion bias from depth to normals. Moreover, we optimize pose estimation using a feature-matching process with a 4D score volume. Our method reduced absolute relative error by 19.0% and improved 3D completeness by 23.9% over the baseline. Our code is available.
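
To make the depth-normal consistency idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole intrinsic matrix `K`, derives normals from central differences of back-projected points, and scores agreement with `1 - cosine`. The function names, the finite-difference scheme, and the normal orientation (pointing along +z for a fronto-parallel plane) are all illustrative assumptions.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to camera-space 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # K^-1 [u, v, 1]^T per pixel
    return rays * depth[..., None]

def normals_from_depth(depth, K):
    """Unit normals on interior pixels, shape (H-2, W-2, 3).

    Assumption: normals come from the cross product of horizontal and
    vertical central differences of the back-projected points.
    """
    pts = backproject(depth, K)
    dx = pts[1:-1, 2:] - pts[1:-1, :-2]      # horizontal tangent
    dy = pts[2:, 1:-1] - pts[:-2, 1:-1]      # vertical tangent
    n = np.cross(dx, dy)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def depth_normal_consistency(pred_normals, depth, K):
    """Mean (1 - cosine) between predicted and depth-derived normals."""
    ref = normals_from_depth(depth, K)
    cos = np.sum(pred_normals[1:-1, 1:-1] * ref, axis=-1)
    return float(np.mean(1.0 - cos))
```

For a constant-depth (fronto-parallel) map, the depth-derived normals all equal (0, 0, 1) under this convention, so a normal map predicting the same vector yields zero loss.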

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1214_paper.pdf

SharedIt Link: https://rdcu.be/eHw1x

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05114-1_23

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MoriLabNU/GSPDepthL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiWen_Enforcing_MICCAI2025,
        author = { Li, Wenda AND Hayashi, Yuichiro AND Oda, Masahiro AND Kitasaka, Takayuki AND Misawa, Kazunari AND Mori, Kensaku},
        title = { { Enforcing Geometric Constraints of Surface Normal and Pose for Self-supervised Monocular Depth Estimation on Laparoscopic Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {235 -- 245}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper addresses two challenges in laparoscopic self-supervised monocular depth estimation: geometric structure loss and complex pose estimation. It introduces depth-normal consistency to enforce geometric constraints via surface normal estimation, mitigating texture homogeneity issues. A distance-based uncertainty map reduces bias in depth-to-normal conversion, improving robustness at non-planar regions. Additionally, a 4D score volume enhances pose estimation by leveraging spatial feature matching, aiding pure rotational motion handling. These solutions reduce absolute relative error by 19.0% and improve 3D completeness by 23.9%, advancing geometric accuracy and pose estimation for laparoscopic scenes.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper’s major strengths lie in its geometric constraints and pose optimization. It introduces depth-normal consistency with a distance-based uncertainty map, addressing texture homogeneity by enforcing 3D geometric structure via surface normals. This approach dynamically weights errors in non-planar regions. The proposed 4D score volume for pose estimation leverages spatial feature matching to handle pure rotations, a critical challenge in laparoscopic scenarios. Experimental results demonstrate a 19.0% reduction in absolute relative error and a 23.9% improvement in 3D completeness, with ablation studies validating the impact of each component.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The 4D score volume introduces high computational complexity, yet the paper lacks real-time performance analysis, critical for clinical use.
2. Vulnerability of the local planar assumption. The depth-normal conversion relies on the 8-neighborhood coplanarity assumption, but the paper does not explicitly discuss scenarios where this assumption fails (e.g., highly folded organs or overlapping objects). This limitation is only partially mitigated by the uncertainty map, potentially raising doubts about estimation accuracy in complex structural regions.
3. In the ablation studies, the experimental effects of distance uncertainty and normal loss were not individually validated.
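
The 8-neighborhood coplanarity concern in point 2 can be illustrated with a short NumPy sketch. This is a hypothetical formulation, not the paper's Eq. 8: it fits a least-squares plane to each 3x3 neighborhood of 3D points via SVD, takes the mean point-to-plane distance as the uncertainty, and maps it to a weight with `exp(-d / scale)`. The plane-fitting method, the function names, and the `scale` parameter are assumptions.

```python
import numpy as np

def plane_distance_uncertainty(points):
    """Mean point-to-plane distance per 3x3 neighborhood.

    points: (H, W, 3) camera-space points; returns (H-2, W-2).
    Assumption: the plane is the least-squares fit to the 9 points.
    """
    h, w, _ = points.shape
    dist = np.zeros((h - 2, w - 2))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = points[i - 1:i + 2, j - 1:j + 2].reshape(-1, 3)
            centered = patch - patch.mean(axis=0)
            # The right singular vector with the smallest singular
            # value is the normal of the best-fit plane.
            _, _, vt = np.linalg.svd(centered)
            normal = vt[-1]
            dist[i - 1, j - 1] = np.abs(centered @ normal).mean()
    return dist

def confidence_weights(dist, scale=1.0):
    """Down-weight pixels whose neighborhood deviates from a plane."""
    return np.exp(-dist / scale)

# A flat patch versus a V-shaped fold along the column x = 2.
ys, xs = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
flat = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)
fold = np.stack([xs, ys, np.abs(xs - 2.0)], axis=-1)
d_flat = plane_distance_uncertainty(flat)
d_fold = plane_distance_uncertainty(fold)
```

In this toy case the flat patch has near-zero distance everywhere, while neighborhoods centered on the fold have a clearly positive distance and therefore a lower weight. Neighborhoods entirely on one side of the fold remain planar, which is the situation where the assumption holds.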
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

High computational complexity without real-time analysis (vital for clinical use), unaddressed limitations of the local planar assumption in complex structures, and incomplete ablation validation of key components. While the geometric constraints and pose optimization show promise, these weaknesses undermine the method’s robustness and translational potential.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have addressed the concerns I raised. Based on the responses to my questions, I recommend acceptance.

Review #2

Please describe the contribution of the paper

This paper presents a method for self-supervised training of a monocular depth estimation model on laparoscopic images. The main components of the proposed approach include a depth estimation module, a normal estimation module, and a camera pose estimation module. Based on the predictions from these three modules, a series of self-supervised, geometry-based loss terms are formulated to enhance the geometric properties of the estimated depth map. These include photometric consistency loss, normal consistency loss, and depth-normal consistency loss. Experiments conducted on the SCARED and Hamlyn datasets demonstrate promising performance for monocular depth estimation on laparoscopic images.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-structured and clearly written, making it easy to follow. All technical details are presented in a comprehensive and accessible manner.
- The proposed method achieves state-of-the-art performance on both datasets, demonstrating its effectiveness.
- The self-supervised framework is versatile and performs well across different depth estimation backbones. When integrated with two different backbones, the method consistently shows performance improvements, highlighting its robustness and generalizability.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The proposed method considers pixel uncertainty only in the computation of the depth-normal consistency loss. However, similar issues might arise in the computation of the photometric loss. This is because the process of warping the source image to obtain the target image can be affected by parallax and scene occlusions, which may result in low-confidence regions in the warped image. I am curious whether this would impact the performance of the depth estimation model. Would it be necessary to account for photometric uncertainty when computing the photometric loss?
- As the third contribution of the paper, the improvement to the pose estimation model may require relevant supportive evidence, such as an evaluation of pose accuracy.
- Are there any limitations to the proposed method and model? A discussion on this topic would further enhance the quality of the paper.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall, the method proposed in this paper is well-designed, and the experimental results demonstrate its performance and robustness. Based on this, I am inclined to give a ‘weak accept’ at the pre-rebuttal stage.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The main contribution of the paper is a self-supervised method for monocular depth estimation in laparoscopic images that integrates geometric constraints through normal-normal and depth-normal consistency and improves pose estimation using a 4D feature-matching process. This approach improves depth accuracy and 3D completeness compared to other methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper introduces a distance-based uncertainty map to handle inaccuracies in the depth-to-normal conversion. I believe this idea has not been explored before, and it achieves a remarkable RMSE reduction by modeling uncertainty based on the deviation of 3D points from a local plane, although no alternative depth-to-normal methods are presented for comparison. Additionally, the paper enhances pose estimation using a 4D score volume for feature matching between frames. This is shown as a substantial improvement in the ablation. However, it is unclear whether this method was inspired by previous work or is claimed as a novelty of the presented work.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Using depth-normal consistency for MDE is even older than Monodepth2, as seen in [A]. In addition, other works in endoscopy have used normal-based losses for MDE [B] and for 3D reconstruction [C, D, E, F]. The introduction does not clearly position this contribution in the context of the state-of-the-art. In addition, the inclusion of metrics such as MAE, MedAE in Table 1 would make it easier to compare the proposed approach with other methods not included.

In addition, the method section is slightly difficult to understand on a first read. If possible, simplifying, clarifying or unifying the symbols and sub-index notation might help the reader.

[A] Zhenheng Yang, Peng Wang, Wei Xu, Liang Zhao, and Ramakant Nevatia. Unsupervised learning of geometry with edge-aware depth-normal consistency. In AAAI, 2018.

[B] Rodríguez-Puigvert, Javier, et al. “LightDepth: Single-View Depth Self-Supervision from Illumination Decline.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.

[C] Wang, S., Zhang, Y., McGill, S.K., Rosenman, J.G., Frahm, J.-M., Sengupta, S., Pizer, S.M.: A surface-normal based neural framework for colonoscopy reconstruction. In: International Conference on Information Processing in Medical Imaging, pp. 797–809 (2023). Springer.

[D] Bonilla, Sierra, et al. “Gaussian Pancakes: geometrically-regularized 3D gaussian splatting for realistic endoscopic reconstruction.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.

[E] Wang, Peng, et al. “Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction.” arXiv preprint arXiv:2106.10689 (2021).

[F] Batlle, Víctor M., et al. “LightNeus: Neural surface reconstruction in endoscopy using illumination decline.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

There is a writing error in the fourth sentence of Section 4, Discussion and Conclusion: “As show in Table 1, …” is repeated at the end.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed components (depth-normal consistency with distance-based uncertainty and the 4D score volume for pose estimation) are promising. However, it would be helpful to discuss the use of depth-normal consistency in MDE and 3D reconstruction both within and outside of endoscopy. Please also provide MAE and MedAE for your system for better comparison with current and future competing methods.
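
For reference, the requested metrics can be computed alongside the absolute relative error as in the sketch below. It assumes that pixels with `gt <= 0` are invalid and that any scale alignment between prediction and ground truth has already been applied. The function name and dictionary keys are illustrative, not taken from the paper or its code.

```python
import numpy as np

def depth_errors(pred, gt):
    """Per-image depth errors over pixels with valid ground truth.

    Assumptions: gt <= 0 marks missing ground truth, and pred is
    already expressed in the same scale and units as gt.
    """
    valid = gt > 0
    p, g = pred[valid], gt[valid]
    err = np.abs(p - g)
    return {
        "abs_rel": float(np.mean(err / g)),        # absolute relative error
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(err)),                # mean absolute error
        "medae": float(np.median(err)),            # median absolute error
    }

metrics = depth_errors(np.array([1.5, 2.0, 3.0, 9.0]),
                       np.array([1.0, 2.0, 4.0, 0.0]))
```

MAE and MedAE are reported in the units of the depth map, so unlike the absolute relative error they are only comparable across methods evaluated at the same scale.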
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The manuscript states: “Our main contributions are summarized as follows. (i) We introduce surface normal estimation and build the depth-normal consistency to guide monocular depth estimation and provide geometric constraints.” Thus, the depth-normal loss is presented as the first contribution of this work, without any prior discussion of similar works using this approach.

The method improves upon the baselines in Table I in the context of laparoscopy. However, considering its limitations, it might be difficult to bring this into clinical practice when surfaces undergo strong deformations or when the local planarity assumption does not hold.

Author Feedback

We appreciate the meta-reviewer and the reviewers for their constructive feedback. We address major concerns raised by reviewers.

Reviewer 1:
Q1: The method only considers pixel uncertainty in the depth-normal consistency loss. Should photometric uncertainty also be considered in photometric loss to handle the effects of parallax and occlusions?
A1: This could be another way to improve MDE, but it is beyond the scope of this study. We will consider it in future work.
Q2: As the third contribution of the paper, the improvement to the pose estimation model may require relevant supportive evidence, such as an evaluation of pose accuracy.
A2: This study focuses on MDE and 3D reconstruction, with corresponding results provided. Table 2 (ID 1 vs. 5) shows that improving the pose estimation model benefits both tasks.
Q3: Are there any limitations to the proposed method and model?
A3: This method mainly focuses on smooth organ surfaces. Tissue deformation and highly folded organs are not considered. We will address these in future work.

Reviewer 2:
Q1: Using depth-normal consistency for MDE is even older than Monodepth2, as seen in [A]. In addition, other works in endoscopy have used normal-based losses for MDE [B] and for 3D reconstruction [C, D, E, F]. The introduction does not clearly position this contribution in the context of state-of-the-art. In addition, the inclusion of metrics such as MAE, MedAE in Table 1 would make it easier to compare the proposed approach with other methods not included.
A1: First, this paper proposes a novel depth-normal consistency for MDE and 3D reconstruction. [A] and [B] do not consider 3D reconstruction. [C], [D], [E], and [F] rely on ground truth. In contrast, our approach is fully self-supervised, requires no ground truth, and enhances both MDE and 3D reconstruction. Second, we adopt widely used metrics consistent with GCDepthL and EndoDAC, but we will consider adding more metrics in future work.

Reviewer 3:
Q1: The 4D score volume introduces high computational complexity, yet the paper lacks real-time performance analysis, critical for clinical use.
A1: The 4D score volume is only utilized for pose estimation during training. During inference, we only use the depth network to predict depths, without pose estimation. Our depth estimation network employs the same backbone (ResNet-18) as some baseline methods in Table 1 (e.g., Monodepth2, GCDepthL), ensuring the same real-time performance.
Q2: Vulnerability of the local planar assumption. The depth-normal conversion relies on the 8-neighborhood coplanarity assumption, but the paper does not explicitly discuss scenarios where this assumption fails (e.g., highly folded organs or overlapping objects). This limitation is only partially mitigated by the uncertainty map, potentially raising doubts about estimation accuracy in complex structural regions.
A2: As mentioned in the Introduction, this paper focuses on the issue of similar textures caused by smooth organ surfaces. Since most regions in laparoscopic images are relatively smooth, the 8-neighborhood coplanarity assumption is generally valid. We also introduce uncertainty to guide the model’s attention toward smooth regions during training. The contributions of assumption-based depth-normal consistency and uncertainty are shown in Table 2 (ID 2, 3, 4). Further efforts to handle highly folded organs or overlapping objects will be explored in future work.
Q3: In the ablation studies, the experimental effects of distance uncertainty and normal loss were not individually validated.
A3: Distance uncertainty is an integral component of the depth-normal consistency loss (Eq. 8), with its contribution validated in Table 2 (ID 2 vs. 3). Normal loss is a constraint applied to the estimated surface normals from adjacent images, and it is designed to work in conjunction with Eq. 8 (including distance uncertainty). Therefore, the contribution of normal loss on top of Eq. 8 is validated in Table 2 (ID 3 vs. 4).
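
The construction of the 4D score volume is not described on this page, so the sketch below only illustrates one common form such a volume can take: an all-pairs scaled dot product between two feature maps, producing one score for every pixel pair across adjacent frames. The function names and the `1/sqrt(C)` scaling are assumptions, and the paper's actual design may differ.

```python
import numpy as np

def score_volume(feat_a, feat_b):
    """All-pairs matching scores between two feature maps.

    feat_a, feat_b: (H, W, C). Returns an (H, W, H, W) volume where
    entry [i, j, k, l] scores pixel (i, j) in A against (k, l) in B.
    """
    c = feat_a.shape[-1]
    return np.einsum("ijc,klc->ijkl", feat_a, feat_b) / np.sqrt(c)

def best_matches(volume):
    """Return (H, W, 2): the (row, col) in B that best matches each pixel of A."""
    h, w = volume.shape[:2]
    flat = volume.reshape(h, w, -1).argmax(axis=-1)
    return np.stack(np.unravel_index(flat, volume.shape[2:]), axis=-1)

# Toy features: each pixel has a unique one-hot descriptor, and frame B
# is frame A shifted one pixel to the right (with wrap-around).
feat_a = np.eye(6).reshape(2, 3, 6)
feat_b = np.roll(feat_a, 1, axis=1)
matches = best_matches(score_volume(feat_a, feat_b))
```

The volume holds (H·W)² scores, which is the source of the computational cost raised in Q1. That cost applies wherever pose is estimated, which according to A1 is only during training.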

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

I have read the manuscript, review comments, rebuttal letter. All reviewers recommend acceptance (after rebuttal). This meta reviewer believes that the authors did a good job in addressing concerns.

Enforcing Geometric Constraints of Surface Normal and Pose for Self-supervised Monocular Depth Estimation on Laparoscopic Images

Author(s):