Abstract

In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes, but they are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. Some previous works use ground-truth depth for optimization, but such depth is hard to acquire in the surgical domain. To overcome these obstacles, we present Endo-4DGS, a real-time endoscopic dynamic reconstruction approach that utilizes 3D Gaussian Splatting (GS) for 3D representation. Specifically, we propose lightweight MLPs to capture temporal dynamics with Gaussian deformation fields. To obtain a satisfactory Gaussian initialization, we exploit a powerful depth estimation foundation model, Depth-Anything, to generate pseudo-depth maps as a geometry prior. We additionally propose confidence-guided learning to tackle the ill-posed problem of monocular depth estimation, and we enhance the depth-guided reconstruction with surface normal constraints and depth regularization. Our approach has been validated on two surgical datasets, where it renders in real time, computes efficiently, and reconstructs with remarkable accuracy. Our code is available at https://github.com/lastbasket/Endo-4DGS.
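The initialization step described above, lifting a pseudo-depth map into a point cloud whose points seed the Gaussian means, can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: it assumes a pinhole camera with known intrinsics `K`, and because Depth-Anything predicts relative (affine-invariant) inverse depth, the `scale` and `shift` used to convert it to depth are hypothetical placeholders that a real pipeline would need to estimate or fix by convention.

```python
import numpy as np


def disparity_to_depth(disparity, scale=1.0, shift=0.0, eps=1e-6):
    """Convert relative inverse depth to depth.

    ASSUMPTION: `scale` and `shift` are hypothetical alignment parameters;
    Depth-Anything itself does not provide them.
    """
    return 1.0 / np.maximum(scale * disparity + shift, eps)


def backproject_depth(depth, K, stride=1):
    """Lift an (H, W) depth map to an (N, 3) point cloud in camera coordinates.

    Assumes a pinhole model K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    `stride` subsamples pixels to keep the initial Gaussian count manageable.
    """
    v, u = np.mgrid[0:depth.shape[0]:stride, 0:depth.shape[1]:stride]  # pixel rows (v), columns (u)
    z = depth[::stride, ::stride]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # discard pixels without a valid positive depth


# Example: each back-projected point would seed one Gaussian mean (mu).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 256.0], [0.0, 0.0, 1.0]])
disparity = np.random.default_rng(0).uniform(0.2, 1.0, size=(512, 640))
mu_init = backproject_depth(disparity_to_depth(disparity), K, stride=4)
print(mu_init.shape)  # (20480, 3)
```

Subsampling with a fixed `stride` is only a choice made for this sketch; voxel downsampling of the point cloud would be an equally plausible alternative.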

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0223_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0223_supp.pdf

Link to the Code Repository

https://github.com/lastbasket/Endo-4DGS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Hua_Endo4DGS_MICCAI2024,
        author = { Huang, Yiming and Cui, Beilei and Bai, Long and Guo, Ziqi and Xu, Mengya and Islam, Mobarakol and Ren, Hongliang},
        title = { { Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper builds on an existing “foundation model” called “Depth-Anything”, motivated by the observation that NeRF-based methods, although prominent for their exceptional ability to reconstruct scenes, are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. The paper states that “lightweight MLPs capture temporal dynamics”, yet full details of this are missing.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper continues the line of work showing how recent NeRF-based scene reconstruction techniques are being surpassed by even more recent 3D Gaussian Splatting techniques. Much of the paper’s contribution seems to be inherited from the “Depth-Anything” methodology, although the application to endoscopic challenges places it within the MICCAI scope.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Many of the shortcomings of Section 2 could only be resolved by reading other papers referenced in the literature. It is not clear how the overall single loss function can be motivated as a linear combination of separate loss functions, especially when the lambda parameters for surf and conn make those terms almost insignificant compared to the others.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    If the authors of this article are the same as the authors of the “Depth-Anything” article, this should be made clear so that those repositories can be used for replication. Otherwise, details on how the model was adapted for endoscopic reconstruction, and especially on how multi-layer perceptrons are used to capture the temporal dynamics, need to be provided.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Scene reconstruction in deformable endoscopic workspaces is extremely ambitious, yet the article seems to inherit too much from “Depth-Anything” and to rely heavily on the suggestion that multi-layer perceptrons can capture temporal dynamics. More extensive testing across a larger number of scenes would be needed to convince researchers.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The work builds on existing tools available in public repositories and claims SOTA performance. References should clearly establish these links.



Review #2

  • Please describe the contribution of the paper

    This paper presents a 4D Gaussian Splatting method for dynamic reconstruction of endoscopic scenes. It utilizes depth regularization from Depth-Anything, a new foundation model, to construct a geometry prior, and lightweight MLPs to capture the temporal Gaussian deformation field.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A new pipeline of 4DGS for endoscopic scene reconstruction
    2. The initialization of Gaussian splats uses a foundation model
    3. The lightweight MLP for deformation field has shown effectiveness
    4. The ablation results of depth and surface constraints have shown their effectiveness
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The confidence map seems less effective; it would be better to add a corresponding explanation and discussion.
    2. What is the underlying logic of using Depth-Anything-small? Could you please explain the motivation for this experimental choice?
    3. How does the design adapt to endoscopic data? It would be better to explain the special design of the pipeline for endoscopic scenes.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Typo (capitalized “W”): “With the point cloud from depth prior, We intialize µ …” on page 4.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is technically sound, and its implementation achieves SOTA performance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents Endo-4DGS, a real-time endoscopic dynamic reconstruction approach that utilizes 3D Gaussian Splatting (GS) for 3D representation, to overcome the slow inference speed, prolonged training, and inconsistent depth estimation of NeRF-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper is well written.
    2. The clinical problem addressed by this paper is significant.
    3. The experimental results are sufficient to demonstrate its SOTA performance.
    4. The ablation study is solid.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    NA

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    It may be beneficial to mention the following concurrent works:

    [1] Zhu, Lingting, et al. “Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting.” arXiv preprint arXiv:2401.11535 (2024).
    [2] Liu, Yifan, et al. “EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction.” arXiv preprint arXiv:2401.12561 (2024).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work is novel and clinically significant, and its experimental results are good.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The paper is good and the rebuttal is valid and strong. To me it is the best paper in my rebuttal stack.




Author Feedback

We thank all reviewers for their valuable feedback. We appreciate that the reviewers find our work “technically sound” (R3) and note that it “has novelty, clinical significance” (R4). Both R3 and R4 agree that our method is effective in challenging surgical scenarios and achieves SOTA performance. We address all concerns and unclear details as follows.

  1. Details of Lightweight MLPs are missing. (R1) The effectiveness of MLPs has been demonstrated in previous works (Cao et al., Hexplane, CVPR 2023). Similarly, we utilize Hexplane (without learnable weights) for temporal dynamics factorization and multiple MLPs for capturing deformation (Section 2.2). We have demonstrated the effectiveness of Hexplane + MLPs with SOTA results on dynamic scenes. We will add more details to the manuscript.

  2. Details and inheritance of Depth-Anything. (R1, R3) Adopting Depth-Anything is a minor contribution; our novelty lies in 4D Gaussian Splatting for monocular endoscopes. For monocular surgical reconstruction, it is the depth prior itself, rather than Depth-Anything specifically, that is vital to our method. We use Depth-Anything only to generate the depth prior, not as a source of inherited design or innovation, and other pre-trained models are also compatible with our pipeline. Other works also adopt pre-trained depth (Wang et al., SparseNeRF, ICCV 2023). In choosing the model, we considered accuracy, memory usage, and computational efficiency, and selected DA-small (Depth-Anything-small) as our depth foundation model.

  3. Shortcomings of section 2 were resolved by other papers. (R1) Although there are previous works in different domains/tasks, we are the first to adopt monocular depth estimation in endoscopic reconstruction. Our method focuses on challenges like slow inference speed, prolonged training, and inconsistent depth estimation. Additionally, we propose novel techniques to combine depth prior with Gaussian Splatting and achieve SOTA performance.

  4. How can the overall loss function be motivated? (R1, R3) Our final loss is a linear combination of components addressing different optimization aspects. We define the confidence loss by treating dynamic areas as inherent noise, since their supervision is inherently inconsistent (different time steps have different geometries, while the confidence is time-agnostic). Dynamic areas therefore receive lower confidence, and the confidence loss penalizes them for detail refinement. We also note that geometry constraints are significant (Cheng et al., GaussianPro, ICML 2024). Therefore, we propose depth and normal regularization with novel definitions. We will clarify these points in the update.

  5. Lambda parameters for surf and conn make them insignificant. (R1) Depth, normal, and confidence are regularization losses rather than the dominant L1 loss on RGB. Therefore, their lambdas cannot be so large that they disrupt the main optimization. Other works also use relatively small weights for regularization terms (0.0001 in Wu et al., 4DGaussians, CVPR 2024), and these remain effective. Our ablation also shows the effectiveness of the regularization with small lambdas.

  6. Reproducibility. (R1) Details of code can be found in the anonymized link in the abstract. We will release full details to the public.

  7. More extensive testing. (R1) Since in-vivo surgical data are hard to access, we follow previous works (Wang et al., EndoNeRF; Yang C et al., Lerplane. MICCAI) and test on 3 scenes with 38 different time stamps. Tissues deform substantially between evaluation time stamps, and our method remains robust to this deformation. We will add a more extensive evaluation for all time stamps.

  8. How does the design adapt to the endoscopic data? (R3) While it is hard to retrieve depth from the monocular endoscope, our contributed method can achieve robust real-time rendering for surgical scene reconstruction.
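Point 1 above describes factorizing space-time with Hexplane and decoding deformation with multiple MLPs. The NumPy sketch below shows the general shape of such a deformation field in the style of Hexplane-based 4D Gaussian methods; it is not the Endo-4DGS code. Assumptions: Gaussian centres and time are already normalized to [0, 1], the six plane features are combined by element-wise product, and the plane resolution, feature width, head sizes, and weights are hypothetical. Nothing is trained here; all arrays are random placeholders, and how the rebuttal's "without learnable weights" maps onto this structure is not specified in the source.

```python
import numpy as np

# Coordinate pairs for the six planes over (x, y, z, t):
# three spatial planes (xy, xz, yz) and three space-time planes (xt, yt, zt).
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def bilinear(grid, uv):
    """Bilinearly sample an (R, R, C) feature grid at (N, 2) coords in [0, 1]."""
    r = grid.shape[0]
    p = np.clip(uv, 0.0, 1.0) * (r - 1)
    i0 = np.floor(p).astype(int)
    i1 = np.minimum(i0 + 1, r - 1)
    f = p - i0
    fa, fb = f[:, :1], f[:, 1:]
    return (grid[i0[:, 0], i0[:, 1]] * (1 - fa) * (1 - fb)
            + grid[i0[:, 0], i1[:, 1]] * (1 - fa) * fb
            + grid[i1[:, 0], i0[:, 1]] * fa * (1 - fb)
            + grid[i1[:, 0], i1[:, 1]] * fa * fb)


def hexplane_features(xyzt, planes):
    """Combine the six plane features by element-wise product -> (N, C)."""
    feat = np.ones((xyzt.shape[0], planes[0].shape[-1]))
    for (a, b), grid in zip(PLANE_AXES, planes):
        feat = feat * bilinear(grid, xyzt[:, [a, b]])
    return feat


def mlp(x, layers):
    """Apply linear layers with ReLU between them (no activation after the last)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def init_params(res=16, channels=8, hidden=32, seed=0):
    """Random placeholder parameters (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)
    # Near-1 plane values keep the six-way product from vanishing or exploding.
    planes = [rng.uniform(0.9, 1.1, (res, res, channels)) for _ in PLANE_AXES]

    def linear(n_in, n_out):
        return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

    trunk = [linear(channels, hidden), linear(hidden, hidden)]
    heads = {"mu": [linear(hidden, 3)], "rot": [linear(hidden, 4)], "scale": [linear(hidden, 3)]}
    return planes, trunk, heads


def deform(mu, t, planes, trunk, heads):
    """Query the deformation field for Gaussian centres `mu` (N, 3) at time t."""
    xyzt = np.concatenate([mu, np.full((mu.shape[0], 1), t)], axis=1)
    h = np.maximum(mlp(hexplane_features(xyzt, planes), trunk), 0.0)
    d_mu = mlp(h, heads["mu"])        # position offset
    d_rot = mlp(h, heads["rot"])      # quaternion offset
    d_scale = mlp(h, heads["scale"])  # scale offset
    return mu + d_mu, d_rot, d_scale


planes, trunk, heads = init_params()
mu = np.random.default_rng(1).uniform(size=(100, 3))
mu_t, d_rot, d_scale = deform(mu, t=0.5, planes=planes, trunk=trunk, heads=heads)
print(mu_t.shape, d_rot.shape, d_scale.shape)  # (100, 3) (100, 4) (100, 3)
```

Using separate small heads for position, rotation, and scale keeps each decoder lightweight while sharing the factorized space-time features, which is one plausible reading of "multiple MLPs for capturing deformation".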

We will also fix the minor problems for better clarity. All the suggestions (references) will surely be considered and added to the final manuscript. (R3, R4)
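Points 4 and 5 of the rebuttal describe the final objective as a linear combination of an RGB L1 term with confidence, depth, and normal regularizers weighted by small lambdas. The rebuttal does not give the exact formulas, so the following is a hedged sketch using common stand-ins, not the paper's definitions: a confidence-weighted depth L1 with a `-alpha * log(conf)` term (a standard way to let low confidence down-weight unreliable, e.g. dynamic, pixels without collapsing to zero), normals estimated from depth by finite differences, and a `1 - cos` normal-consistency term. The weights, `alpha`, and the placeholder RGB value are hypothetical; only the 0.0001 magnitude comes from the rebuttal's citation of 4DGaussians.

```python
import numpy as np


def confidence_depth_loss(d_render, d_prior, conf, alpha=0.1, eps=1e-6):
    """Confidence-weighted L1 between rendered depth and the pseudo-depth prior.

    The -alpha*log(conf) term stops conf from trivially going to 0; for a pixel
    with residual r, the per-pixel optimum is conf = alpha / r (clipped to 1),
    so large residuals (e.g. moving tissue) end up with low confidence.
    """
    conf = np.clip(conf, eps, 1.0)
    return float(np.mean(conf * np.abs(d_render - d_prior) - alpha * np.log(conf)))


def normals_from_depth(depth, K):
    """Unit normals (H-1, W-1, 3) from finite differences of back-projected points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts = np.stack([(u - cx) * depth / fx, (v - cy) * depth / fy, depth], axis=-1)
    dx = pts[:, 1:] - pts[:, :-1]  # difference along image columns
    dy = pts[1:] - pts[:-1]        # difference along image rows
    n = np.cross(dx[:-1], dy[:, :-1])
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)


def normal_loss(n_a, n_b):
    """Mean (1 - cosine similarity) between two unit-normal maps."""
    return float(np.mean(1.0 - np.sum(n_a * n_b, axis=-1)))


def total_loss(terms, weights):
    """Linear combination sum_k lambda_k * L_k; returns the total and per-term parts."""
    contrib = {k: weights[k] * v for k, v in terms.items()}
    return sum(contrib.values()), contrib


K = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
prior = np.full((5, 5), 2.0)
rendered = prior + np.random.default_rng(0).normal(0.0, 0.05, prior.shape)
terms = {
    "rgb": 0.05,  # placeholder standing in for the photometric L1 loss
    "depth": confidence_depth_loss(rendered, prior, np.ones_like(prior)),
    "normal": normal_loss(normals_from_depth(rendered, K), normals_from_depth(prior, K)),
}
weights = {"rgb": 1.0, "depth": 1e-4, "normal": 1e-4}  # hypothetical small regularizer weights
loss, contrib = total_loss(terms, weights)
```

With weights like these the regularizers contribute only a small share of the total loss value. Whether that makes them "insignificant" depends on their gradients relative to the RGB term rather than on the weight magnitude alone, which is why an ablation is the relevant evidence.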




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents a 4D Gaussian Splatting method for dynamic reconstruction of endoscopic scenes. It utilizes depth regularization from Depth-Anything, a new foundation model, to construct a geometry prior, and lightweight MLPs to capture the temporal Gaussian deformation field. After careful consideration of the authors’ rebuttal, all reviewers consistently lean towards acceptance of this paper. The authors have adequately addressed the major concerns and questions raised by the reviewers regarding the significance of the proposed methods and some of the technical aspects. I agree with the authors that the main innovation is a Gaussian Splatting model to represent 3D endoscopic scenes. The authors are strongly encouraged to incorporate all reviewers’ questions and suggestions into a revised version of the paper for its final publication; however, please refrain from generating new results, as reviewers should not have requested new experiments as part of the rebuttal. My decision leans towards accepting the paper.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal addresses the primary concerns raised by the reviewers, particularly those related to the contributions of the paper in the context of the “depth anything” method. The authors have provided sufficient explanations and clarifications that enhance the understanding and significance of their work.

    In regard to point 7 of the rebuttal, which suggests adding a more extensive evaluation for all time stamps, the authors should refrain from considering this addition as new results could not be assessed by reviewers.



