Abstract

Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding promise to enhance surgeons’ visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3DGS with SfM fails to recover accurate camera poses and geometry in surgical scenes due to the challenges of minimal textures and photometric inconsistencies. To tackle this problem, we propose the first SfM-free 3DGS-based method for surgical scene reconstruction that jointly optimizes the camera poses and scene representation. Based on video continuity, the key idea of our method is to exploit immediate optical flow priors to guide the projection flow derived from the 3D Gaussians. Unlike most previous methods that rely on photometric loss only, we formulate the pose estimation problem as minimizing the flow loss between the projection flow and the optical flow. A consistency check is further introduced to filter flow outliers by detecting the rigid and reliable points that satisfy the epipolar geometry. During 3D Gaussian optimization, we randomly sample frames to optimize the scene representation and grow the 3D Gaussians progressively. Experiments on the SCARED dataset demonstrate that our method outperforms existing methods in novel view synthesis and pose estimation while maintaining high efficiency.
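
For intuition, the flow-based pose objective described above can be sketched as follows. This is a minimal, hedged illustration rather than the authors’ released implementation; the function and argument names (flow_loss, project, T_prev, T_curr) are assumptions. The 3D Gaussian centers are projected under the previous and current camera poses, the resulting projection flow is compared against the precomputed optical flow, and only points passing the consistency check contribute to the loss.

```python
# Hedged sketch of a flow-based pose loss; not the paper's code, names are illustrative.
import torch

def flow_loss(means3d, T_prev, T_curr, K, optical_flow, rigid_mask):
    """means3d: (N, 3) Gaussian centers in world coordinates.
    T_prev, T_curr: (4, 4) world-to-camera poses; T_curr carries the gradient.
    K: (3, 3) camera intrinsics.
    optical_flow: (N, 2) precomputed flow sampled at the previous-frame projections.
    rigid_mask: (N,) boolean mask from the consistency check."""
    def project(T):
        cam = (T[:3, :3] @ means3d.T).T + T[:3, 3]     # world -> camera frame
        uv = (K @ cam.T).T                             # camera -> homogeneous pixels
        return uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)  # perspective divide
    proj_flow = project(T_curr) - project(T_prev)      # projection flow from the Gaussians
    return (proj_flow - optical_flow)[rigid_mask].abs().mean()
```

In the method as described, such a loss would be minimized with respect to the current camera pose while the Gaussians are kept fixed, alternating with the 3DGS optimization itself.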

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1818_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1818_supp.zip

Link to the Code Repository

https://github.com/wrld/Free-SurGS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Guo_FreeSurGS_MICCAI2024,
        author = { Guo, Jiaxin and Wang, Jiangliu and Kang, Di and Dong, Wenzhen and Wang, Wenting and Liu, Yun-hui},
        title = { { Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15007},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a way to perform rigid Gaussian splatting without requiring SfM for pose estimation. The authors introduce this since SfM can fail in non-rigid scenes and under varying lighting. They estimate pose by first filtering for the rigid and visible parts of the environment, then aligning optical flow with the flow estimated from the current 3DGS (3D Gaussian splatting) scene to optimize pose. They demonstrate results on the rigid SCARED dataset in terms of pose accuracy and photometric metrics (PSNR, etc.).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths:

    • This paper begins to address a key need to enable Gaussian splatting in endoscopy by learning robust ways to mask points, which are then matched for pose estimation.

    • This paper provides multiple different analyses, in addition to providing a reconstruction method that is much more efficient than the ones it is compared against.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses:

    • The method is real-time for rendering, but not for reconstruction. Please be careful with the use of “real-time”. The method is not reconstructing the scene in real time; rather, it reconstructs the scene offline and renders it in real time: “In this paper, we address the challenges and present Free-SurGS for real-time multi-view surgical scene reconstruction from monocular inputs, realizing joint optimization for both 3D Gaussians and camera poses.”

    • The method does not evaluate depth reconstruction accuracy on SCARED. This is what should be a focus when proposing a reconstruction method rather than just visual reconstruction.

    • Details for optical flow calculation are not provided

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Some edits to improve readability:

    • Figure 2 typos: Differental->differentiable.

    • In the quoted sentence, I was not able to understand for how many iterations the 3DGS optimization step is performed. Is it only 30 steps for each, or 30 steps for pose, and something else for the gaussians: “We set the iteration for optimizing pose and 3DGS as 30 for every input frame.” (“Submission 1818.pdf”, p. 7) (pdf)

    • Explain the abbreviation “Con.” in the Table 2 caption.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like the paper and the idea the authors present to optimize pose using rigid and reliable points. I believe the use of 3DGS for reconstruction is very promising. The main sticking point for me is that there is no detail provided on how optical flow is calculated. Do we have ground truth maps? Or is it calculated using something like RAFT? A secondary point that detracts slightly is the lack of the use of the SCARED data for evaluating depth reconstruction accuracy.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I am changing my recommendation from weak reject to accept. I would like to thank the authors for their honest and clear response; this is much appreciated. Their method is very useful, and helps remove the need for SfM in these types of scenes.

    Their response includes clarifying the usage for semi-static scenes, the suitability of the method for pose instead of depth, detailing the usage of RAFT for optical flow, and the clarification of inference being online and reconstruction offline.

    Rebuttal stack ranking: 2/4



Review #2

  • Please describe the contribution of the paper

    The paper proposes a SfM-free 3D Gaussian Splatting (3DGS) method for real-time surgical scene reconstruction. By leveraging optical flow priors and joint optimization of camera poses and scene representation, it overcomes the limitations of SfM-based approaches, providing accurate pose estimation and realistic rendering in challenging surgical environments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Free-SurGS introduces an SfM-free approach to 3DGS: it uses Depth Anything to obtain image depth and reconstruct a 3D point cloud for initialization. At the same time, the method simultaneously optimizes the 3D GS and camera poses by leveraging optical flow priors and joint optimization techniques. Experiments on the SCARED dataset showcase the performance of the proposed method in both novel view synthesis and pose estimation. The paper provides a thorough evaluation of the proposed method, comparing it with existing SfM-free methods and demonstrating its efficiency and accuracy in surgical scene reconstruction.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Limited Validation Dataset. (2) The proposed method does not address tissue deformation. While a filter is used to remove outliers, the SCARED dataset does not exhibit significant deformation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It’s essential for the paper’s reproducibility that the authors provide access to both the code and specific video segments used in the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    While the evaluation primarily relies on the SCARED dataset, incorporating additional publicly available datasets would bolster the robustness and generalizability of the findings. Furthermore, it would be beneficial to investigate how the proposed method performs under conditions involving significant camera movements and larger scenes, as these scenarios may pose challenges that were not adequately addressed in the current evaluation.

    Tissue Deformation Handling: Although the paper addresses outlier removal through a filtering mechanism, it does not explicitly account for tissue deformation. Given the importance of accurately representing tissue dynamics in surgical scene reconstruction, future iterations of the method could benefit from incorporating techniques specifically designed to handle deformation, thereby enhancing the realism and applicability of the simulations.

    Consider discussing the limitations of Free-SurGS, such as its robustness of pose estimation and the deformation problem.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of simultaneously optimizing the 3D GS and camera poses by minimizing the photometric loss, the optical flow loss, and the depth loss.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    Thanks very much for the authors’ response to my concerns.

    Regarding the tissue deformation problem, the authors argued that they mainly focus on SfM-free Gaussian Splatting for static/semi-static scenes. From my point of view, the SfM-free strategy is a rather obvious idea. The computation of the optical flow loss is probably the most original part of the paper, but it is also not fully evaluated. Additionally, the authors said that the proposed method is promising for handling dynamic environments but may fail under severe deformation. A video showing small levels of deformation would be great.

    In addition, I agree with the reviewer’s comment that the proposed method is real-time for rendering but not for reconstruction. The authors misuse the terms “real-time reconstruction” and “real-time rendering”: what is actually presented in the paper is the rendering result rather than the reconstruction result. However, in the authors’ response on the real-time clarification, it seems they do not fully grasp the related concern raised by Reviewer 4. Given that the proposed method also focuses on scene reconstruction, it is essential to report not only photometric accuracy but also geometric accuracy to demonstrate its performance. While obtaining ground truth for 3D scenes in many in vivo cases may be challenging, publicly available datasets like SCARED provide ground truth, which could be used for evaluating geometric accuracy.



Review #3

  • Please describe the contribution of the paper

    The paper presents a Structure-from-Motion-free Gaussian splatting (GS) model that employs optical flow priors to co-optimize the scene and camera pose. The model formulates pose estimation as minimizing the discrepancy between the projection flow generated by the 3D GS and the optical flow between images in the video sequence. The GS and the poses are optimized alternately, fixing one component while updating the other.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The work aims to replace the SfM stage of the GS models, which contributes to faster training times.

    The model is evaluated against four different SfM-free NeRF-based models, showing improved performance and reduced training times.

    Both the novel view synthesis and the pose estimation capabilities are evaluated.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A potential limitation is the use of a single dataset during the evaluation.

    It is mentioned that the method employs a depth estimator (Depth Anything) for the initialization. Considering the differences between real-world images and medical images, foundation models like Depth Anything can present inconsistencies in their predictions. What is the sensitivity of the model to errors in the predicted depth during the initialization step?

    How is the optical flow (Ot-1->t) obtained?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper presents an SfM-free approach for reconstruction in surgical scenarios. The use of flow-based pose optimization aims to overcome the limitations of traditional photometric loss in surgical environments. One weakness that can limit the contribution is the use of a single dataset, which makes it unclear how well the method can generalize to different surgical scenarios.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an SfM-free approach for training Gaussian splatting models in surgical scenes; some limitations lie in the evaluation, since a single dataset might not be enough to understand how well the model can generalize to different surgical domains.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Most of the concerns were clarified during the rebuttal.




Author Feedback

We thank all the reviewers (R1, R3, R4) for their constructive comments. We address the major concerns here and will carefully incorporate all suggestions into the revised manuscript.

Q1 (R1, R3, R4) Reproducibility: We will release our code and video segments upon acceptance.

Q2 (R1, R3) Limited validation dataset: Due to the page limit, we chose only the SCARED dataset to evaluate our method, for two reasons. 1) It consists of nine different surgical scenes with challenging conditions, e.g., textureless surfaces, illumination changes, large camera translations, and motion blur. 2) It contains ground-truth camera trajectories for pose evaluation. The results demonstrate that our method can handle these challenges and generalize well across different surgical scenes. We will validate on more surgical datasets in future work.

Q3 (R1) Tissue deformation: We agree that tissue deformation is an important problem in surgical reconstruction. Our proposed consistency check can locate and remove outliers to retain rigid and reliable points in the optical flow for pose estimation (P5, Sec 2.3), which is promising for handling dynamic environments but may fail under severe deformation. In this paper, we mainly focus on SfM-free GS for static/semi-static scenes. We will discuss the limitations in the final version and extend the method to deal with tissue deformation in future work.

Q4 (R3) Sensitivity to depth error: Before submission, we conducted experiments to validate different depth models (DPT, Depth Anything, MiDaS) and chose Depth Anything, which achieves the best performance for initialization. While there may be depth errors due to the domain gap, the initialization stage does not affect the subsequent optimization too much, for two reasons. 1) The adaptive density control helps prune outlier 3D Gaussians. 2) The scale-invariant depth loss helps avoid inconsistent depth during 3DGS optimization.

Q5 (R3, R4) Detailed optical flow calculation: We use the pretrained RAFT model to compute the optical flow between frames, which is used as pseudo-GT to constrain the projection flow from 3DGS. Specifically, O_{t-1->t} is obtained by computing the optical flow between frames I_{t-1} and I_t (R3). To deal with transient objects and photometric inconsistencies, we employ the consistency check to obtain the flow mask, which helps identify and preserve correspondences that are rigid and reliable for accurate matching. After filtering out outliers in the optical flow, we perform flow-based pose estimation by minimizing the flow loss (P6, Sec 2.3). We will add more details on the optical flow in the final version.

Q6 (R4) Real-time clarification: Yes, only the inference is real-time, while the reconstruction process is offline. We will carefully update all related statements.

Q7 (R4) Evaluation of depth reconstruction: We agree that depth evaluation is important for geometry-aware reconstruction. Before submission, we conducted experiments to validate the performance of the depth estimation. However, the results show that the depth performance achieves only a slight improvement over Nope-NeRF. 3DGS is popular mainly due to its efficiency and rendering quality, but it is not yet very good at modeling the underlying geometry due to its multi-view inconsistent nature. In this work, we mainly focus on removing the reliance on SfM in 3DGS while still achieving robust and accurate pose estimation and visual reconstruction. In future work, we will enhance our method to achieve geometrically accurate surface reconstruction.
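
To make the depth-based initialization mentioned in Q4 concrete, below is a minimal sketch, assumed for illustration only, of back-projecting a monocular depth map (e.g., one predicted by Depth Anything) into a camera-frame point cloud that could seed the 3D Gaussians. The function name backproject_depth and its arguments are hypothetical.

```python
# Illustrative sketch (not the authors' code): lift a predicted depth map to 3D points.
import numpy as np

def backproject_depth(depth, K):
    """depth: (H, W) predicted depth (up to an unknown scale); K: (3, 3) intrinsics.
    Returns (H*W, 3) points in the camera frame."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T     # pixel -> normalized camera rays
    return rays * depth.reshape(-1, 1)  # scale each ray by its predicted depth
```
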
Q8 (R4) Iterations of the 3DGS optimization step: During progressive growing (Fig. 2), we set 30 iterations for both pose estimation and 3DGS optimization. After estimating the poses for all frames (fixed thereafter), we randomly sample frames to optimize the 3DGS following the original implementation. We will clarify this in the final version.

Q9 (R4) Meaning of Con.: “Con.” refers to the consistency check used to maintain rigid and reliable points. We will add this description to the caption of Table 2.
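
Regarding Q5, the consistency check can be illustrated with the following hedged sketch: sparse optical-flow correspondences (e.g., from a pretrained RAFT model) are used to robustly fit a fundamental matrix, and pixels whose flow violates the resulting epipolar geometry (measured by the Sampson distance) are masked out. Function and parameter names (rigid_flow_mask, grid_step, epi_thresh) are illustrative and not taken from the paper.

```python
# Hedged sketch of an epipolar consistency check on optical flow; not the released code.
import numpy as np
import cv2

def rigid_flow_mask(flow, grid_step=8, epi_thresh=1.0):
    """flow: (H, W, 2) forward optical flow from frame t-1 to frame t.
    Returns a boolean (H, W) mask of pixels consistent with a rigid epipolar geometry."""
    H, W = flow.shape[:2]
    # Sample a sparse grid of correspondences to fit the fundamental matrix robustly.
    ys, xs = np.mgrid[0:H:grid_step, 0:W:grid_step]
    pts1 = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float64)
    pts2 = pts1 + flow[ys, xs].reshape(-1, 2)
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, epi_thresh)
    if F is None or F.shape != (3, 3):
        return np.zeros((H, W), dtype=bool)

    # Sampson distance of every pixel's correspondence to the estimated epipolar geometry.
    ys_all, xs_all = np.mgrid[0:H, 0:W]
    p1 = np.stack([xs_all, ys_all, np.ones_like(xs_all)], axis=-1).reshape(-1, 3)
    p2 = np.concatenate([p1[:, :2] + flow.reshape(-1, 2), np.ones((H * W, 1))], axis=1)
    Fp1 = p1 @ F.T   # epipolar lines of frame t-1 points, expressed in frame t
    Ftp2 = p2 @ F    # epipolar lines of frame t points, expressed in frame t-1
    num = np.sum(p2 * Fp1, axis=1) ** 2
    den = Fp1[:, 0] ** 2 + Fp1[:, 1] ** 2 + Ftp2[:, 0] ** 2 + Ftp2[:, 1] ** 2
    sampson = num / np.maximum(den, 1e-8)
    return (sampson < epi_thresh ** 2).reshape(H, W)
```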




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


