Abstract

Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, because they rely on an implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering. In addition, restricted single-view perception and occluding instruments pose special challenges in surgical scene reconstruction. To address these issues, we develop SurgicalGaussian, a deformable 3D Gaussian Splatting method to model dynamic surgical scenes. Our approach models the spatio-temporal features of soft tissues at each time stamp via a forward-mapping deformation MLP and regularization to constrain local 3D Gaussians to comply with consistent movement. With the depth initialization strategy and tool mask-guided training, our method can remove surgical instruments and reconstruct high-fidelity surgical scenes. Through experiments on various surgical videos, our network outperforms existing methods in many aspects, including rendering quality, rendering speed and GPU usage. The project page can be found at https://surgicalgaussian.github.io.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3077_paper.pdf

SharedIt Link: https://rdcu.be/dV5yy

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72089-5_58

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3077_supp.pdf

Link to the Code Repository

https://surgicalgaussian.github.io/

Link to the Dataset(s)

https://github.com/SurgicalGaussian/SurgicalGaussian

BibTex

@InProceedings{Xie_SurgicalGaussian_MICCAI2024,
        author = { Xie, Weixing and Yao, Junfeng and Cao, Xianpeng and Lin, Qiqin and Tang, Zerui and Dong, Xiao and Guo, Xiaohu},
        title = { { SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {617 -- 627}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper uses 3D GS as the map representation of the surgical scene and uses a forward-mapping deformation MLP to model the deformation field of the 3D Gaussians, from which the offsets of the Gaussian position, scaling, and rotation properties can be obtained. In addition, a regularization term (nearby Gaussians have similar position and covariance) is used to constrain local 3D Gaussians to comply with consistent movement.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Introducing a deformable 3D Gaussian Splatting method, which models spatio-temporal features of soft tissues and enables reconstruction of surgical scenes. (2) Good experimental comparisons of SurgicalGaussian and other reconstruction methods on two datasets; detailed comparison results are presented.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The GS initialization method is quite common lately and doesn’t differ much from other state-of-the-art GS methods. It might be beneficial to explore ways to enhance its uniqueness and effectiveness.
    2. The mathematical illustrations aren’t very clear.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It is essential that the authors provide access to both the code and the specific video segments used in the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Please provide more detailed explanations of key technical components, such as the deformation MLP and regularization techniques. This will help readers better understand the intricacies of the proposed method and its implementation.

    (2) While the Gaussian Splatting (GS) initialization method proposed in the paper is effective, it’s essential to acknowledge that similar approaches have been reported in recent literature. Consider discussing how the proposed initialization method compares with state-of-the-art (SOTA) GS methods such as EndoGaussian and COLMAP-Free 3D Gaussian Splatting, highlighting any novel aspects or improvements.

    (3) The clarity of the mathematical illustrations needs to be improved. For example, the color regularization loss needs to be described in more detail.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Due to the inclusion of numerous modules and the lack of detail in certain technical aspects, it is challenging to ascertain the reproducibility of this paper. Additionally, the proposed method is trained and tested on highly specific datasets, making it difficult to assess the generalizability of the methods.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This research work introduces a novel framework called SurgicalGaussian tailored for reconstructing dynamic surgical scenes from endoscopic videos. The key contributions include the development of a deformable 3D Gaussian Splatting method to model soft tissue dynamics, a depth initialization strategy to reduce motion-appearance ambiguity, and regularization techniques for color prediction and noise reduction in Gaussian deformation fields. The method outperforms existing techniques in terms of rendering quality, speed, and GPU usage, showcasing its potential for high-fidelity surgical scene reconstruction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The SurgicalGaussian method introduces a novel approach to surgical scene reconstruction using deformable 3D Gaussian Splatting. This method allows for better capturing of intricate details in the surgical scene and enables real-time rendering.

    Surgical scene reconstruction faces challenges such as sparse viewpoints, limited movement space, topologically changing tissues, and instrument occlusion. The SurgicalGaussian method overcomes these challenges by utilizing deformable 3D Gaussian Splatting, resulting in more accurate reconstruction of the surgical scene.

    Compared to existing methods, the SurgicalGaussian method demonstrates superior reconstruction quality, reconstruction speed, and efficient GPU utilization. Experimental results show that this method preserves more details in the reconstructed surgical scene and achieves faster rendering.

    The SurgicalGaussian method is not just a theoretical approach but also demonstrates clinical feasibility. It can assist doctors in operating instruments more accurately and can be applied in surgical environment simulation, robotic surgery automation, medical teaching, and other related fields.

    In summary, the SurgicalGaussian method has several strengths in surgical scene reconstruction, including its novel formulation, ability to address challenges, superiority over existing methods, and clinical feasibility. These strengths make it a valuable contribution to the field of robot-assisted surgery and intelligent medical care.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The author does not explain clearly how to detect the surgical tools and how to stably segment masks between adjacent video frames.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    To make the research more valuable and convincing, I’d like to suggest that the authors add a professional user study to get feedback from medical experts.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper carries out a comprehensive series of experiments to compare their method with leading-edge techniques. The results demonstrate that their approach substantially surpasses current methods in terms of reconstruction quality.

    However, its evaluation is not comprehensive and convincing. Besides, the computational efficiency of the proposed SurgicalGaussian is not excellent (lower than EndoGaussian), which may limit the application of SurgicalGaussian.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    Gaussian Splatting is applied to the deformable reconstruction of a colon from endoscopic image data. First, a point cloud is constructed from the (posed?) depth maps, where tool masks are used to cull tools. The point cloud is used to initialize the Gaussians. Deformation is applied by an MLP. Gaussian splat parameters and the deformation field are optimized using the endoscopic images and depth maps. The approach is evaluated and shows superior results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Gaussian splatting is a very recent and promising algorithm for 3D reconstruction and novel view synthesis, and it is a rather obvious idea to apply this to the problem of colon reconstruction from endoscopic data. Handling deformations by moving Gaussians has also been considered before. What is interesting is how tools are masked out and the holes filled with data from other frames; however, this part is somewhat unclear w.r.t. deformation. So while the idea is straightforward, it is novel and interesting, and this paper gives us valuable hints about how practical the idea is.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The input should be better described: do we know the poses of the images? Where does depth come from, is it reliable? Where do the masks come from, are they reliable?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    -

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper is generally well written and easy to understand. The video is impressive!

    • Explain better the input (depth, masks, poses), where the data comes from in a clinical setting, and how reliable it is.
    • The deformation regularization seems to be quite simplistic, but it seems to work well. It would be great to see the reconstructed surface deformation, without RGB on top, to be able to judge the deformation. Could other deformation models be integrated, for instance models that allow the user to define some stiffness of the material or similar?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea to apply GS to the problem of colon reconstruction is straightforward, but novel. There are no very original algorithmic contributions, except maybe the tool masking. Nevertheless, the results are interesting, and the video is impressive. The reconstructed deformation should be evaluated. The paper is solid but not groundbreaking; I recommend acceptance and see no further need for a rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ valuable feedback and constructive suggestions. Below we address the reviewers’ major concerns.

R1, R3, R4. Reproducibility: We will release our code and datasets once the paper is accepted.

R4. Compare initialization with EndoGaussian and COLMAP-Free 3D Gaussian Splatting: EndoGaussian, COLMAP-Free 3D Gaussian Splatting, and our method all use a depth-projected point cloud for Gaussian initialization, but EndoGaussian and our method must also handle the holes left in the point cloud when surgical tools are removed. EndoGaussian randomly selects 1% of the point cloud from each frame and merges the samples to obtain the point cloud used for Gaussian initialization. In our experiments, we found that initializing the Gaussians from a first-frame point cloud with large missing regions degrades reconstruction quality. As surgical instruments move, tissue that is occluded in the current frame can be observed in other frames. Inspired by this observation, we inpaint the holes in the image and depth map of the first frame with content newly revealed in subsequent frames, and then project the inpainted image to obtain a completed point cloud for Gaussian initialization. As a result, our point cloud has smaller holes and requires less computation.
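The initialization described in this paragraph can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `inpaint_first_frame` and `backproject`, the nested-list image layout, and the pinhole intrinsics `fx, fy, cx, cy` are all assumptions made for the example. It also assumes a fixed camera and copies later-frame content at the same pixel location, ignoring any tissue motion between frames.

```python
def inpaint_first_frame(depths, colors, tool_masks):
    """Fill tool-occluded pixels of frame 0 from later frames.

    depths, colors, tool_masks: one H x W nested list per frame.
    tool_masks[t][v][u] is True where a tool covers the pixel.
    Returns (depth, color, hole); hole[v][u] stays True only if the
    pixel is covered by a tool in every frame.
    """
    h, w = len(depths[0]), len(depths[0][0])
    # Copy frame 0 so the inputs are left untouched.
    depth = [row[:] for row in depths[0]]
    color = [row[:] for row in colors[0]]
    hole = [row[:] for row in tool_masks[0]]
    for t in range(1, len(depths)):
        for v in range(h):
            for u in range(w):
                # Use the earliest later frame where the pixel is visible.
                if hole[v][u] and not tool_masks[t][v][u]:
                    depth[v][u] = depths[t][v][u]
                    color[v][u] = colors[t][v][u]
                    hole[v][u] = False
    return depth, color, hole


def backproject(depth, color, hole, fx, fy, cx, cy):
    """Pinhole back-projection of every valid pixel to a coloured 3D point."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if hole[v][u] or z <= 0:
                continue  # skip remaining holes and invalid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), color[v][u]))
    return points
```

The returned list of `(position, color)` pairs plays the role of the completed point cloud from which Gaussian centers and colors would be initialized.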

R4. Provide more explanations of key technical components: Gaussian initialization is one of the novelties of our method. In addition, our method proposes a deformation MLP with regularization constraints (L_pos, L_cov) to predict how the Gaussian properties change over time. We also apply an occlusion-based color regularization loss (L_smooth) to help remove surgical tools. As described in Sec. 2.4, the deformation regularization loss constrains the deformation of each Gaussian to be similar in position and shape to that of its K neighbors. The color regularization loss L_smooth is a total variation loss, which is a common regularizer in inverse problems and was used in K-planes [5] and 4DGS [26]. L_smooth is applied to the surgical tool mask M, which is the intersection of the surgical tool masks on all frames. L_smooth is defined to minimize the color difference between a pixel and its neighbors: L_smooth = (1/n) sum( L2(p^(i,j) - p^(i-1,j)) + L2(p^(i,j) - p^(i,j-1)) ). Here n is the number of pixels in the union tool mask M, and p^(i,j) is the predicted color of the pixel in M at row index i and column index j. We will update the formula of L_smooth in the revised manuscript.
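The two regularizers named in this paragraph can be illustrated with a short plain-Python sketch. Several details are assumptions rather than statements from the rebuttal: `L2` is taken to be the squared Euclidean distance, border pixels without an upper or left neighbor contribute no term, the mask M is whatever boolean mask is passed in, and the position term is written as a mean squared difference between a Gaussian's offset and the offsets of its K nearest neighbors in canonical space. The function names are hypothetical.

```python
def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smooth_loss(pred, mask):
    """Total-variation colour loss restricted to the tool mask M.

    pred: H x W nested list of RGB tuples (rendered colours).
    mask: H x W nested list of bools, True inside M.
    Differences to the upper and left neighbours are summed over the
    masked pixels and divided by n, the number of masked pixels.
    """
    total, n = 0.0, 0
    for i, row in enumerate(pred):
        for j, p in enumerate(row):
            if not mask[i][j]:
                continue
            n += 1
            if i > 0:
                total += sq_l2(p, pred[i - 1][j])
            if j > 0:
                total += sq_l2(p, row[j - 1])
    return total / n if n else 0.0


def position_reg_loss(canonical, offsets, k):
    """Assumed form of L_pos: each Gaussian's position offset should
    match the offsets of its k nearest canonical-space neighbours."""
    total, count = 0.0, 0
    for i, c in enumerate(canonical):
        # Brute-force nearest neighbours; fine for a small illustration.
        order = sorted(
            (j for j in range(len(canonical)) if j != i),
            key=lambda j: sq_l2(c, canonical[j]),
        )
        for j in order[:k]:
            total += sq_l2(offsets[i], offsets[j])
            count += 1
    return total / count if count else 0.0
```

A constant image gives `smooth_loss` of zero, and a rigid translation (identical offsets for all Gaussians) gives `position_reg_loss` of zero, which matches the intent that both terms penalize only local variation.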

R1, R3, R4. How to get camera poses, depth, and masks of surgical tools: We conducted experiments on two public datasets: EndoNeRF and StereoMIS. For the EndoNeRF dataset, the camera pose is set to the identity matrix because the scene is captured with a fixed camera. Depth maps are generated using a pre-trained STTR-light [S1] model, and tool masks are obtained through manual labeling. For the StereoMIS dataset, we applied the same method to obtain the camera poses and depth maps. Additionally, we used the SAM-Track [S2] model to predict the tool masks on each frame. We will add the description of the datasets to the manuscript. [S1] Li, Zhaoshuo, et al. “Revisiting stereo depth estimation from a sequence-to-sequence perspective with transformers.” ICCV 2021. [S2] Cheng, Yangming, et al. “Segment and track anything.” arXiv 2023.

R1. Computational efficiency and user study: Our goal is to improve reconstruction quality. Nevertheless, the rendering speed of our method exceeds 80 FPS, which significantly surpasses the real-time application requirement of 30 FPS. In addition, we will add a professional user study to get feedback from medical experts.

R3. Reconstructed surface deformation and evaluation: Thanks for the suggestion. The surface deformation can be observed from the reconstructed point cloud of each frame. Similar to EndoSurf [29], we are able to evaluate the deformation by computing the point cloud distance between the reconstructed point cloud and the point cloud projected from the GT depth.
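One way such a point cloud comparison could be computed is sketched below. The rebuttal does not name the distance; a symmetric Chamfer distance is assumed here as one common choice, and the function name `chamfer_distance` is hypothetical. The brute-force nearest-neighbor search is only practical for small clouds.

```python
import math


def chamfer_distance(recon, reference):
    """Symmetric Chamfer distance between two non-empty point clouds.

    recon, reference: lists of (x, y, z) tuples, e.g. the reconstructed
    points of one frame and the points back-projected from GT depth.
    """
    def mean_nearest(src, dst):
        # Average distance from each source point to its closest target.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (mean_nearest(recon, reference) + mean_nearest(reference, recon))
```

Evaluating this per frame would give a curve of geometric error over time, which is closer to what the reviewer asked for than image-space metrics alone.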




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has addressed the reviewers’ concerns, and I would like to recommend acceptance of this paper. Authors, please include the justifications from the rebuttal in the camera-ready version.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a novel deformable 3D Gaussian Splatting framework, SurgicalGaussian, specifically designed for high-fidelity reconstruction of dynamic surgical scenes. The 3D Gaussian-based representation in canonical space captures intricate textures of the tissue, while the forward-mapping deformation field enhances its ability to model complex motions. Extensive experiments comparing it with state-of-the-art methods demonstrate significant improvement in reconstruction quality. After careful consideration of the authors’ rebuttal, most reviewers now lean towards accepting the paper. I agree that the authors have adequately addressed the major concerns and questions raised by the reviewers regarding the novelty of the proposed methods and some of the technical aspects. That said, I lean towards accepting the paper.



