Abstract

Tissue deformation poses a key challenge for accurate surgical scene reconstruction. Despite yielding high reconstruction quality, existing methods suffer from slow rendering speeds and long training times, limiting their intraoperative applicability. Motivated by recent progress in 3D Gaussian Splatting, an emerging technology in real-time 3D rendering, this work presents a novel fast reconstruction framework, termed Deform3DGS, for deformable tissues during endoscopic surgery. Specifically, we introduce 3D GS into surgical scenes by integrating a point cloud initialization to improve reconstruction. Furthermore, we propose a novel flexible deformation modeling scheme (FDM) to learn tissue deformation dynamics at the level of individual Gaussians. Our FDM can model the surface deformation with efficient representations, allowing for real-time rendering performance. More importantly, FDM significantly accelerates surgical scene reconstruction, demonstrating considerable clinical value, particularly in intraoperative settings where time efficiency is crucial. Experiments on DaVinci robotic surgery videos indicate the efficacy of our approach, showcasing superior reconstruction fidelity (PSNR: 37.90) and rendering speed (338.8 FPS) while substantially reducing training time to only 1 minute/scene. Our code is available at https://github.com/jinlab-imvr/Deform3DGS.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3887_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/jinlab-imvr/Deform3DGS

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_Deform3DGS_MICCAI2024,
        author = { Yang, Shuojue and Li, Qian and Shen, Daiyun and Gong, Bingchen and Dou, Qi and Jin, Yueming},
        title = { { Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    With this paper, the authors propose a Gaussian Splatting-based method to reconstruct a surgical scene subject to deformation. Using a point-cloud acquired by stereoscopic correspondence, a Gaussian point-cloud is initialized, which is deformed by a motion model trained on data to represent the deformed scene. Using alpha-blending, an RGB image and a depth map of the deformed scene are generated from the Gaussian point-cloud, which are used to estimate the parameters of the motion model as well as the Gaussian point-cloud in an optimization framework. Using standard datasets, the proposed method is compared to the state-of-the-art to demonstrate its superiority.
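
    The alpha-blending step mentioned above can be illustrated with a minimal sketch. It assumes the standard front-to-back compositing rule for depth-sorted samples along one pixel ray; the function name and array shapes are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def composite_front_to_back(colors, depths, alphas):
    """Blend depth-sorted samples for one pixel (nearest sample first).

    Hypothetical helper: colors is (N, 3), depths and alphas are (N,).
    """
    colors = np.asarray(colors, dtype=float)
    depths = np.asarray(depths, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance reaching each sample: product of (1 - alpha) over nearer samples.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    rgb = (weights[:, None] * colors).sum(axis=0)   # blended color
    depth = (weights * depths).sum()                # blended depth, same weights
    return rgb, depth, weights
```

    Because color and depth share the same blending weights, both rendered outputs are differentiable with respect to the same per-Gaussian opacities, which is what lets an RGB loss and a depth loss supervise one set of parameters.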

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    [1] A method to reconstruct deformable surgical scenes based on Gaussian Splatting is proposed. A motion model that can be trained on fewer frames is also proposed. The adaptation of widely used methods like Gaussian Splatting to a medical problem is novel. In addition, the authors demonstrate the superiority of the proposed method using standard datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    None

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper provides a high-level description of the proposed method. However, specific details (optimization parameters, thresholds etc.) are missing. Therefore, I doubt if anyone would be able to faithfully reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    [1] The paper starts with an adequate introduction to the clinical problem referring to the state-of-the-art methods in the literature. The methods are well described with justifications to why they were chosen and how they address the limitations in prior works. The experiments are adequately described while the results are discussed. The limitations of the proposed methods are identified as well. Overall, the paper reads very well. In the conclusions section, the authors could have explicitly pointed out avenues for future research to further improve the quality of the paper.

    [2] The adaptation of Gaussian Splatting for deformable surgical scene reconstruction has some novelty. Although simple, the proposed motion model based on a simple Gaussian seems to outperform complicated models based on polynomials. These two contributions add adequate novelty to the paper.

    [3] In section 2.4, the authors describe the initialization of the Gaussian point-cloud using a stereoscopic reconstruction. However, the specific method used is not referenced. For the reproducibility of the results, it is recommended that the authors share this information with the specific parameters used in the experiments. The same recommendation applies to section 2.5, where the authors describe the optimization framework. What method was used to optimize the loss in equation (7)? How were its parameters set in the experiments? What challenges did you encounter? The answers to these questions will help in the reproducibility of the results, unless the source code is shared.

    [4] The results of the experiments are presented well. The superiority of the proposed method over the state-of-the-art is shown both quantitatively and qualitatively. However, one or two failure cases would have helped improve the quality of the paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has adequate novelty. In addition, the proposed method is compared to the state-of-the-art by using standard datasets to demonstrate its superiority. The paper reads very well, and the results are discussed adequately. However, the reproducibility of the paper, at its current state, is questionable. If this can be adequately addressed, the paper is suitable for presentation at MICCAI.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel method for surgical scene reconstruction utilizing 3D Gaussian Splatting. The approach employs a motion-aware point fusion technique designed to better initialize the Gaussian point cloud. Additionally, a flexible deformation modeling (FDM) technique is incorporated to effectively capture and represent tissue deformations. The authors conducted evaluations on two public datasets, demonstrating that the proposed method achieves performance comparable to current state-of-the-art reconstruction methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method for surgical scene reconstruction is both well-conceived and effectively demonstrated through evaluations and ablation studies. These studies effectively highlight the significant impact of the novel algorithms on improving the accuracy and speed of the reconstruction. Additionally, the paper is well-written and it is easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The overall contribution appears to be incremental when compared to existing methods like EndoGaussian. Particularly, the performance gains achieved through the adoption of the Flexible Deformation Modeling (FDM) are relatively minor which raises questions about the practical significance and impact of the new method. It would be beneficial for the authors to further explore and articulate the advantages of their approach in contexts where its contribution could be more distinctly advantageous or to refine the methodology to yield more substantial improvements.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1. There appears to be a mismatch in the order of sections when introducing the pipelines in Section 2.1.
    2. It would be better to switch the order of sections 2.3 and 2.4. The figure associated with these sections introduces the initialization before the deformation model, and mirroring this sequence in the text would maintain consistency.
    3. The performance of the proposed method is similar to the EndoGaussian. I understand EndoGaussian is concurrent work. However, it is crucial to distinctly highlight the advantages of your proposed method. While the increase in processing speed is a benefit, further exploration into specific scenarios where Flexible Deformation Modeling (FDM) provides additional advantages would be beneficial. Consider discussing whether FDM offers improvements in handling longer surgical video sequences or environments with high levels of deformation. Highlighting these specific contexts can clarify the unique contributions of your approach and better position it against concurrent technologies.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written and the proposed deformation modeling method is novel. I am willing to accept the paper if the authors can address my concerns.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents a novel Gaussian-splatting-based method for fast surgical scene reconstruction. It represents surgical scenes with a set of 3D Gaussians and proposes a novel flexible deformation modeling scheme to learn deformation dynamics. It also designs a point cloud initialization strategy to improve the reconstruction quality. Experiments on two public datasets demonstrate that the proposed method achieves superior and faster results than SOTA methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper builds upon Gaussian splatting but introduces crucial modifications tailored for surgical scenes. Specifically, it additionally models deformations for each Gaussian’s position, rotation, and scale based on the observation that tissues are prone to elastic deformations during instrument intervention. Additionally, a novel initialization strategy is devised to address point cloud distribution irregularities resulting from instrument occlusion. Experiments and ablation studies demonstrate the effectiveness and efficiency of the proposed method. The paper is well-structured and easy to understand.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This paper generally maintains good quality but exhibits minor formatting issues. Specifically, Table 1 requires improvement in number formatting. In the “speed” column, inconsistencies are noted where some numbers have two decimal places while others have only one. Moreover, in the “LPIPS” column, it is recommended to use three decimal places for clarity, especially when values such as 0.06 are identical for both EndoGaussian and our method, making it difficult to discern superiority. Additionally, including information on memory consumption for each method (or the number of Gaussians for 3DGS-based methods) in Table 1 would provide readers with insights into the efficiency of the proposed approach.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No additional comments for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper presents a novel 3DGS-based method for endoscope reconstruction. The proposed deformation and initialization module is innovative and effective. The fast rendering speed and good reconstruction quality show its potential in real-world applications. I have one question regarding the implementation details of this work. This work trains 3D Gaussians with merely 3000 iterations, while the original 3DGS takes 30k iterations. May I confirm whether this is a typo? If not, I expect justifications and insights from the authors regarding why 3000 iterations are sufficient for endoscope reconstruction.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper extends 3D Gaussian splatting to endoscopic reconstruction tasks by introducing two modules to account for tissue deformation and point cloud initialization. The overall design is novel and inspiring. The experiments prove the effectiveness and efficiency of the proposed method. Overall, it is a good-quality paper that is worth accepting at MICCAI.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the AC and reviewers for their time. Most comments are positive and supportive, highlighting our contribution to “adapting the Gaussian Splatting for deformable surgical scene reconstruction” with a “novel and inspiring design”. Our experiments “demonstrate the effectiveness and efficiency of the proposed method.”

To R3: Regarding future research, current progress in using Gaussian Splatting (GS) to drive SLAM has drawn our attention because accurate endoscopic camera poses are unavailable but of great importance. However, existing GS-based SLAM solutions fail to model deformable tissue surfaces, making them inapplicable for medical use. Thus, we intend to explore how to integrate our deformable GS into GS-based SLAM pipelines to model wider and more complex surgical scenes.

Regarding reproducibility, we will release our source code and share the link in the camera-ready version.

Regarding the failure cases, unfortunately, we cannot add new experiments according to the official instructions. However, GS, as an explicit 3D representation, is inherently vulnerable to occluded regions where no RGB(-D) ground truth can be used to supervise the model optimization. Therefore, reconstruction failures occur when significant and long-lasting occlusions exist during video collection.

To R4: Regarding the formatting issues and memory consumption, thank you for pointing them out and for the suggestions. Due to the efficient parameterization of deformation, FDM takes merely 50%~60% of the memory cost of EndoGaussian during training. We will update them in the camera-ready version.

Regarding the number of iterations, we did set it to 3000 and removed the coarse stage used in the original 4DGS paper, since it was found to contribute little to the reconstruction performance. This iteration number is significantly smaller than the number in the original 4/3DGS papers, as the latter focus on monocular videos with longer duration and RGB images only. In contrast, surgical clips in the dataset for endoscope 3D reconstruction are relatively short. Meanwhile, stereo information can be leveraged to initialize the Gaussian point cloud with rich 3D geometric priors and thus largely ease the 3D information modeling.
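
As an illustration of how stereo-derived depth can seed a point cloud, the sketch below back-projects a depth map through a pinhole camera model. It is a minimal example under stated assumptions, not the initialization used in the paper: the function name, the intrinsics `fx`, `fy`, `cx`, `cy`, and the optional tissue `mask` are all hypothetical.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy, mask=None):
    """Lift a depth map (H, W) to camera-frame 3D points (M, 3).

    Hypothetical helper assuming a pinhole camera. Pixels with
    non-positive depth, or excluded by `mask`, are skipped.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    valid = depth > 0
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

Starting optimization from points that already lie near the tissue surface is one plausible reason a short schedule can suffice, since the optimizer mostly refines appearance and deformation rather than discovering geometry from scratch.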

To R6: Regarding the concerns about incremental performance gains vs. the concurrent work EndoGaussian, we would like to highlight that our research goal is to make GS reconstruction as fast as possible. According to our experiments, the proposed solution reduces the reconstruction time by over 50% while maintaining reconstruction quality commensurate with EndoGaussian. As shown in Table II, our method yields higher quality within a limited training time, which makes it highly efficient and better suited to intraoperative use.

Moreover, for distinct advantages, our FDM benefits from the nature of globally modeling the temporal deformation. Per-Gaussian deformation at various timestamps shares the same parameters (i.e., weights and learnable basis functions). Thus, deformation at one arbitrary timestamp can be naturally interpolated by deformations modeled at other timestamps. As a comparison, Hexplane-based methods (e.g., EndoGaussian and LerPlane) locally model the temporal deformation in grid structures without capturing long-range information. Thus, our method is more robust to missing frames since parameters updated at other frames still contribute to the overall deformation model.
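
The global temporal modeling described above can be sketched as a weighted sum of basis functions of time. The example below assumes Gaussian-shaped basis functions with learnable centers and widths for a single scalar attribute per Gaussian; this parameterization, the function name, and the array shapes are illustrative assumptions rather than the released implementation.

```python
import numpy as np

def basis_deformation(t, weights, centers, sigmas):
    """Evaluate a per-Gaussian offset at one or more timestamps.

    Hypothetical sketch: offset_g(t) = sum_j w[g, j] * exp(-(t - c[g, j])^2 / (2 * s[g, j]^2)).
    weights, centers, sigmas have shape (G, B): G Gaussians, B basis functions.
    Returns an array of shape (T, G).
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    diff = t[:, None, None] - centers[None]            # (T, G, B)
    basis = np.exp(-0.5 * (diff / sigmas[None]) ** 2)  # Gaussian bump per basis function
    return (weights[None] * basis).sum(axis=-1)
```

Because the same `weights`, `centers`, and `sigmas` are evaluated at every timestamp, a query at an unobserved time simply reads off the smooth curve fitted from the observed frames, which is the interpolation property the rebuttal refers to.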

Regarding the capability of dealing with long videos, FDM is flexible in that the number of learnable basis functions can be changed, and it therefore has great potential to handle various video durations and deformation levels. Also, we are constructing more challenging datasets and will explore them in future work.

Regarding the order of sections, thank you for your suggestion and for pointing out the mismatch. We will correct the orders and make changes accordingly.




Meta-Review

Meta-review not available, early accepted paper.


