

Abstract

Accurate 3D reconstruction of dynamic surgical scenes from endoscopic video is essential for robotic-assisted surgery. While recent 3D Gaussian Splatting methods have shown promise in achieving high-quality reconstructions with fast rendering speeds, their use of inverse depth loss functions compresses depth variations. This can lead to a loss of fine geometric details, limiting their ability to capture precise 3D geometry and their effectiveness in intraoperative applications. To address the limitations of existing methods, we developed SurgicalGS, a dynamic 3D Gaussian Splatting framework specifically designed for improved geometric accuracy in surgical scene reconstruction. Our approach integrates temporally coherent multi-frame depth fusion and an adaptive motion mask for Gaussian initialisation. In addition, we represent dynamic scenes using the Flexible Deformation Model and introduce a novel normalized depth regularization loss and an unsupervised depth smoothness constraint to ensure high geometric accuracy in the reconstruction. Extensive experiments on two real surgical datasets demonstrate that SurgicalGS achieves state-of-the-art reconstruction quality, especially in precise geometry, advancing the usability of 3D Gaussian Splatting in robotic-assisted surgery. Our code is available at https://github.com/neneyork/SurgicalGS.
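A quick numerical illustration of why an inverse depth loss compresses depth variations: a fixed depth difference Δz at depth z becomes Δz / (z(z + Δz)) in inverse depth, so the same geometric error counts for far less at larger depths. The depths below are arbitrary illustrative values, not taken from the paper.

```python
def depth_gap(z: float, dz: float) -> float:
    """Absolute difference between depths z and z + dz (same units as z, e.g. mm)."""
    return abs((z + dz) - z)


def inverse_depth_gap(z: float, dz: float) -> float:
    """Absolute difference between 1/z and 1/(z + dz), which equals dz / (z * (z + dz))."""
    return abs(1.0 / z - 1.0 / (z + dz))


# Illustrative values: the same 1 mm geometric error at a near and a far tissue point.
near, far, err = 20.0, 100.0, 1.0
for z in (near, far):
    print(f"z={z:5.1f} mm  depth gap={depth_gap(z, err):.3f}  "
          f"inverse-depth gap={inverse_depth_gap(z, err):.6f}")

# How much more strongly the near error is weighted than the far one in inverse-depth space.
ratio = inverse_depth_gap(near, err) / inverse_depth_gap(far, err)
print(f"near/far weighting ratio in inverse depth: {ratio:.1f}")
```

With these values the 1 mm error at 100 mm is weighted roughly 24 times less than the same error at 20 mm, even though the depth gap is identical in both cases.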

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4585_paper.pdf

SharedIt Link: https://rdcu.be/eHw7u

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05141-7_55

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/4585_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{CheJia_SurgicalGS_MICCAI2025,
        author = { Chen, Jialei AND Zhang, Xin AND Hoque, Mobarak I. AND Vasconcelos, Francisco AND Stoyanov, Danail AND Elson, Daniel S. AND Huang, Baoru},
        title = { { SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {572--582}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors proposed a 3D Gaussian Splatting-based method that integrates geometric information from all frames for dense Gaussian initialization, in order to recover tissues occluded by tools and improve reconstruction quality.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

An adaptive motion mask is designed to extract pixels exhibiting significant depth variations and occluded tissues, which effectively suppresses depth noise and transient artifacts.

A normalized depth loss is proposed that aligns binocular and rendered depth maps to a consistent scale, ensuring training stability while preserving depth variability.
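The exact normalization used by SurgicalGS is not given on this page. As a minimal sketch, assuming each depth map is min-max normalized to [0, 1] over valid (e.g. tissue) pixels before an L1 comparison, a normalized depth loss could look like the following; `normalized_depth_loss` is a hypothetical name, not the authors' code.

```python
from typing import List, Optional

Depth = List[List[float]]


def _minmax(values: List[float]) -> List[float]:
    """Scale values linearly to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def normalized_depth_loss(rendered: Depth, stereo: Depth,
                          mask: Optional[List[List[bool]]] = None) -> float:
    """Hypothetical normalized depth loss: mean L1 between min-max normalized depth maps.

    Only pixels where mask is True (e.g. tissue pixels) contribute.
    """
    r, s = [], []
    for i, row in enumerate(rendered):
        for j, value in enumerate(row):
            if mask is None or mask[i][j]:
                r.append(value)
                s.append(stereo[i][j])
    if not r:
        return 0.0
    r_n, s_n = _minmax(r), _minmax(s)
    return sum(abs(a - b) for a, b in zip(r_n, s_n)) / len(r_n)


# A rendered map that differs from the stereo map only by scale gives (numerically) zero loss.
print(normalized_depth_loss([[20.0, 40.0], [60.0, 80.0]], [[10.0, 20.0], [30.0, 40.0]]))
```

Because both maps are rescaled before comparison, a rendered map that differs from the stereo map only by scale and offset is not penalized, while flattening the relative depth structure is.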
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The tissue mask in Eq.8 - does it represent the inverse of the tool mask?

In Eq.9, the adaptive motion mask is calculated based on a depth variation term (|D₀ - Dᵢ| < τ), where D denotes the stereo-matching derived depth map (Equation (8)). Notably, the observed depth variations stem not only from surgical tool occlusions but also from camera motion, since D represents scene depth relative to the moving camera.

Furthermore, since the point clouds are already filtered by the tissue mask (Eq.8) and the motion mask is defined as a subset of the tissue mask (Eq.9), I would like to clarify whether applying Eq.10 introduces any additional modifications to the point clouds beyond these existing masking operations.

I therefore recommend conducting controlled experiments to validate the method’s effectiveness.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

In Eq.9, the adaptive motion mask is calculated based on a depth variation term (|D₀ - Dᵢ| < τ), where D denotes the stereo-matching derived depth map (Equation (8)). Notably, the observed depth variations stem not only from surgical tool occlusions but also from camera motion, since D represents scene depth relative to the moving camera.

Furthermore, since the point clouds are already filtered by the tissue mask (Eq.8) and the motion mask is defined as a subset of the tissue mask (Eq.9), I would like to clarify whether applying Eq.10 introduces any additional modifications to the point clouds beyond these existing masking operations.

I therefore recommend conducting controlled experiments to validate the method’s effectiveness.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

In the authors’ response, they stated, “Following previous methods, we used the fixed camera setup in our experiments.” Normally, in such cases, specific literature should be cited to support this claim and improve persuasiveness. At the same time, I could not find any mention of a “fixed camera” in the original paper. More importantly, in real endoscopic surgery, the camera cannot remain fixed at all times.

Considering that the depth map used in Eq. 9 is based on the stereo registration result (Eq. 8) rather than on point cloud coordinates, and that the authors did not describe any registration-like processing step, it remains unclear what physical meaning the difference between the depth maps of the current frame and the initial frame has.

Furthermore, regarding my comment during the review (“Furthermore, since the point clouds are already filtered by the tissue mask (Eq.8) and the motion mask is defined as a subset of the tissue mask (Eq.9)”), the authors may have misunderstood my point. What I meant was that, by substituting Eq. 8 and Eq. 9 into Eq. 10, it becomes clear that the element-wise multiplication with Mᵢ in Eq. 8 (since Mᵢ is a binary tissue mask) already covers the intersection operation in Eq. 9, by the properties of set intersection.
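The algebraic point can be checked directly: for binary masks, multiplying by the tissue mask M after intersecting with it changes nothing, because M ∧ (M ∧ V) = M ∧ V. The sketch below verifies this exhaustively for all pairs of small binary masks; V is a stand-in for the depth-variation term of Eq. 9, whose exact form is not reproduced here, and the depths are random illustrative values.

```python
import itertools
import random


def apply_mask(values, mask):
    """Element-wise multiplication of a flat list by a binary (0/1) mask."""
    return [v * m for v, m in zip(values, mask)]


def intersect(a, b):
    """Element-wise logical AND of two binary masks."""
    return [x & y for x, y in zip(a, b)]


random.seed(0)
n = 6
depth = [random.uniform(50.0, 150.0) for _ in range(n)]  # illustrative depths

# Enumerate every pair of binary masks of length n: tissue mask M and variation mask V.
for m_bits in itertools.product((0, 1), repeat=n):
    for v_bits in itertools.product((0, 1), repeat=n):
        tissue = list(m_bits)
        motion = intersect(tissue, list(v_bits))                 # motion mask as a subset of M
        twice = apply_mask(apply_mask(depth, tissue), motion)    # tissue mask, then motion mask
        once = apply_mask(depth, motion)                         # motion mask alone
        assert twice == once
print("tissue-mask multiplication is redundant for all", 2 ** (2 * n), "mask pairs")
```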

In conclusion, these issues lead me to question the validity and practicality of the authors’ claimed contributions.

Review #2

Please describe the contribution of the paper

First, the authors designed a method to smooth depth estimation with temporal consistency, together with a motion mask that serves as a spatial constraint. They then further regularize SurgicalGS with a depth smoothness term for 4DGS training.
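The exact form of the unsupervised depth smoothness term is not given on this page. A widely used formulation is the edge-aware smoothness loss from self-supervised monocular depth estimation, which penalizes depth gradients except where the image itself has edges. The sketch below implements that formulation on small 2D grids as an illustration only; whether SurgicalGS uses exactly this form is an assumption.

```python
import math
from typing import List

Grid = List[List[float]]


def edge_aware_smoothness(depth: Grid, image: Grid) -> float:
    """Edge-aware first-order smoothness: mean of |dD| * exp(-|dI|) over x and y neighbours.

    depth and image are same-sized 2D grids (image as grayscale intensities in [0, 1]).
    Depth jumps are penalized less where the image also has a strong edge.
    """
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbour
                dd = abs(depth[y][x + 1] - depth[y][x])
                di = abs(image[y][x + 1] - image[y][x])
                total += dd * math.exp(-di)
                count += 1
            if y + 1 < h:  # vertical neighbour
                dd = abs(depth[y + 1][x] - depth[y][x])
                di = abs(image[y + 1][x] - image[y][x])
                total += dd * math.exp(-di)
                count += 1
    return total / count if count else 0.0


# The same depth step costs less when it coincides with an image edge.
step = [[0.0, 10.0], [0.0, 10.0]]
print(edge_aware_smoothness(step, [[0.0, 1.0], [0.0, 1.0]]))  # step aligned with an image edge
print(edge_aware_smoothness(step, [[0.0, 0.0], [0.0, 0.0]]))  # step on a uniform image
```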
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The results look promising, achieving SOTA performance on extensive, popular datasets.
2. The ablation study convincingly demonstrates the efficacy of most of the proposed components.
3. The quantification of the geometric representation of 4DGS is new in the field and crucial for clinical use.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Using a depth prior to guide 4DGS is not new in the general vision domain; see, for example, the MICCAI 2024 accepted paper Endo-4DGS (https://github.com/lastbasket/Endo-4DGS). The results in the comprehensive benchmark against existing studies are sub-optimal. For the qualitative comparison figure, the authors could try relative error maps with a proper color map to highlight the differences; otherwise the differences are difficult to see, which hurts readability. An analysis of the difference in fps performance would also be preferred.
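The suggested relative error maps can be sketched as follows, assuming ground-truth depth is available per pixel with non-positive values marking invalid pixels. The function names and the 0.2 clipping value are illustrative choices; rendering the result with a color map would be done by a plotting library.

```python
from typing import List, Optional


def relative_error_map(pred: List[List[float]], gt: List[List[float]],
                       eps: float = 1e-6) -> List[List[Optional[float]]]:
    """Per-pixel |pred - gt| / gt; pixels without valid ground truth (gt <= eps) become None."""
    return [
        [abs(p - g) / g if g > eps else None for p, g in zip(prow, grow)]
        for prow, grow in zip(pred, gt)
    ]


def clip_for_colormap(err_map: List[List[Optional[float]]],
                      vmax: float = 0.2) -> List[List[Optional[float]]]:
    """Clip errors to [0, vmax] and rescale to [0, 1], so all methods share one color range."""
    return [[None if e is None else min(e, vmax) / vmax for e in row] for row in err_map]


errors = relative_error_map([[110.0, 50.0]], [[100.0, 0.0]])
print(errors, clip_for_colormap(errors))
```

Using one fixed color range (vmax) for every method keeps the maps directly comparable across columns of a figure.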
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Given the strengths in improving geometric accuracy and achieving state-of-the-art reconstruction quality, SurgicalGS presents valuable contributions to the field of robotic-assisted surgery. However, the limited novelty of the depth loss formulation and the quality of the visualizations are its drawbacks.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The rebuttal addressed my concerns.

Review #3

Please describe the contribution of the paper

This paper introduces SurgicalGS, a dynamic 3D Gaussian Splatting framework designed to improve geometric accuracy in robotic-assisted surgical scene reconstruction. It achieves state-of-the-art results on two surgical datasets and shows superior geometric accuracy while maintaining competitive rendering speed and image quality.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- This paper proposes a novel integration of depth priors and loss functions to address limitations of existing 3DGS methods.
- Extensive experiments on real surgical datasets, including ablation studies, validate the effectiveness of each proposed component. The method outperforms prior work in geometric accuracy.
- It also maintains real-time rendering speeds, making it suitable for intraoperative applications.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Experiments are conducted on two datasets. Performance on more diverse datasets and in complex surgical environments (e.g., smoke, blood, or extreme deformations) should be discussed to demonstrate generalizability.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed method presents a meaningful advancement in surgical scene reconstruction by addressing geometric accuracy. The experiments are thorough, demonstrating clear improvements over existing approaches.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #4

Please describe the contribution of the paper

The main contribution of the paper is the introduction of SurgicalGS, a novel framework that enhances the geometric accuracy of surgical scene reconstruction using 3D Gaussian Splatting (3DGS). It integrates temporally coherent multi-frame depth fusion and an adaptive motion mask for dense Gaussian initialization, along with a normalized depth regularization loss and unsupervised depth smoothness constraint to improve precision. Experimental results on real surgical datasets demonstrate state-of-the-art performance in both geometric accuracy (RMSE: 1.820 mm) and rendering quality (PSNR: 38.18), outperforming existing methods like EndoNeRF, EndoGaussian, and Deform3DGS.
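For reference, the two headline metrics have standard definitions, sketched below on flattened pixel lists. The paper's evaluation details (depth masking, image value range) are not reproduced here; treating ground-truth depth greater than 0 as valid and using max_val = 1.0 for images in [0, 1] are assumptions.

```python
import math
from typing import Sequence


def depth_rmse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Root-mean-square depth error over pixels with valid (positive) ground truth, in gt units (e.g. mm)."""
    sq = [(p - g) ** 2 for p, g in zip(pred, gt) if g > 0]
    if not sq:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(sq) / len(sq))


def psnr(pred: Sequence[float], gt: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


print(depth_rmse([1.0, 3.0], [2.0, 2.0]), psnr([0.1], [0.0]))
```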
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Novel Formulation: Normalized Depth Regularization: The paper introduces a normalized depth regularization loss to address the limitations of traditional inverse depth loss functions, which compress depth variations and degrade geometric accuracy. This ensures accurate alignment between predicted and actual depth maps while preserving depth variability, leading to better reconstruction of fine geometric details.

Dense Initialization with Adaptive Motion Mask: The framework employs a dense initialization strategy using temporally coherent multi-frame depth fusion and an adaptive motion mask to suppress noise and transient artifacts caused by tool occlusions. This approach recovers tissues occluded by surgical tools and improves the quality of Gaussian point cloud initialization.

Flexible Deformation Modeling: The use of Flexible Deformation Modeling (Fourier and polynomial basis functions) allows the method to effectively represent dynamic tissue deformations in surgical scenes.
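As a minimal sketch of a deformation built from polynomial and Fourier basis functions, the offset of one Gaussian coordinate over normalized time t can be written as a polynomial drift plus a truncated Fourier series. The degrees, frequencies, and which Gaussian attributes are deformed are assumptions, and `flexible_offset` and `deform_center` are hypothetical names, not the paper's implementation.

```python
import math
from typing import Sequence, Tuple


def flexible_offset(t: float, poly: Sequence[float],
                    cos_coef: Sequence[float], sin_coef: Sequence[float]) -> float:
    """Hypothetical 1-D offset at normalized time t in [0, 1].

    offset(t) = sum_k poly[k] * t**(k + 1)
              + sum_n cos_coef[n] * cos(2*pi*(n + 1)*t) + sin_coef[n] * sin(2*pi*(n + 1)*t)
    Polynomial terms model slow, non-periodic drift; Fourier terms model periodic motion.
    """
    value = sum(a * t ** (k + 1) for k, a in enumerate(poly))
    for n, (b, c) in enumerate(zip(cos_coef, sin_coef)):
        w = 2.0 * math.pi * (n + 1)
        value += b * math.cos(w * t) + c * math.sin(w * t)
    return value


def deform_center(center: Tuple[float, float, float], t: float, coeffs) -> Tuple[float, ...]:
    """Displace a Gaussian center axis by axis; coeffs holds one (poly, cos, sin) triple per axis."""
    return tuple(c + flexible_offset(t, *axis) for c, axis in zip(center, coeffs))


coeffs = [([0.5], [0.0], [0.2]), ([], [], []), ([0.0, -0.1], [], [])]
print(deform_center((1.0, 2.0, 50.0), 0.25, coeffs))
```

In such a model the per-Gaussian coefficients would be the learnable parameters, so a single compact set of numbers describes a whole trajectory over the video.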
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Lack of Novelty in Core Technique: While the authors present SurgicalGS as a novel approach, the core 3D Gaussian Splatting (3DGS) technique is not new and was introduced in previous work. The main innovation appears to be in the application of existing techniques to surgical scenes and the addition of specific components like normalized depth regularization and unsupervised depth smoothness.

Limited Clinical Validation: The paper lacks extensive clinical validation or demonstration of real-world applicability in actual surgical settings. While it tests on two datasets (EndoNeRF and StereoMIS), these are relatively controlled environments compared to real operating rooms. The authors don’t provide evidence of testing in live surgical scenarios or with varying conditions (e.g., blood, smoke, etc.).
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors proposed a novel 3DGS-based solution for accurate reconstruction of robotic-assisted surgical scenes, with sufficient evaluation and demonstration, making this a solid piece of work.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers for their valuable feedback. We appreciate that the reviewers agree that our geometric representation is crucial for clinical use (R4) and that our work achieves SOTA performance (R1, R2, R4). All concerns are addressed as follows.

Reproducibility: Our code will be released upon acceptance.

Test in complex surgical environments (R1, R2): Our method is already validated under large deformation conditions (EndoNeRF ‘pulling’). However, the requirement for precise camera poses makes it hard to find more datasets with smoke or blood. As discussed in the paper, future work will focus on evaluating the robustness of SurgicalGS on more challenging datasets.

Lack of Novelty in Core Technique (R2): Although built on the 3DGS foundation, we proposed a new methodology including normalized depth regularization to improve the geometry accuracy, mitigating surface irregularities in GS-based endoscopic reconstructions, which is crucial for clinical use. We also introduced a dense Gaussian initialization to recover tissues occluded by tools, helping surgeons to comprehensively observe the surgical environment.

Equation clarification (R3): In Eq. 8, the tissue mask represents the inverse of the tool mask. Following previous methods, we used the fixed camera setup in our experiments, allowing us to directly calculate adaptive motion masks based on depth variation defined in Eq. 9. For the variation caused by camera motion in real scenarios, we utilize the camera pose and depth map to localize pixel position in 3D space and calculate the depth variation with Euclidean distance. In Eq. 10, we fuse the point cloud filtered by the tissue mask in the first frame and the point clouds filtered by the motion mask in the remaining frames. This provides a dense Gaussian initialization with cross-frame information, helping reconstruct tissues with extreme deformation. The ablation result in Tab. 2 demonstrates that the dense initialization improves both rendering quality and geometry accuracy.
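The rebuttal describes two steps: lifting pixels into 3D with the camera pose and depth map so that depth variation is measured as a Euclidean distance, and fusing the frame-0 tissue points with motion-masked points from later frames (Eq. 10). The sketch below is a minimal illustration under stated assumptions, not the authors' code: pinhole intrinsics K = (fx, fy, cx, cy), camera-to-world poses (R, t), nearest-pixel reprojection into frame 0, and a "keep points whose variation exceeds tau" rule (the direction of the comparison in Eq. 9 is an assumption). All function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def backproject(u: int, v: int, z: float, K) -> Vec3:
    """Pinhole back-projection of pixel (u, v) with depth z; K = (fx, fy, cx, cy)."""
    fx, fy, cx, cy = K
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def cam_to_world(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply a camera-to-world pose: R p + t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def world_to_cam(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Inverse of cam_to_world for a rotation matrix R: R^T (p - t)."""
    d = [p[k] - t[k] for k in range(3)]
    return tuple(sum(R[c][r] * d[c] for c in range(3)) for r in range(3))


def variation_3d(u, v, depth_i, pose_i, depth_0, pose_0, K) -> Optional[float]:
    """Distance between the 3D point at (u, v) in frame i and the frame-0 point at the
    pixel where it reprojects; None if it falls outside frame 0."""
    fx, fy, cx, cy = K
    p_world = cam_to_world(backproject(u, v, depth_i[v][u], K), *pose_i)
    x, y, z = world_to_cam(p_world, *pose_0)
    if z <= 0:
        return None
    u0, v0 = round(fx * x / z + cx), round(fy * y / z + cy)
    if not (0 <= v0 < len(depth_0) and 0 <= u0 < len(depth_0[0])):
        return None
    q_world = cam_to_world(backproject(u0, v0, depth_0[v0][u0], K), *pose_0)
    return math.dist(p_world, q_world)


def fuse_points(depths, poses, tissue_masks, K, tau):
    """Hypothetical Eq. 10-style fusion: all tissue pixels of frame 0, plus tissue pixels
    of later frames whose 3D variation exceeds tau (out-of-view points are skipped here)."""
    points = []
    for i, (depth, pose, mask) in enumerate(zip(depths, poses, tissue_masks)):
        for v, row in enumerate(depth):
            for u, z in enumerate(row):
                if not mask[v][u]:
                    continue
                if i > 0:
                    d = variation_3d(u, v, depth, pose, depths[0], poses[0], K)
                    if d is None or d <= tau:
                        continue
                points.append(cam_to_world(backproject(u, v, z, K), *pose))
    return points
```

With a fixed camera (identical poses) the 3D distance reduces to a per-pixel comparison, which matches the fixed-camera setting described in the rebuttal; the pose terms only matter once the camera moves.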

Difference of depth prior with Endo-4DGS (R4): Endo-4DGS adopts monocular depth as depth prior to improve rendering quality. However, due to the scale ambiguity in monocular depth, Endo-4DGS struggled to reconstruct accurate depth maps, limiting its applicability in intraoperative settings where precise geometry is critical. In contrast, we utilized the stereo depth map and proposed a novel normalized depth regularization to reconstruct accurate geometry.

Sub-optimal rendering result (R4): Our method achieves superior performance in geometric reconstruction, while rendering quality is slightly below the baseline. This is due to the trade-off between geometric accuracy and rendering quality caused by the proposed normalized depth regularization. As shown in Tab. 2, our method with inverse depth regularization (line 3), which is used in Deform3DGS and EndoGaussian, achieves a high PSNR (32.29), outperforming Deform3DGS (PSNR 31.61). However, its geometric accuracy is limited (RMSE 4.956). In contrast, although there is a slight decrease in rendering quality with our normalized depth regularization (PSNR 31.54), the geometric accuracy is significantly improved (RMSE 2.174). Our method is specifically designed to enhance geometric reconstruction accuracy, which is particularly critical for clinical applications.

Try relative error map to highlight the difference for depth maps (R4): We will use relative error maps with color maps for depth map figures in a later manuscript.

Analysis on the difference in fps performance (R4): We propose a dense initialization for 3DGS, which introduces more Gaussians to represent the scene than other GS-based methods. Additional Gaussians require more computing resources and thus reduce the rendering speed. Although the rendering speed of our method is lower than that of Deform3DGS, it still achieves real-time performance (214 fps), well above the 30-60 fps at which most real-world endoscopes operate.
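As a rough check of the real-time claim, the per-frame rendering time at 214 fps can be compared with the frame budget of 30-60 fps video. This considers rendering alone and ignores any other pipeline stages (e.g. depth estimation), which is a simplifying assumption.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


render_ms = frame_budget_ms(214)  # per-frame rendering time implied by the reported 214 fps
for video_fps in (30, 60):
    budget = frame_budget_ms(video_fps)
    print(f"{video_fps} fps video: {budget:.1f} ms per frame; "
          f"rendering takes {render_ms:.2f} ms ({render_ms / budget:.0%} of the budget)")
```

At roughly 4.7 ms per rendered frame, rendering uses well under a third of the per-frame budget even at 60 fps.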

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A


SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction

Author(s):