Abstract

The advent of 3D Gaussian Splatting (3D-GS) techniques and their dynamic scene modeling variants, 4D-GS, offers promising prospects for real-time rendering of dynamic surgical scenarios. However, modeling dynamic scenes requires a large number of Gaussian units, high-dimensional Gaussian attributes, and high-resolution deformation fields, all of which lead to severe storage issues that hinder real-time rendering on resource-limited surgical equipment. To surmount these limitations, we introduce a Lightweight 4D Gaussian Splatting framework (LGS) that relieves the efficiency bottlenecks of both rendering and storage for dynamic endoscopic reconstruction. Specifically, to minimize the redundancy in the number of Gaussians, we propose Deformation-Aware Pruning, which gauges the impact of each Gaussian on deformation. Concurrently, to reduce the redundancy of Gaussian attributes, we simplify the representation of textures and lighting in non-crucial areas by pruning the dimensions of Gaussian attributes. We further resolve the feature field redundancy caused by the high resolution of the 4D neural spatiotemporal encoder used for modeling dynamic scenes via a 4D feature field condensation. Experiments on public benchmarks demonstrate the efficacy of LGS, with a compression rate exceeding 9 times while maintaining pleasing visual quality and real-time rendering efficiency. LGS marks a substantial step towards application in robotic surgical services. Project page: https://lgs-endo.github.io/.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0827_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0827_supp.pdf

Link to the Code Repository

https://lgs-endo.github.io/ https://github.com/CUHK-AIM-Group/LGS

Link to the Dataset(s)

https://github.com/med-air/EndoNeRF https://endovissub2019-scared.grand-challenge.org/

BibTex

@InProceedings{Liu_LGS_MICCAI2024,
        author = { Liu, Hengyu and Liu, Yifan and Li, Chenxin and Li, Wuyang and Yuan, Yixuan},
        title = { { LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Three techniques (DAP, GAP, and FFC) were implemented to significantly reduce the storage space needed for 4DGS.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Paper is clear and well-written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The article noted the use of the knowledge distillation method during model optimization. From my understanding, knowledge distillation typically necessitates a pre-trained model serving as a teacher model. This implies that the framework outlined in the article might not directly facilitate real-time reconstruction. In my view, the significance of reducing storage space becomes apparent if real-time surgical reconstruction is feasible. However, if real-time reconstruction isn’t feasible, it raises questions about the necessity of retraining an already trained model solely to conserve several hundred megabytes of space. Hence, the significance and practical application of the work is my major concern.

    The author introduced three methods aimed at reducing space utilization, primarily focusing on proposing, experimenting, and numerically comparing them to demonstrate space reduction. However, this approach leans heavily towards engineering applications and may fall short of meeting MICCAI acceptance criteria. It would be beneficial to analyze the effectiveness of the proposed methods from a theoretical standpoint, providing deeper insights into their underlying principles and theoretical foundations, thereby enhancing the scholarly rigor and theoretical contribution of the research.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    On page 7: should the metric values in the Full model (Ours) row of Table 2 be the same as those of LGS (Ours) on the EndoNeRF dataset in Table 1?

    In my understanding, the EndoNeRF method operates under the assumption of a fixed camera position, whereas the camera in the SCARED[2] dataset exhibits movement. Clarification is needed on how the author managed to achieve results using the EndoNeRF method with the dynamic camera movements present in the SCARED[2] dataset.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The practical value of the proposed method is questionable. The technical novelty is also too small for MICCAI.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents adaptations to 4D Gaussian Splatting which reduce the memory consumption of the method and increase the (inference time) rendering speed. The method is evaluated on two datasets and compared to several state-of-the-art methods. The presented method is slightly better in rendering speed than the next fastest method, but consumes a much lower amount of memory.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The amount of memory reduction is impressive. The paper structure is very clear and easy to follow along. The introduction/motivation and the evaluation section were very clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I am concerned that the paper might be trying to “oversell” some of its methods. For example, the introduced “Feature Field Condensation” is, I believe, just an average pooling method, and the “Gaussian-Attribute Pruning” is, if I understand correctly, just choosing a lower Spherical Harmonic order, i.e. changing a hyperparameter of the original Gaussian Splatting method. By giving these things complex names, I believe the paper artificially complicates the method a bit. I think the methods themselves are justified, well chosen and give interesting results, but the way they are named and introduced overcomplicates things in my understanding.

    I miss some more qualitative and quantitative results. Qualitative: To augment the results in Fig. 2, could more images of re-renderings of the point cloud from different perspectives be shown? The shown images do not help me in assessing the correct representation of depth, for example. In some of my earlier experiments, I’ve seen that these kinds of reconstruction methods can easily create false points (or Gaussians) somewhere in the vicinity of the camera, for example, to almost perfectly represent the original image, but when the camera is moved these artifacts become apparent. Alternatively, renderings of the reconstructed depth would be interesting (distance from camera plane to nearest Splat?). Or a re-rendered video from a novel viewpoint for the supplementary material? Quantitative: It would be very interesting to see results/metrics for the various SH orders (different h_sh). In the current state, it is not transparent why the given h_sh values were chosen, and it is hard to judge how the parameter influences reconstruction.

    The methods section left me with a few questions. Some variables/symbols aren’t explained precisely, while others are explained in what I feel is a too complicated way.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The abstract sounds like the only bottlenecks in GS are storage and rendering. However, in my understanding, the high rendering speed is actually one of the major advantages of the GS method (compared to other methods, like NeRFs), and the major bottlenecks for surgical application are the usually missing known camera poses and the slow optimization speed during scene reconstruction.

    I slightly disagree with the argumentation that memory requirements of the baseline methods are too high (“which hinders model deployment on robotic surgical devices”). The highest memory consumption of any of the reported methods seems to be 334.5MB, which a typical GPU can easily handle these days. I do agree that lower memory consumption is advantageous, but I doubt the current requirements “hinder” deployment.

    I’m a bit uncertain about the usage of “real-time” in this context, because as far as I understand the reconstruction is far from real-time, only the re-rendering is real-time. For example, the sentence “the relevant progress has been extended to clinical scenarios like reconstructing deformable tissues in dynamic endoscopic scenes in real-time efficiency [6,10,14,31]” sounds, to me, as if the reconstruction of deformable tissues was real-time. But following some of the citations, I believe these methods are usually not performing real-time reconstruction? For example, the cited Endo-4DGS contains the sentence: “where the training was accomplished with only 4 minutes and 4GB of GPU memory.” I suggest choosing more precise wording throughout the paper regarding this. How long does the reconstruction of one scene take? I assume because of the student-teacher model, it takes quite a lot longer than (for example) EndoGaussian?

    In Fig. 2 -> are these Test images, i.e. those which have not been seen during optimization?

    I have a few questions regarding the deformation score:

    • If the sum is computed over all pixels in the image, then the result depends a lot on the current camera
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method section was not clear enough to me, and some parts seemed to be over-selling pieces of the method which could be explained in much simpler terms, I think. The results focus a lot on the memory consumption and maybe too little on the influence of the hyperparameters (SH, pooling size, …) on the reconstruction quality.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents Lightweight 4D Gaussian Splatting, a method for 3D reconstruction of dynamic surgical scenes in robotic surgery. Due to Gaussian Splatting, and the proposed deformation-aware pruning of Gaussians, their approach enables very fast rendering times (over 180 frames per second). More importantly, the paper proposes three techniques to limit the model’s memory footprint, which is decreased 9x without losing image quality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper addresses two complications with earlier methods for 3D surgical scene reconstruction that make them more challenging to apply in a clinical setting: rendering times and memory requirements to store model weights. By mitigating both factors, the proposed approach makes it more likely that such models are eventually used in the operating room.

    • Two of the three solutions (“deformation-aware pruning” and “feature field condensation”) convincingly result in small memory footprint models, without losing image quality. Moreover, deformation-aware pruning results in 2x faster rendering speeds.

    • The paper compares their work with an exhaustive list of related methods, including other (yet unpublished) Gaussian Splatting methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • In Table 1, it is unclear how the performances of related methods are obtained. The numbers do not seem to correspond with scores reported in the referenced papers. To assess the fairness of the comparison, it would be good if the paper is very explicit about this.

    • It is expected that the performance scores for the “full model” in Table 2 correspond to the scores of “LGS” in Table 1. However, they are different. To fully understand the evaluation protocol, it would be good to explain why these are different.

    • The paper proposes a student-teacher approach for training a reduced-memory model. However, since the student is supervised with ground truth image loss already, the supervision with teacher rendered images seems redundant. The paper could prove its relevance with an additional experiment that evaluates the effect of the extra teacher loss.

    • In equation 2, the criterion that reflects whether a Gaussian contributes to a pixel could be explained in more detail. How is this calculated? Explaining this part is essential for reproducibility of the method.

    • Although related work does the same, it would be better to use all 6 scenes in the EndoNeRF dataset instead of only 2.

    • In the introduction, the paper claims that “3D scenes […] enhances comprehension of the spatial environment […], thereby enabling surgeons to conduct more precise and efficient operations.” I am not aware of literature that proves this statement and I could not find it in the referenced article either. If the claim is difficult to prove, it would be better to weaken the statement.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Datasets are publicly available. Method descriptions are mostly clear enough to implement the code yourself.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper presents valuable work that drives the field of 3D surgical scene reconstruction further. Nevertheless, there are some questions about the presented performance scores and claims made in the article (see comments above).

    Small remark: in Figure 2, the method is referred to as “MES-GS” although this should probably be “LGS”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    When questions about the presented performance scores are answered and fair comparisons are assured, the paper has a good chance of acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I thank the authors for their answers. As they can assure a fair comparison and explain different scores for table 1 and 2, I have no major weakness that blocks acceptance.




Author Feedback

First, we really appreciate all the reviewers for their valuable suggestions. We first address some common questions (CQ).

[CQ1@R3, R4] Is the setting of Light-weight Gaussian Splatting (LGS) reasonable and practical? (I) Common acknowledgement: Compressing Gaussian Splatting (GS) is an important step towards deploying GS in practice, and its practical value is widely acknowledged, as shown by papers accepted at CVPR’24 [i, ii]. (II) Does distillation weaken real-time reconstruction/rendering? First, we will clarify in the revision that the term “real-time reconstruction” is commonly used to mean “real-time rendering” rather than efficient training [10, 14, 31], and essentially no existing work achieves real-time training. With this misconception dispelled, it is worth noting that the knowledge distillation we use only increases training time and adds nothing to inference time. (III) Does clinical practice need LGS? As medical AI models are increasingly deployed in surgical robotics and point-of-care scenarios, efficient memory usage and real-time rendering, achieved by compressing Gaussian Splatting 3D representations, play a crucial role in these low-resource edge scenarios.

[CQ2@R3, R4] Different values in Tables. The results of LGS differ between Table 1 and Table 2 because the number of Gaussians at initialization differs. The initial number of Gaussians is 3k in Table 1 and 30k in Table 2.

[CQ3@R3, R4] How were the performances of related methods obtained? (I) We reproduce the related methods using their released code and apply the same setting as in [14, 29, 31]. (II) NeRF is suitable for multi-view scenes, and specifically for EndoNeRF on the SCARED dataset, we use the same setting as [14, 29]. (III) Only two EndoNeRF scenes have been open-sourced.

We then respond to the other specific comments from each reviewer.

[R3 Q1] Novelty regarding theoretical insight. As you acknowledged, theoretical analysis of methods offers insights into fundamental principles and novelty. In fact, all of our modules are derived from theoretical insights. (I) Based on a theoretical analysis of each Gaussian’s importance to deformation, Deformation-Aware Pruning minimizes memory usage by limiting the number of Gaussians. (II) Gaussian-Attribute Pruning draws on the rendering principle of 3D GS and reduces redundant attributes. (III) FFC is designed around the key component of 4D Gaussian (the Spatial-Temporal Structure Encoder) and compresses the representation of Gaussians that are adjacent in space-time.
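
To make points (II) and (III) concrete, the following is a minimal sketch that follows Reviewer #2's reading of the two modules (a lower spherical-harmonic order and average pooling of a feature plane); it is not taken from the LGS code. The function names, the (H, W, C) plane layout and the window size k are hypothetical choices for illustration only.

```python
import numpy as np

def sh_floats_per_gaussian(degree: int, channels: int = 3) -> int:
    """Colour coefficients stored per Gaussian: (degree + 1)^2 per channel."""
    return channels * (degree + 1) ** 2

def average_pool_plane(plane: np.ndarray, k: int) -> np.ndarray:
    """Average-pool an (H, W, C) feature plane with non-overlapping k x k windows.

    Assumed layout and window size; the paper's condensation may differ.
    """
    h, w, c = plane.shape
    if h % k or w % k:
        raise ValueError("plane height and width must be divisible by k")
    # Split each spatial axis into (blocks, k) and average over the k axes.
    return plane.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

# Dropping from SH degree 3 to degree 2 shrinks the colour attribute
# from 48 to 27 floats per Gaussian.
saved = sh_floats_per_gaussian(3) - sh_floats_per_gaussian(2)

# A 2 x 2 pooling stores a quarter of the original plane entries.
pooled = average_pool_plane(np.zeros((64, 64, 32)), k=2)
```

Under these assumptions, both steps trade a fixed amount of representational capacity for storage, which is why their effect on image quality has to be checked empirically, as in the ablation the reviewers ask about.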

[R4 Q1] Clarification about results. (I) Qualitative: Please note that the view of SCARED is dynamic. LGS’s strong performance on SCARED in Table 1 shows its ability to achieve depth as accurate as that of EndoGaussian. Thank you for your advice; visual results will be included in future work. (II) Quantitative: Results for h_sh equal to 3 are shown in Table 2 line 2, indicating performance inferior to that of h_sh equal to 2 across all metrics.

[R4 Q2] How about memory usage and training time for LGS? (I) The memory considered in our study is not the GPU memory used during training or rendering, but the storage on the device. (II) Training LGS takes about 3 minutes.

[R5 Q1] Effect of teacher loss. LGS without the teacher loss performs worse than the full LGS. This is also the common setting in knowledge distillation [9]. Because the output of the teacher model is easier for the student model to imitate, supervision from a mixture of teacher output and GT is essential.
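
As an illustration of "a mixture of teacher output and GT", here is a small sketch of a mixed supervision term. The L1 photometric loss and the weight `lam` are assumptions made for this example; the rebuttal does not state which loss terms or weights LGS actually uses.

```python
import numpy as np

def l1(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference between two images of the same shape."""
    return float(np.mean(np.abs(a - b)))

def student_loss(student_img: np.ndarray,
                 gt_img: np.ndarray,
                 teacher_img: np.ndarray,
                 lam: float = 0.5) -> float:
    """Blend ground-truth and teacher supervision (lam is a hypothetical weight).

    lam = 0 trains on ground truth only; lam = 1 trains on the teacher only.
    """
    return (1.0 - lam) * l1(student_img, gt_img) + lam * l1(student_img, teacher_img)

# Toy 2 x 2 grey images: the teacher render sits between student and GT.
student = np.zeros((2, 2))
gt = np.ones((2, 2))
teacher = np.full((2, 2), 0.5)
loss = student_loss(student, gt, teacher, lam=0.5)
```

The teacher image is rendered only during training, which is consistent with the point made under CQ1 that distillation adds training cost but no inference cost.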

[R5 Q2] Explanation of Equation 2. The calculation occurs during the differentiable rasterization of Gaussian Splatting. Specifically, as we render the color of each pixel, we record the number of pixels that each Gaussian hits.
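
The bookkeeping described here can be sketched outside the rasterizer as follows. This is a simplified stand-in, not the authors' implementation: `pixel_to_gaussians` is a hypothetical per-pixel list of contributing Gaussian indices, whereas in practice the count would be accumulated inside the rasterization kernel, and the full deformation score of Equation 2 is not reproduced on this page.

```python
import numpy as np

def hit_counts(pixel_to_gaussians, num_gaussians: int) -> np.ndarray:
    """Count, for each Gaussian, how many pixels it contributes to."""
    counts = np.zeros(num_gaussians, dtype=np.int64)
    for ids in pixel_to_gaussians:
        # A Gaussian counts at most once per pixel.
        counts[np.unique(np.asarray(ids, dtype=np.int64))] += 1
    return counts

def keep_mask(scores: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Boolean mask that drops the lowest-scoring fraction of Gaussians."""
    n_drop = int(len(scores) * prune_ratio)
    order = np.argsort(scores, kind="stable")  # ascending score
    mask = np.ones(len(scores), dtype=bool)
    mask[order[:n_drop]] = False
    return mask

# Four pixels, four Gaussians; Gaussian 3 is never hit.
pixels = [[0, 1], [1], [1, 2, 2], []]
counts = hit_counts(pixels, num_gaussians=4)
mask = keep_mask(counts, prune_ratio=0.5)
```

In this toy case the hit count alone is used as the score; a deformation-aware criterion would weight each hit by how much the Gaussian moves, which is an assumption about the role of the count rather than a statement of the paper's formula.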

[i] Lee, Joo Chan, “Compact 3D Gaussian representation for radiance field”, CVPR’24. [ii] Niedermayr, “Compressed 3D Gaussian splatting for accelerated novel view synthesis”.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The provided rebuttal has addressed reviewers’ concerns, even though R3 did not update his/her review after. I recommend acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The provided rebuttal has addressed reviewers’ concerns, even though R3 did not update his/her review after. I recommend acceptance.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


