Abstract

Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of pre-cancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce “Gaussian Pancakes”, a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with an 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100× faster rendering and more than 10× shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2298_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2298_supp.zip

Link to the Code Repository

https://github.com/smbonilla/GaussianPancakes

Link to the Dataset(s)

https://durrlab.github.io/C3VD/

BibTex

@InProceedings{Bon_Gaussian_MICCAI2024,
        author = { Bonilla, Sierra and Zhang, Shuai and Psychogyios, Dimitrios and Stoyanov, Danail and Vasconcelos, Francisco and Bano, Sophia},
        title = { { Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces “Gaussian Pancakes,” a novel approach that enhances 3D endoscopic reconstruction by integrating geometrically-regularized 3D Gaussian Splatting with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. This method improves the alignment of Gaussians with the colon surface, resulting in smoother, more detailed 3D reconstructions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Propose a novel normal loss for “pancaking” the Gaussians along the surface.
    2. A system that can be applied to complex surgical environments.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Evaluation of 3D-GS. No comparison of 3D-GS in Tab. 1.
    2. The comparison with NeRF-based methods is not fair. Since 3D-GS-based methods rely on SfM points for initialization, the running time of RNN-SLAM, which provides the initial points, cannot simply be excluded from the reported GPU running time.
    3. The main contribution, in my view, is the normal loss. However, this normal loss is not well evaluated. There are many methods for obtaining a smoother surface, e.g., TV loss or directly using isotropic Gaussian points.
    4. The results heavily rely on the results of RNNSLAM (C3VD’s simple camera trajectories caused large gaps in reconstructions due to insufficient scene coverage for RNNSLAM’s mesh surface fusion step, limiting us to choose three datasets with satisfactory RNNSLAM performance to effectively evaluate 3D renderings.), which is not robust. Since the 3D-GS is differentiable, why not use photometric loss to optimize the 3D-GS and the camera pose simultaneously? (I believe many methods have done this.)
    5. The evaluation metrics of Tables 1 and 2 are different. Why not report MS-SSIM and LPIPS in Table 2?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    “Normal vectors for each 3D Gaussian are efficiently recalculated for new Gaussians per iteration using a custom CUDA kernel to accommodate the dynamic scene efficiently.” I wonder how this kernel is implemented.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to the weakness part.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work integrates 3D Gaussian Splatting (3D-GS) as a post-processing step for RNN-based Simultaneous Localization and Mapping (RNN-SLAM). It achieves dense 3D reconstructions from the outputs of RNN-SLAM. From this perspective, approaches like EndoGSLAM, which directly employ 3D-GS as a representation method, seem more suited to this task. I am looking forward to a comparison between the two. The advantages in speed and quality primarily stem from the inherent characteristics of 3D-GS itself, rather than from the method proposed here. Therefore, I believe that the experimental section should focus more on comparing different implementations of 3D-GS. If the authors cannot articulate the advantages of their method over approaches that directly use 3D-GS to conduct SLAM, I would regrettably recommend rejection.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks to the rebuttal, I have changed my mind. Although I still believe that the end-to-end method is the future of neural representation embedded into clinical usage, it still needs much effort to achieve similar performance to the current design of the Gaussian pancake. I recommend citing some end-to-end medical SLAM methods for the promotion of this field. Besides, the drawback of directly using the RNNSLAM should be addressed in the manuscript as well as the contents in the rebuttal. Finally, I’m leaning towards accepting this paper.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method called Gaussian Pancake, which combines RNNSLAM with 3D Gaussian Splatting to improve the geometric consistency of RGB-based endoscopic image reconstruction. The video in the supplementary material demonstrates a good reconstruction effect.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The Gaussian Pancake method utilizes Recurrent Neural Network Simultaneous Localization and Mapping (RNNSLAM) to provide depth maps for 3D ground truth, which can be used to create sparse RGB point clouds. Authors use the refined depth maps, camera poses and sparsely sampled points on converged meshes of RNNSLAM to provide high quality texture renderings of the structures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Insufficient analysis of the unique problems and difficulties faced in applying 3D Gaussian to endoscope three-dimensional reconstruction. Authors should analyze more of the issues faced by 3D Gaussian in the field of endoscopic 3D reconstruction.
    2. What is an i7 single core CPU? It seems like there is no single-core CPU in the Intel Core i7 series.
    3. I hope authors will place more experimental data in the main text rather than in the supplementary materials.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    I encourage authors to open-source the code to improve the replicability of the article.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Analysis of the challenges encountered in adding 3D GS in endoscope reconstruction. Adding user studies involving ratings and evaluations from clinical physicians can make the research more comprehensive and convincing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The author proposed 3D pancake. The combination of 3D Gaussian and RNNSLAM achieved good reconstruction results, improving the geometric accuracy of reconstruction. However, we not only want to see good reconstruction results, but also want to see an analysis of the challenges faced by 3D GS in endoscopic 3D reconstruction.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel approach for accurate 3D reconstruction of colonic surface, leveraging 3DGS and RNNSLAM. Novel geometric and depth regularizations are proposed to improve the performance of 3DGS. Experiments have demonstrated good reconstruction performance that is smooth and with fewer artifacts.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Good overall quality, good writing with extensive experiments.
    • Introduce GS + SLAM as an efficient way for endoscopic reconstruction. (Several concurrent papers also leverage deformable GS for a slightly different task - deformable tissue reconstruction from endoscopy videos.)
    • Propose geometric and depth regularizations based on 3DGS, improving the performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • 3DGS optimization relies heavily on the camera poses and depth maps generated by RNNSLAM, but the paper doesn’t discuss the accuracy of the predicted camera poses and depth maps. It’s unclear how this would affect reconstruction. From the last sentence in Section 3.1, the paper only “choose three datasets with satisfactory RNNSLAM performance”, which suggests the method may not be robust enough and may fail if RNNSLAM doesn’t work well. More details are needed: what accuracy is satisfactory enough?
    • The motivation for additionally using GS on top of RNNSLAM is unclear. RNNSLAM already provides real-time surface reconstruction with decent quality. The paper claims RNNSLAM lacks photorealism and anatomical details, but does not provide a comparison of reconstructed surface of RNNSLAM and their proposed method. It’s also unclear how to extract mesh from the trained Gaussians.
    • Lack of discussion in ablation studies, on why sometimes the full method has a performance drop in terms of PSNR (Supplementary Table A.1).
    • Lack of experiments on real endoscopy videos, currently cannot deal with deformable tissues.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See above

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper provides a solution for efficient and accurate endoscopic reconstruction with GS, yet needs to clarify the points listed in weaknesses.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks to the rebuttal. The paper is inspiring for endoscopic applications, but the primary concern remains the robustness (camera poses rely heavily on RNNSLAM, without iterative optimization with geometry updates), which is very important for practical applications. The questions about RNNSLAM dependence are still not well answered, authors need to directly clarify in Sec 3.1, what performance of RNNSLAM is not satisfactory for the proposed method, including the frequency and typical scenarios of failure cases. Although the supplementary video is helpful, I encourage the authors to show at least some qualitative comparisons in Sec 3.3. Adding a direct visual comparison between RNNSLAM and the proposed approach in the video could highlight the improvement and better illustrate how the proposed method achieves more photorealistic textures. Please also comprehensively incorporate the rebuttal to the manuscript.




Author Feedback

We thank the reviewers for their valuable feedback.

Contributions & Analysis of Challenges (R1, R4):

Our key contribution is the geometric and depth regularizations for improving and adapting Gaussian Splatting (GS) for monocular endoscopic images (Fig.1a-b; pg. 2, para. 2). These regularizations mitigate common issues (floating artifacts, surface irregularities) in GS-based endoscopic reconstructions (sec2.3). The proposed regularizations align Gaussians with colon surface topology, enhancing geometric and anatomical accuracy (suppl. video 3:45-5:30min). Additionally, we proposed SLAM integration with GS by utilizing RNNSLAM[16], which has previously been shown to be robust for monocular endoscopic pose/depth estimations. We emphasize that our method is designed to complement any robust depth estimation system[4].

Comparison with Other Methods (R3, R4):

A detailed comparison was performed with the 3 SOTA methods closest to ours; at the time, there were no comparable GS methods. EndoGSLAM was unpublished at the time of this submission. It is worth mentioning that EndoGSLAM relies on depth maps, so RNNSLAM or similar would still be required. Moreover, EndoGSLAM doesn’t perform normal regularization, so our proposed Pancaking could improve its performance.

RNNSLAM Dependence & Geometry-Pose Joint Optimization (R3,R4):

We chose RNNSLAM for its rapid and reliable depth/pose estimations, effective in handling textureless challenges compared to traditional SfM methods, which often failed to generate sparse point clouds(see Fig.1C). Details of RNNSLAM’s capabilities are discussed in [16] and referenced in our paper. Depth/pose estimation in such environments remains an open research area.

We acknowledge R3’s suggestion of end-to-end optimization of GS and camera pose using a photometric loss. However, given the challenges in 3D reconstruction from monocular endoscopic images, this would require making significant design choices, parameter tuning and ablation studies, shifting the focus of our method. Hence, it is not in the scope of this paper.

Normal Loss Evaluation (R3):

We used a cosine angle-difference loss because of its effectiveness in aligning Gaussians with surface topology, yielding more realistic geometric reconstructions than isotropic Gaussians. We refer to 2D GS (SIGGRAPH 2024), published after this submission, which corroborates that using a normal loss for surface alignment is an effective regularization.
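As an illustration only, a cosine angle-difference loss between per-Gaussian normals and reference surface normals could be sketched as follows. The function name `cosine_normal_loss`, the use of NumPy instead of a differentiable framework, and the absolute value (treating a normal and its negation as equivalent) are assumptions made for this sketch, not details confirmed by the paper or the rebuttal.

```python
import numpy as np

def cosine_normal_loss(gaussian_normals, surface_normals, eps=1e-8):
    """Mean (1 - |cos(theta)|) over paired normals, each array of shape (N, 3).

    Hypothetical helper: the exact form of the paper's loss is not given
    in this document.
    """
    g = np.asarray(gaussian_normals, dtype=np.float64)
    s = np.asarray(surface_normals, dtype=np.float64)
    # Normalize both sets so the dot product equals cos(theta).
    g = g / (np.linalg.norm(g, axis=1, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + eps)
    cos = np.sum(g * s, axis=1)
    # Assumption: the sign of a Gaussian's axis is arbitrary, so use |cos|.
    return float(np.mean(1.0 - np.abs(cos)))
```

Under these assumptions, parallel or anti-parallel normals give a loss near 0 and perpendicular normals give a loss near 1, so minimizing it pushes each Gaussian's normal toward the local surface normal.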

Direct Comparison with RNNSLAM[16] (R4):

We verified through qualitative comparison that GS textures are more photorealistic than those generated by [16]. This outcome aligns with expectations, as [16] is not optimized for rendering (images to be added in camera-ready).

Computational Efficiency & Reproducibility (R1, R3):

Our method reduces training to ~2 min and renders >100x faster than NeRF (noted by R1). Regarding R3’s concerns, RNNSLAM produces depths/poses at 10 FPS (sec. 3.2), faster than the traditional SfM methods used in NeRF, which often require hours. The compared methods and most GS methods equally require SfM pre-processing, so typically, these preparatory steps are not reflected in performance evaluations. For fairness, we haven’t included the minute additional time required by RNNSLAM. We will release our source code, including the CUDA kernel for normal vector calculation.
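For readers wondering what a per-Gaussian normal computation might involve, one common convention is to take the principal axis with the smallest scale as the normal of a flattened ("pancaked") Gaussian. The sketch below assumes that convention, a (w, x, y, z) quaternion layout, and scales ordered to match the columns of the rotation matrix. It is plain NumPy for a single Gaussian, not the authors' CUDA kernel, and the names `quat_to_rotmat` and `gaussian_normal` are hypothetical.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=np.float64) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def gaussian_normal(quat, scales):
    """Return the unit axis of the Gaussian with the smallest scale.

    Assumption: covariance = R @ diag(scales**2) @ R.T, so column k of R
    is the principal axis belonging to scales[k].
    """
    rotation = quat_to_rotmat(quat)
    shortest = int(np.argmin(scales))
    return rotation[:, shortest]
```

With an identity rotation and scales such as (1.0, 0.5, 0.01), the returned normal is the z axis, which matches the intuition of a thin disc lying in the x-y plane.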

Evaluation Details (R3,R4):

Table 2 metrics may not fully capture artifact reduction & geometric accuracy. While metrics for GS vs GS+Pancaking seem comparable (SSIM: 0.8346 vs. 0.8340, LPIPS: 0.2047 vs. 0.2115), true improvements are evident in the suppl. video 4:43-5:40min and Table A.1. Addressing R4’s concern regarding the slight drop in PSNR, such variations are expected when geometric losses are added, shifting focus towards geometric accuracy. We ensured that eval. metrics like SSIM were used for direct comparisons where possible (code was unavailable for COLONNerf).
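To make the PSNR discussion concrete, the standard definition can be sketched as below. PSNR depends only on the pixel-wise mean squared error, which is one reason it can move slightly without reflecting geometric quality. The function name `psnr` and the assumption that images are scaled to [0, 1] are choices made for this sketch and are not taken from the paper's evaluation code.

```python
import numpy as np

def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values lie in [0, max_value].
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(rendered, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A uniform error of 0.1 on a [0, 1] image gives a mean squared error of 0.01 and therefore a PSNR of about 20 dB.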

Minor (R3):

One of the datasets consisted of real colonoscopy videos (suppl. video 4:24-4:43min).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Gaussian Pancakes combines RNNSLAM with a regularized 3D Gaussian Splatting to increase the quality of novel view synthesis and 3D reconstructions in endoscopy. The evaluations across three varied datasets show that it achieves superior rendering quality, smoother reconstructions, fewer artifacts, and greatly reduces computational cost. Additionally, the method outputs explicit geometry, which sets it apart from the other leading methods with its potential for seamless integration into clinical practices. After rebuttal, there is consensus amongst reviewers: a weak accept. After careful consideration of the authors’ rebuttal, I also lean towards accepting the paper.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


