Abstract

Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Mapping) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperative applications in endoscopic surgeries. In this paper, we introduce EndoGSLAM, an efficient SLAM approach for endoscopic surgeries, which integrates streamlined Gaussian representation and differentiable rasterization to facilitate over 100 fps rendering speed during online camera tracking and tissue reconstruction. Extensive experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches, showing tremendous potential for endoscopic surgeries.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1232_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1232_supp.zip

Link to the Code Repository

https://github.com/Loping151/EndoGSLAM

Link to the Dataset(s)

https://drive.google.com/drive/folders/1wT4cILcbf4TUlWlmK_wJPiIrZ2AqZ43W?usp=drive_link

BibTex

@InProceedings{Wan_EndoGSLAM_MICCAI2024,
        author = { Wang, Kailing and Yang, Chen and Wang, Yuehao and Li, Sikuang and Wang, Yan and Dou, Qi and Yang, Xiaokang and Shen, Wei},
        title = { { EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15006},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper “EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting” takes the novel 3D Gaussian Splatting (3DGS) technique from ACM Trans. Graphics (Jul 2023) for rasterization and evaluates its performance on endoscopic images. The goal is to achieve high quality, fast 3D reconstruction of anatomy imaged by an RGB-D endoscope. The paper compares reconstruction performance, tracking error, and speed against several existing methods and demonstrates promising performance for clinical use.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths of the paper include:

    1. Porting a recent, promising 3D dense reconstruction algorithm into the medical space has the potential to enable new clinical applications. An example of this is reconstructing patient anatomy from RGB-D endoscope images on the fly, which could help clinicians perform navigation more effectively and identify features of clinical relevance more readily.

    2. Adaptation of a non-medical technology into the medical space requires certain adjustments based on domain-specific knowledge, and the paper uses such knowledge to optimize performance.

    3. Comprehensive evaluation of the present technique against several existing methods across a variety of relevant metrics allows the reader to understand the potential of the novel clinical application.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Weaknesses of the paper include:

    1. The use of 3DGS (ACM ToG, Jul 2023) is largely a direct application of an existing technique on a new dataset that happens to be clinical in nature. The paper uses a standard dataset, which is a positive in terms of reproducibility but places greater emphasis on technical novelty and/or clinical translation.

    2. The adaptation of 3DGS to clinical scenarios is fairly mild, i.e., the reader does not gain much further insight into the clinical problem.

    3. The explanation of the 3DGS method is unclear as it reproduces equations from the original paper but does not describe them in enough detail to understand them without reading said paper.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall the present paper shows promising progress on the path towards improving clinical technology, providing clinicians with advanced tools that can help them perform their duties more effectively and providing enhanced information capabilities. In particular, the present paper achieves this by leveraging state of the art methods in dense 3D reconstruction of anatomical scenes by rasterizing a set of related but discrete 2D endoscopic images.

    The name of the novel method is 3D Gaussian Splatting, or 3DGS, and has shown such promising results in its original non-medical settings that it has quickly gained widespread popularity within a few months of publication (Jul 2023). It is thus a promising path to explore the viability of the method on medical data.

    Because the paper heavily relies on (1) a previously published algorithm and (2) open source datasets, attention is shifted towards other possible strengths of the paper including translation to clinical practice, system integration, or novel device/workflow. The present paper uses 3DGS primarily as a drop-in replacement for prior algorithms within the same interventional paradigm, and thus does not introduce any novel concepts with regard to workflow integration or usability.

    The additional work performed to adapt the non-medical 3DGS to a medical application is fairly light as well, involving the replacement of non-medical data with openly available medical data, and any adjustments made are based largely on observations one can make without clinical expertise. One example is in Section 2.1 (“Preliminaries and Initialization”), first paragraph, “lighting primarily moves with the camera in endoscopy” and Section 2.2 (“Camera Tracking”), first paragraph, “insufficient brightness and color in areas further away from the camera.” While the specific traits of medical data vs non-medical data are always interesting and relevant, these particular observations are readily made by a non-expert, i.e., most readers could have arrived at similar conclusions without reading the paper. What the reader would want to know, in the context of the present paper, is how the non-obvious details of the clinical process drive design decisions and tradeoffs. What are the PSNR, error, frame rate, speed, etc. measurements that are required for the clinical application to be successful?

    To emphasize this point more broadly, readers in this community would like to learn details about a clinical process that is not readily obvious, and furthermore how the paper understands these details, identifies the problem to be solved, and crafts a solution around it. One would like to understand /how/ online reconstruction would be used in a clinical setting, and how it can potentially improve clinical performance and/or workflow. To understand these details, the paper can take a different approach and first target a specific clinical procedure, describe it in enough detail to highlight the process and problems involved, explain what improvements are possible with technology, and show how the authors propose to develop and/or leverage technologies to create those improvements.

    For this particular paper, which applies existing techniques to existing data, to have a stronger impact, one possible future direction is to take the work further towards clinical translation. This could be the use of a phantom, either purchased commercially or developed in house, so that an entire workflow can be proposed around the physically grounded experiment. Such an exercise would help authors clarify to themselves and to readers how the proposed approach can create clinical value, as opposed to presuming that existing clinical data can speak on the paper’s behalf. Indeed, authors do list in the final section that future work can involve accommodating deformation, which is bound to occur as the camera is moved within the body, and integration into existing systems. It would be very interesting to see how well an existing, promising algorithm integrates into a system to increase its capabilities.

    Another trait of this type of work is the balance between describing the existing method so that the reader has the context in front of them, versus referring to the original source, which requires additional effort but is more definitive. This is especially true when the existing technique, 3DGS, is relatively new and may not be widely known. The balance between detail and reference in the present paper straddles this challenge in a way that hits the disadvantages of both extremes.

    Specifically, much page space is dedicated to reproducing the equations of the method, yet not enough text exists to explain the rationale for the formulation, why it is constructed as such, and how it may or may not apply to the clinical setting. In other words, the reader must either already be familiar with the method or must read the original paper. Thus the present paper does not make the most of the limited space it has to present the ideas.

    Some examples of this include: Section 2.1 (“Preliminaries and Initialization”), first paragraph: “[3DGS] represents complex scenes with collections of [3DGS’s]” leaves it up to the reader to realize how these collections are created and placed. Later, “we employ a uniform scaling factor for all three dimensions to accelerate optimization” leaves it open to the reader to understand how this optimization is valid and how much optimization was needed in the first place. In other words, how do clinical requirements drive technical requirements? Second paragraph: “camera pose and intrinsic parameters” keeps the reader guessing until Section 3.1 (“Dataset and Evaluation Metrics”) what the experimental setup looks like.

    In Fig. 3 the columns of the figure should be labeled and the caption should describe the expected results vs. the displayed results.

    Experimental validation: It is a positive decision to use a standardized dataset, in this case the Colonoscopy 3D Video Dataset. First and foremost the paper could be made much stronger by focusing on this particular clinical application and highlighting the improvements to existing practice that are proposed. It is not until Section 3.1 (“Dataset and Evaluation Metrics”) that the reader learns that colonoscopy video is being examined, leaving the reader with the impression that the clinical application is but an afterthought.

    The dataset includes 22 clips but the present paper evaluates against 10, raising the question of how and why said 10 were chosen.

    In Table 1 (“Quantitative results on the C3VD dataset”), the terms EndoGSLAM-H and -R should be defined; the table is difficult to interpret without reading the text body.

    I would be excited to see this work pursued further towards live phantom experiments and beyond, to really highlight the true clinical potential of this technical capability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    On one hand it is interesting to see a promising dense 3D reconstruction technique originating from outside of the MICCAI community ported over to the clinical setting. On the other hand, the reader does not learn that which was not already known (that a 3D anatomical scene can be reconstructed from a series of discrete endoscopic images), and the paper does not bring the state of the art closer to clinical adoption. I would be excited to see this work pursued further towards a live phantom experiment and beyond.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    My review is most similar to R1’s, believing the paper’s…

    • Strengths: 3DGS for endoscopic SLAM is a timely and effective contribution to MICCAI themes
    • Weaknesses: Scientific merit and clinical impact

    These lead me towards a mid-level rating (weak accept or weak reject).

    Authors address weaknesses in the rebuttal as follows:

    (1) Scientific merit: Present paper is first to publish 3DGS for endoscopic SLAM. The method further enables camera tracking due to endoscopy-specific simplifications from the original 3DGS, as well as improvements in lighting and robustness.

    I agree that first-to-publish is necessary for acceptance, but not sufficient. 3DGS (Aug 2023) apparently supersedes NeRF (2020 + several improvements), yet SMERF possibly supersedes 3DGS just months later (Dec 2023). Med tech is developed, tested, and approved over a horizon of 5+ years, so if we emphasize first-to-publish for methods that iterate in months, we risk getting stuck in an innovation cycle driving MICCAI into irrelevance.

    (2) Clinical impact: Present method is fast, allowing real-time reconstruction/navigation.

    I agree with this assertion and its clinical significance. In my opinion, mitigating factors include: (a) This is potential clinical impact, as actual impact is much further downstream. (b) In both paper and reviews we refer to endoscopy monolithically, but scarcely to colonoscopy specifically. Colonoscopy is vastly different from, e.g., thoracoscopy, bronchoscopy, and laparoscopy. For us to say that the present method applies to endoscopy writ large shows a gap in clinical foresight that we should overcome. (c) Individual clinicians’ views are not the same as clinical consensus, so we should thoughtfully contextualize any off-the-cuff opinions.

    I agree with R5 that the present work has good experiments and presentation, but I believe these are necessary but not sufficient. I agree with authors’ rebuttal to R5 that EndoGSLAM is not a pretraining method.



Review #2

  • Please describe the contribution of the paper

    The authors presented an algorithm called EndoGSLAM, which is a SLAM approach targeting endoscopic surgeries. It integrates Gaussian representations and differentiable rasterization for efficient rendering speed (over 100 fps) during online camera tracking and tissue reconstruction. The experimental results show a good trade-off between efficiency and reconstruction quality when compared with existing approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of the paper are: (1) the paper is, in general, simple to follow; (2) the authors provide a simple and concise literature review, discussing advantages and disadvantages of traditional sparse approaches, NeRFs, and Gaussian Splatting; (3) Gaussian Splatting is an active research topic and its application to medical endoscope SLAM was, to the best of my knowledge, not yet tackled; (4) the experimental results on a public dataset are very positive, showing a good trade-off between efficiency and reconstruction quality.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The negative aspects of the paper are:

    (1) From my understanding, the scientific contributions of the paper only correspond to minor improvements with respect to existing approaches:

    • From reading Section 2.1, I understand that the authors suggest reducing the Gaussian parametrization by 86% (removing SH + uniform scaling);
    • [7] seems to be used for rendering the Gaussian representations (Equations 1-3);
    • Camera tracking is performed using standard depth and color reprojection loss (Equation 4);
    • Gaussian expansion uses empirical thresholds;
    • Partial refinement uses empirical thresholds and standard losses.

    (2) It is not clear how the 10 clips from the 22 in C3VD were selected.

    (3) It is not clear what the performance degradation is with respect to (full, standard) Gaussian splatting. I understand it would be slower, but how much is the quality of rendering reduced after removing 86% of the parameters?

    (4) It is not clear how the proposed algorithm compares with existing Gaussian splatting approaches [A] and [B] (both are not cited in the paper).

    (5) It is not clear from the paper whether the performance results presented in Table 1 (image quality, 3D reconstruction error, absolute trajectory error) meet the quality required for a real medical setting.

    [A] “EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction” (https://github.com/yifliu3/EndoGaussian)
    [B] “Gaussian Splatting SLAM” (https://github.com/muskie82/MonoGS)
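    The 86% figure in weakness (1) can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the per-Gaussian parameter layouts are assumptions based on common 3DGS implementations (degree-3 spherical harmonics, quaternion rotation) and on the simplifications the reviews describe (isotropic scale, plain RGB color). The dictionary names are hypothetical and are not taken from the paper or its code.

```python
# Assumed per-Gaussian parameter layout of a standard 3DGS implementation.
STANDARD_3DGS = {
    "position": 3,
    "scale": 3,             # anisotropic: one value per axis
    "rotation": 4,          # unit quaternion
    "opacity": 1,
    "sh_coefficients": 48,  # degree-3 SH: 16 coefficients x 3 color channels
}

# Assumed layout after the simplifications described in the reviews.
SIMPLIFIED = {
    "position": 3,
    "scale": 1,    # single isotropic radius, so no rotation is needed
    "opacity": 1,
    "color": 3,    # plain RGB instead of spherical harmonics
}


def parameter_reduction(full, reduced):
    """Return the fraction of per-Gaussian parameters that is removed."""
    n_full = sum(full.values())
    n_reduced = sum(reduced.values())
    return 1.0 - n_reduced / n_full


reduction = parameter_reduction(STANDARD_3DGS, SIMPLIFIED)
print(f"{sum(STANDARD_3DGS.values())} -> {sum(SIMPLIFIED.values())} "
      f"parameters per Gaussian, {reduction:.1%} fewer")
```

    Under these assumptions the count drops from 59 to 8 parameters per Gaussian, which is consistent with the roughly 86% reduction mentioned above. It says nothing about the rendering-quality cost raised in weakness (3).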
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) It would be interesting to evaluate the performance/overhead with respect to a simple alternative approach, which consists of combining ORB-SLAM (most efficient tracking and reconstruction) with Gaussian Splatting reconstruction. Regarding Gaussian Splatting, you would only be optimizing reconstruction and rendering, and would assume the relative motion between cameras to be fixed.

    (2) Please highlight the best results in Table 1 and Table 2.

    (3) Can you provide more details on the selection of the gray-scale intensity value for the pre-filtering M_t?

    (4) It would be relevant to have the individual results for each clip in the supplementary material for comparison with other approaches (e.g., [C, D]).

    [C] “LightNeuS: Neural Surface Reconstruction in Endoscopy using Illumination Decline”, MICCAI, 2023
    [D] “LightDepth: Single-View Depth Self-Supervision from Illumination Decline”, CVPR, 2023

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of using Gaussian Splatting for tracking and 3D reconstruction in endoscopy is a very timely research topic and is tackled in a simple yet effective way. However, there are still some concerns/doubts regarding the relevance of the scientific contributions, the performance comparison with other approaches, and the performance in a medical setting. If the authors tackle my concerns by clearly highlighting their contributions, discussing the alternatives, and providing more information about the requirements for a real medical setting (and how far the proposed approach is from these requirements), then I am willing to accept the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The article describes a pipeline for Simultaneous Localization and Mapping (SLAM) from RGB+D endoscopic sequences where the radiance field is modeled using the recent framework of 3D Gaussian Splatting (3DGS) to simultaneously accomplish accurate camera tracking, high-fidelity 3D reconstruction, and real-time online visualization, which can be particularly useful for clinical diagnosis. The 3DGS framework is specialised to endoscopic scenes by exclusively considering isotropic kernels (typically there are no elongated structures in the scene), by replacing spherical harmonics with a simple color parameter (light moves with the camera, so there are no complex view-dependent light effects), and by masking unreliable pixels (either too dark or specular). The experimental results on phantom data show that EndoGSLAM presents a good trade-off between time and accuracy, beating all competing methods in online rendering speed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper pioneers the use of 3D Gaussian Splatting (3DGS) in the context of endoscopy, introducing the MICCAI audience to a very recent trend in radiance fields and SLAM.

    • Although 3DGS has already been used in SLAM, the paper makes adaptations and takes into account specificities of endoscopy that are of interest (isotropic Gaussians, discarding SH, reliable pixel mask, etc.).

    • The experimental results are encouraging and suggest that EndoGSLAM presents a good trade-off between time and accuracy, beating all competing methods in online rendering speed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The approach seems to require a camera with a depth channel (RGB+D) that does not exist in the field. Thus this is a largely exploratory work without immediate clinical application. I would expect this to be discussed in the paper with an analysis of the importance of including the depth channel for the performance of the proposed formulation.

    • The paper is in general clear except for the mask of reliable/unreliable pixels and the partial refinements of Section 2.4 (see below).

    • The experimental part could be more thorough by studying the positive and negative impacts of considering isotropic Gaussians without SH.

    • There is an important missing reference that limits the novelty of the current submission and that works with monocular RGB, making it potentially usable in the field in the short term:
      [a] Matsuki, H., Murai, R., Kelly, P. H. J. & Davison, A. J. Gaussian Splatting SLAM. arXiv (2023) doi:10.48550/arxiv.2312.06741.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In Section 2.2 the rationale for the classification of pixels into reliable and unreliable is not clear and/or seems somewhat naive. My understanding is that the only criterion for a pixel to be unreliable is to have brightness outside the open range (0.1, 0.9). This makes it possible to discard poorly illuminated regions and regions of specularity/highlight, but it can also discard reliable pixels (e.g. a very white surface) and misses other “unreliable” situations such as non-Lambertian reflection.
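    To make the concern concrete, here is a minimal sketch of the thresholding as this review understands it, together with a masked color error. It is not the authors' implementation: the Rec. 601 luma weights, the function names, and the simple L1 error are assumptions chosen for illustration, and the paper's Equation 4 may differ.

```python
LOW, HIGH = 0.1, 0.9  # brightness thresholds as understood in this review


def luminance(rgb):
    """Gray-scale intensity of an (r, g, b) pixel with channels in [0, 1].

    The Rec. 601 luma weights are an assumption; the paper may convert differently.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def reliable_mask(image):
    """Mark a pixel reliable only if its brightness lies strictly inside (LOW, HIGH)."""
    return [[LOW < luminance(px) < HIGH for px in row] for row in image]


def masked_l1(rendered, observed, mask):
    """Mean per-pixel L1 color difference over reliable pixels only."""
    total, count = 0.0, 0
    for r_row, o_row, m_row in zip(rendered, observed, mask):
        for r_px, o_px, keep in zip(r_row, o_row, m_row):
            if keep:
                total += sum(abs(a - b) for a, b in zip(r_px, o_px))
                count += 1
    return total / count if count else 0.0


# A 2x2 toy frame: dark pixel, mid-tone tissue, specular highlight, bright white surface.
frame = [[(0.02, 0.02, 0.02), (0.6, 0.3, 0.3)],
         [(1.0, 1.0, 1.0), (0.95, 0.95, 0.95)]]
mask = reliable_mask(frame)
print(mask)
```

    In this toy frame only the mid-tone pixel survives. The bright but otherwise well-observed surface at brightness 0.95 is rejected together with the specular highlight, which is the false-rejection case described above.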

    In Section 2.4, what is the criterion for a frame to be a keyframe? Is it just to be at a distance k from the last keyframe? What exactly is the probability pc that is assigned to the current frame? The role of the distribution function of Equation 5 is also not fully clear. Is the optimization with the loss of Equation 6 run over the entire image or only over the expansion set of Section 2.3?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    All things weighed, I lean towards accepting the paper based on the fact that it is the first work that specializes GS for use in SLAM in endoscopy. The authors should discuss the possibility of applying the framework to monocular RGB for deployment in the field and include the very recent reference [a] that accomplishes GS SLAM without requiring a depth channel. Clarifications to the questions in 10. above will also be welcome.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors’ reply to most of my questions was satisfactory. I definitely lean towards acceptance.



Review #4

  • Please describe the contribution of the paper

    The authors propose EndoGSLAM, a novel approach designed for 3D reconstruction of endoscopic procedures. The proposed framework, which is based on Gaussian representations, is able to perform precise online camera tracking, high-quality dense reconstruction, and real-time novel view synthesis. It exploits differentiable rasterization to facilitate fast optimization and rendering. The method makes use of a dense photometric loss for real-time tracking and reconstruction instead of sparse geometric features. The proposal also includes a refining module for partially filling missing regions, speeding up the reconstruction process significantly.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The problem is important and clinically relevant in order to improve the outcomes of 3D reconstruction and tracking for use in applications such as minimally invasive surgery.

    2. The problem and proposed approach are well motivated and Figure 1 provides a very good general overview of the results obtained by the method compared to those in the state of the art.

    3. The state of the art is comprehensive enough: it provides a good understanding of the limitations of previous approaches and positions the use of Gaussian Splatting within this field as an area of opportunity for creating online dense optical tracking and reconstruction methods.

    4. The experimental design and the tests done by the authors are very thorough. They performed experiments on a large dataset and compared their method with state-of-the-art models. The study also includes ablation studies to assess the impact of pre-filtering M and the use of the keyframe-based refining strategy. They also show results for a real-time and a high-quality implementation.

    5. The results are well presented and the authors present both quantitative and qualitative assessments of the experiments, presenting ablation studies in certain detail.

    6. The authors discuss future areas of research and the limitations of the current iteration of their approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The abstract is a bit lacking: it does introduce the problem, but neither the solution nor the obtained results are clear enough.
    2. The authors could delve a bit deeper into the analysis and perform more statistical tests.
    3. The authors could report results on an unseen dataset?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper makes use of the publicly available dataset. If the code is made available it could be easily reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The abstract needs to be rewritten an improve to discuss the overall gains of the proposed method.

    2. The results section is using trajectories track information (Figure 3). Why not to report metrics sushc as Absolute Pose Error. Some of the metrics are not well defined, even if the citations are provided (i.e., LIPS)

    3. There are some minot typos: reconstructing (2nd paragraph), are proposed to integrate nerf (2nd page)  have proposed…, and some more. I would suggest re-reading the paper. But overall the paper is easy to read and well-written

    4. Table 1 should show the best metrics in bold to make it easier to parse. Have the authors considered using a box plot for a better comparison of the different methods? A complementary statistical test could make the contribution stronger.

    5. How does the method deal with under- and over-exposed frames? There is a lot of current research on this, and the authors mention it only very briefly (at the beginning of Section 2.1), unless I am mistaken.
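
    For reference, the Absolute Pose Error raised in comment 2 is commonly summarized as the root-mean-square of the per-frame translational error (often reported as ATE). The following is a minimal sketch, assuming the estimated and ground-truth trajectories are already time-synchronized and expressed in the same coordinate frame, so no alignment step is performed. The function name `ate_rmse` and the sample trajectories are illustrative and are not taken from the paper.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of per-frame translational error between two trajectories.

    Both arguments are sequences of (x, y, z) camera positions. They are
    assumed to be pre-aligned and of equal length (an assumption of this
    sketch; real evaluations usually align the trajectories first).
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy trajectories: only the middle frame deviates.
gt_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est_traj = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0), (2.0, 0.0, 0.0)]
print(ate_rmse(est_traj, gt_traj))
```

    Reporting a single scalar of this kind alongside the trajectory plots would make the tracking comparison easier to read across methods.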

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is very well-written, the experiments are extensive and the results are consequential. There are a few areas of improvement, but overall I think it is a strong paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I maintain my original review of the paper, provided the authors address my comments as discussed in their rebuttal.




Author Feedback

We appreciate the reviewers’ constructive feedback and efforts. Below is our response. All updates will be included in the final submission.

  1. Clinical contribution & insight (R1, R4): Previous clinical SLAM systems, mentioned in Sec. 1, provide only sparse representations or are too slow for real-time medical use. Our method uses 3DGS for fast rendering and optimization, making it the first to allow surgeons to navigate the scene during procedures. Our team’s surgeons confirmed that EndoGSLAM’s online reconstruction can be used in existing systems for interactive surgical displays by simply adding a desktop, enhancing the field of view and observational capabilities, and thereby improving clinical performance and workflow.
  2. Scientific novelty (R1, R3, R4): Related works mentioned include EndoGaussian and MonoGS. The former does not support camera tracking, and the latter appeared on arXiv at around the same time as our work. Compared with the original 3DGS, our method enables camera tracking and runs at 9 fps. Compared to existing GS-SLAMs, our method applies (1) simplification, removing view-dependent parameters since endoscopies have only a limited view angle (details in the next paragraph), and (2) the filter M_t, which addresses lighting issues in the surgical scene at low cost while adding robustness to GS-based tracking. Thus, our method brings 3DGS closer to clinical use. As the first to specialize 3DGS for SLAM in endoscopy, our method provides new insights to the community about why and how 3DGS should be used in the clinical field.
  3. Code & performance without SH simplification (R1, R3): We will make our code public in the future, with per-clip results for easy comparison. Without simplification, the method is slower and less stable, with an 84% degradation in ATE and approximately 1.44x slower execution. The lack of simplification leads to unwanted view-dependent color changes, complicating tracking and causing color artifacts in visualization. Simplification, combined with the filter M_t, enhances EndoGSLAM’s speed and robustness.
  4. Implementation details: Clip selection (R1, R3): We first removed clips that are too short (trans_t1_a lasts only 2 seconds). Then, among clips that are very similar to each other, we kept only one. The remaining clips are representative and long enough for SLAM; RGB-D camera (R3): Depth can be acquired from stereo cameras, which, as we have confirmed with clinicians, are now widely used in endoscopic surgeries. We also consider depth estimation as future work; Keyframe criteria (R3): We add every kth frame to a list, and a probability is calculated to estimate the spatial-temporal distance from each keyframe to the current camera position; Whether the results meet the requirements of a real medical setting (R1, R4): We confirmed with clinicians that the visualization results are considered helpful; On unseen datasets (R5): EndoGSLAM is not a pretraining method. It can be applied to any new data; Dealing with under/over-exposure (R5): This is a separate task, but we plan to integrate such methods into the system.
  5. Discussion & future work (R1, R4, R5): While integrating ORB-SLAM for tracking and 3DGS for rendering is possible, it complicates the system. ORB-SLAM relies on sparse geometric features, and it keeps tracking and mapping separate. Instead, our approach allows previous tracking results to be corrected by rendering at a historical timestamp and using joint optimization, thereby reducing global tracking error and making the whole system more robust than direct image alignment [NeRF-SLAM, IROS2023]. Our method runs end-to-end as a standalone clinical visualization tool. Given its similarity to point clouds, 3DGS shows promise for deformable tissue tracking and reconstruction. (R4) Future plans also include using a live phantom and the material point method for further simulations.
  6. Other questions (R1, R3, R4, R5): We will carefully address the reviewers’ comments, thoroughly revise the manuscript for clarity and organization, and ensure all relevant papers are properly cited.
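
The keyframe criteria described in item 4 can be sketched roughly as follows. This is a minimal illustration and not the authors’ implementation: the rebuttal only states that every kth frame is added to a list and that a probability reflects the spatial-temporal distance from each keyframe to the current camera position. The exponential weighting, the scale parameters `sigma_s` and `sigma_t`, and the function names are assumptions made for this example.

```python
import math


def select_every_kth(num_frames, k):
    """Indices of candidate keyframes: every k-th frame, starting at 0."""
    return list(range(0, num_frames, k))


def keyframe_probabilities(keyframes, current_pos, current_t,
                           sigma_s=1.0, sigma_t=10.0):
    """Sampling probability for each keyframe.

    `keyframes` is a list of (frame_index, (x, y, z)) tuples. Keyframes
    that are closer to the current camera in space and time receive a
    higher probability. The exponential form and both sigma values are
    hypothetical choices for this sketch.
    """
    weights = []
    for frame_index, position in keyframes:
        spatial = math.dist(position, current_pos)
        temporal = abs(current_t - frame_index)
        weights.append(math.exp(-(spatial / sigma_s + temporal / sigma_t)))
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical camera moving along the x-axis, one unit every 10 frames.
candidate_ids = select_every_kth(num_frames=40, k=10)
keyframe_list = [(i, (i / 10.0, 0.0, 0.0)) for i in candidate_ids]
probs = keyframe_probabilities(keyframe_list, (3.5, 0.0, 0.0), current_t=35)
print(candidate_ids, probs)
```

Under these assumptions, keyframes near the current view are sampled more often during mapping, while older or more distant keyframes are still revisited occasionally.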




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The rebuttal has provided sufficient detail to answer the reviewers’ questions. Although one of the reviewers still leans towards a weak reject for this submission, I do believe this is a nice contribution to MICCAI, and therefore I recommend acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This submission had mixed reviews: R1 - weak accept, R3 - accept, R4 - weak reject, and R5 - accept. While the majority is therefore in favour, there is clear agreement between reviewers R1 and R4 on the limited methodological novelty. I completely agree with these concerns, since this work is essentially a port of the current hot method in SLAM/SfM (Gaussian Splatting) to the reconstruction of endoscopic scenes, with small adaptations that are rather obvious, as highlighted by R4. While the results are undoubtedly impressive, I am hesitant to recommend acceptance since, as R4 states, applying the latest tech from the general vision / graphics literature to MICCAI tasks / datasets results in a cycle of many irrelevant MICCAI papers in the grand scheme of things. Nevertheless, given the strong majority favouring acceptance, I will go with the reviewers, and hope that the work may serve as a step towards the clinical testing and translation of SLAM systems in endoscopy.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



