Abstract

Accurate reconstruction of deformable soft tissues from endoscopic stereo videos is essential to improve surgical navigation and automation in robot-assisted image-guided procedures. While recent Gaussian splatting techniques achieve real-time rendering with impressive results on endoscopic datasets, conventional 3D Gaussian splatting methods suffer from volumetric biases, leading to inaccuracies in 3D geometry and depth estimation. To overcome these limitations, we propose EndoPlanar, a novel deformable planar-based Gaussian splatting approach. By flattening volumetric Gaussians to a 2D plane, our method enables unbiased depth computation and normal map estimation, which are difficult to achieve with traditional ellipsoidal Gaussians. Furthermore, we introduce a regularization strategy for smooth planar-derived normal maps to refine surface quality. Additionally, we enhance model initialization using Gaussian mixture-based background segmentation, improving the representation of unseen objects and accelerating convergence. We evaluate EndoPlanar on two standard benchmarks, EndoNeRF and StereoMIS, demonstrating promising performance by outperforming all baselines in reconstruction quality with PSNR of 34.51 dB while maintaining real-time inference speeds of 307.5 FPS.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3722_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/ThatphumCpre/EndoPlanar

Link to the Dataset(s)

StereoMIS dataset: https://zenodo.org/records/7727692

EndoNeRF dataset: https://github.com/med-air/EndoNeRF

BibTex

@InProceedings{PaoTha_EndoPlanar_MICCAI2025,
        author = { Paonim, Thatphum and Sasnarukkit, Chayapon and Nupairoj, Natawut and Vateekul, Peerapon},
        title = { { EndoPlanar: Deformable Planar-based Gaussian Splatting for Surgical Scene Reconstruction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15970},
        month = {September},
        pages = {131 -- 140}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper aims to introduce a planar Gaussian splatting approach for 3D reconstruction of deformable soft tissues in endoscopic stereo videos.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Well-written paper, easy to follow. 2) Use of canonical Gaussian space is well described. 3) Section with optimisation and deformable planar-based Gaussian splatting is clearly described.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are some major weaknesses in the paper that I have outlined below: 1) The Gaussian space introduction is not new. It has been published in other works. 2) There is only a marginal improvement compared to Deform3DGS [16]. However, there is an increase in training and FPS for inference.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see weaknesses

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Based on the rebuttal from the authors the paper can be accepted.



Review #2

  • Please describe the contribution of the paper

    This work presents a set of methods to improve rendering quality and surface accuracy in dynamic scene reconstruction from stereo endoscopic sequences. The concept of flattening Gaussians to a planar surface is borrowed from recent computer graphics state-of-the-art (e.g., 2DGS). Canonical Gaussians at t=0 are initialized via a newly proposed combination of existing methods MAPF and MOG2. Deformations of Planar Gaussians are modeled with the existing FDM method. Finally, a normal surface regularization is applied via neighboring pixels sampling (constructing a normal plane) from the depth map (obtained via stereo matching). Benchmarks against SOTA and ablation studies are performed on the EndoNeRF and StereoMIS datasets.
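To make the "neighboring pixels sampling" step in this summary concrete, here is a minimal sketch of how a surface normal can be derived from a depth map. It is not the authors' implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the four-neighbor stencil, and the sign convention (normal pointing along +z for a fronto-parallel surface) are all assumptions made for illustration.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) to a 3D camera-space point with a pinhole model (assumed)."""
    z = depth[v][u]
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normal_from_depth(depth, u, v, fx, fy, cx, cy):
    """Unit normal at interior pixel (u, v) from its four back-projected neighbors."""
    left = backproject(u - 1, v, depth, fx, fy, cx, cy)
    right = backproject(u + 1, v, depth, fx, fy, cx, cy)
    up = backproject(u, v - 1, depth, fx, fy, cx, cy)
    down = backproject(u, v + 1, depth, fx, fy, cx, cy)
    # Two tangent vectors spanning the local plane through the neighbors.
    tangent_x = tuple(r - l for r, l in zip(right, left))
    tangent_y = tuple(d - t for d, t in zip(down, up))
    n = cross(tangent_x, tangent_y)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

# A flat, fronto-parallel depth patch yields a normal along the optical axis.
flat = [[2.0, 2.0, 2.0] for _ in range(3)]
print(normal_from_depth(flat, 1, 1, fx=100.0, fy=100.0, cx=1.0, cy=1.0))
```

A normal computed this way from the rendered depth can then be compared with a normal obtained from another source, which is the general idea behind a normal-consistency regularizer.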

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The manuscript is well written and it is of clear comprehension. The introduction reads smoothly and brings straight to the technical buildup on the previous and related work.

    Tables and figures are well formatted and the reader is provided with all the necessary information to evaluate the methods.

    Quantitative and qualitative results on the evaluation datasets show substantial improvement in surface modeling (e.g., visible when comparing the RMSE of the depth maps in Fig. 2).

    Normal surface reconstruction results (Fig. 3) are similar to those of heavier and slower NeRF-based methods and show a visible improvement w.r.t. the compared Gaussian-based methods, while keeping training and rendering times within the real-time range.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Even though the ablation study is well explained and discussed, visual comparisons are provided only for the L_normal component. I would have appreciated seeing an intuitive visualization also of the difference in the canonical model initialization (e.g., point clouds for MAPF and MOGF). How can the reader otherwise weigh the significance of the contribution of MOGF?

    Furthermore, the reasoning behind the use of MOG2 background subtraction with foreground masking is not well explained. Having a foreground and a background is a hard assumption in endoscopy, since many surgical procedures involve massive interaction with the peritoneal wall which surrounds the whole cavity, so what is the operational meaning and clinical value of having an organ mask? If 4D reconstruction of a specific anatomical target is the goal of the proposed methods, then this should be made more explicit and introduced together with the appropriate clinical background and discussion. Additionally, from the manuscript it is not fully clear how the statistical background model is applied to the first frame and especially how it is supposed to detect subtle motions. Can the authors elaborate further? Also, which motions are we talking about there?

    It is not clear what effect an unstable “ground-truth” depth input (e.g., from RAFT stereo matching), often characterized by spatial and temporal artifacts that appear as large depth discontinuities (valleys, holes, oversmoothing), has on the L_normal regularization term. Specifically referring to the sentences, “… we employ an edge-aware weight based on the ground-truth depth gradient.” and “This weighting reduces the penalty in regions with large depth gradients, thus preventing oversmoothing of sharp surface boundaries.” it seems quite a fragile approach to base the surface regularization on a well-known unstable pseudo-ground-truth such as stereo matching. Can the authors elaborate on this point?
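The concern can be illustrated with a small sketch of an edge-aware weight driven by the depth gradient. The exponential form `exp(-sharpness * |grad D|)`, the central-difference gradient, and the `sharpness` parameter are assumptions chosen for illustration; the manuscript's exact weighting is not reproduced here.

```python
import math

def depth_gradient_magnitude(depth, u, v):
    """Central-difference gradient magnitude of the depth map at interior pixel (u, v)."""
    dx = (depth[v][u + 1] - depth[v][u - 1]) / 2.0
    dy = (depth[v + 1][u] - depth[v - 1][u]) / 2.0
    return math.hypot(dx, dy)

def edge_aware_weight(depth, u, v, sharpness=1.0):
    """Assumed weight: 1 on flat regions, decaying toward 0 across depth edges."""
    return math.exp(-sharpness * depth_gradient_magnitude(depth, u, v))

flat = [[5.0, 5.0, 5.0] for _ in range(3)]
step = [[5.0, 5.0, 50.0] for _ in range(3)]  # true boundary or stereo artifact

print(edge_aware_weight(flat, 1, 1))  # 1.0: full smoothing penalty applies
print(edge_aware_weight(step, 1, 1))  # close to 0: penalty is suppressed
```

The weight cannot tell a genuine tissue boundary from a hole or spike in the stereo-matched depth: both produce a large gradient and both switch the regularizer off locally, which is the fragility raised in this comment.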

    Overall, the manuscript is well presented and executed, but it tastes more like a good integration of existing ingredients and marginal improvements to the SOTA than a novel and revolutionary formulation. Could the authors argue the novelty of their contribution in a more critical way? Why are the contributions relevant, unique, and tailored to the specific domain of endoscopic surgery?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Regarding the experimental setup, why is the specification of the dataset sampling provided only for StereoMIS and not for EndoNeRF? Furthermore, “…we sampled 180-300 frame segments…” Why was the sampling number variable in this case? This lack of clarity also brings confusion when trying to interpret the actual training speed, which is not reported per-frame but per-scene, and this is not very rigorous if the “scene” size is not well-defined. Could the authors elaborate on this?

    Regarding Section 2.3, and specifically “…where Ri is an orthonormal basis and Si = diag(s1, s2, s3).” it is not immediately clear to the reader what s1, s2, s3 are. This should not be taken for granted, but rather explicitly mentioned. What is the meaning of the second subscript in S_3,0? Is that the time t? Should that not be S_3(0) then?
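For context, the quoted sentence matches the standard 3D Gaussian splatting parameterization, in which each covariance is factored into a rotation and per-axis scales. The block below states that standard form. Reading s1, s2, s3 as the per-axis scale factors of Gaussian i is an assumption about the manuscript's notation, not a quotation from it.

```latex
\Sigma_i = R_i \, S_i \, S_i^{\top} R_i^{\top},
\qquad
S_i = \operatorname{diag}(s_1, s_2, s_3),
\qquad
R_i^{\top} R_i = I .
```

Under this reading, driving the smallest of the three scales toward zero collapses the ellipsoid into a planar disc whose normal is the column of R_i associated with that scale.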

    Regarding the hand-tuning of the lambda D parameter, “…we tuned λD on selected EndoNeRF video, increased λD until the depth metrics ceased to improve, then applied λD uniformly on all datasets.”, which EndoNeRF video was being selected? Why would that be representative and generalize to the rest of the dataset or to other datasets?

    Regarding the sentence “…where θ and σ are learnable parameters that control the center and width of the function.” the concepts of center and width of the function should be better contextualized.
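One way to give "center" and "width" a concrete meaning is a Gaussian temporal basis function, sketched below. The functional form `exp(-(t - theta)^2 / (2 sigma^2))` and the weighted-sum deformation are assumptions for illustration and may differ from the manuscript's exact definition.

```python
import math

def gaussian_basis(t, theta, sigma):
    """Bell curve over time: peaks at t = theta (center), spread set by sigma (width)."""
    return math.exp(-((t - theta) ** 2) / (2.0 * sigma ** 2))

def deformation(t, weights, thetas, sigmas):
    """Scalar offset at time t as a weighted sum of basis functions (assumed form)."""
    return sum(w * gaussian_basis(t, th, s)
               for w, th, s in zip(weights, thetas, sigmas))

# theta picks when the basis is active; sigma picks for how long.
print(gaussian_basis(0.5, theta=0.5, sigma=0.1))  # 1.0 at the center
print(gaussian_basis(0.6, theta=0.5, sigma=0.1))  # lower, one sigma from the center
print(deformation(0.5, weights=[0.2, -0.1], thetas=[0.5, 0.9], sigmas=[0.1, 0.1]))
```

In this picture, a small sigma models a brief, localized motion around time theta, while a large sigma models a slow drift over the whole sequence.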

    Regarding the sentence “… by unbiased depth rendering to obtain a color image, a depth map, and a normal map…” the word “unbiased” should be better contextualized or its meaning made explicit. Furthermore, why is unbiased rendering defined as “unbiased depth rendering” should that not just be rendering?

    Regarding Figure 2, the visual improvement of Ours over EndoSurf is greater for the StereoMIS images than the EndoNeRF ones, while showing a smaller RMSE delta (-0.1 for StereoMIS vs. -0.2 for EndoNeRF). Can the authors elaborate on the discrepancy between the quantitative and qualitative evaluation?

    Regarding Figure 3, what is the difference between the first and second rows of the plot? Authors should clarify it in the figure or caption.

    Regarding the sentence “…more accurate initial sparse point clouds…” Is accurate the correct wording to express the increased density of the initial Gaussians?

    Regarding the ablations, how can the reader evaluate if the L_normal term actually “enhances surface fidelity” given that there is no ground-truth normal surface for either dataset?

    Regarding the conclusion sentence, “…Gaussians Fusion (MOGF) accelerates convergence …” where in the manuscript is a convergence acceleration reported? How can this be quantified?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, I think the work is solid and brings tangible proofs of the advantage both quantitatively and qualitatively of introducing the proposed methods. I have, anyhow, recommended acceptance conditioned on rebuttal since I would appreciate receiving clarifications on the several aspects that I have highlighted above (clinical motivation, methods novelty, wording, and results relevance).

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Authors have provided clarification to many relevant points of my review. Metrics, benchmarks, and impact of the work have been explained and supported with sound technical justifications. I am satisfied with the rebuttal and therefore suggest acceptance of the article.



Review #3

  • Please describe the contribution of the paper

    The authors present a novel planar-based 3DGS method for dynamic endoscopic reconstruction that achieves superior geometric accuracy and rendering quality compared to baselines.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors’ method achieves high scores across metrics. With training times under 100 seconds, it simultaneously guarantees high rendering speeds and delivers the best metric performance.
    2. The proposed Mixture of Gaussians Fusion (MOGF) provides better initialization, facilitating easier optimization.
    3. Compared to 2DGS, the authors’ proposed planar method doesn’t directly use the 2D plane for scene representation; instead, the authors encourage the model to make the smallest of the three scale factors as close to zero as possible, making it simpler, and its rendering implementation would be easier.
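The flattening idea in point 3 can be sketched as a penalty on the smallest scale of each Gaussian. This is a generic form of such a penalty under stated assumptions; the paper's own scaling loss (described in the rebuttal as a modified version, Eq. 4) is not reproduced here, and the function name is hypothetical.

```python
def min_scale_loss(scales):
    """Mean over Gaussians of the smallest absolute scale; zero when all are flat."""
    return sum(min(abs(s) for s in triple) for triple in scales) / len(scales)

# Two Gaussians: the first is still volumetric, the second is already planar.
scales = [(0.5, 0.3, 0.2), (1.0, 0.0, 0.4)]
print(min_scale_loss(scales))  # (0.2 + 0.0) / 2
```

Minimizing this term leaves the two larger scales free to describe the extent of the splat while pushing the third toward zero, so the standard ellipsoidal rasterizer can be reused without a dedicated 2D primitive.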
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Planar representations like 2DGS can face a “degenerate” issue when a 2D Gaussian splat is viewed from a slanted angle (edge-on). How does this method solve the problem?

    2. The author proposes using 2D planar Gaussian primitives to improve geometric accuracy in dynamic scene reconstruction. While this approach shows promise, clarification is needed on how the method handles sharp geometric features such as edges, corners, or creases.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See the strengths and weaknesses. The article provides comprehensive content with all necessary components, demonstrates a degree of novelty, and achieves impressive results on the benchmark tests. Overall, the article is of good quality.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their valuable feedback. We address all concerns and unclear details as follows.

  1. Regarding novelty (R1, R2): Our novelty lies in being the first to effectively address and mitigate Gaussian Splatting’s inherent geometric bias specifically for complex 4D endoscopic scenes. Existing methods are not directly applicable to this environment, and none have previously tackled this critical bias issue in this context (e.g., PGSR is designed for general-domain static scenes). Our integration also required overcoming novel challenges, identified in preliminary experiments, such as our modified scaling loss (Eq. 4) initially causing unstable optimization, which we subsequently resolved using small value initialization (Section 2.3, paragraph below Eq. 4).

  2. Regarding marginal improvements (R1, R2): PSNR and SSIM improvements seem marginal due to metric saturation in SOTA methods (Deform3DGS, Endo4DGS). However, our main contribution is the significantly improved geometric accuracy compared to Deform3DGS (SOTA)—depth RMSE drops from 2.41 to 1.64 (EndoNeRF) and 5.20 to 3.37 (StereoMIS). Fig. 3 shows smoother, less noisy meshes comparable to EndoSurf and superior to other 3DGS methods. While matching the speed of Deform3DGS (the fastest method), our approach delivers EndoSurf-level quality with a significantly lower training cost (1.5 minutes vs. 12 hours).

  3. Regarding MOGF Detail and Evaluation (R2): Our MOGF initialization accelerates Gaussian densification for 4D reconstruction by handling dynamic changes, not for semantic segmentation (e.g., organ masking). The “background” is the first frame’s scene, used to initialize Gaussians; MOGF then identifies “foreground” as new visual information (e.g., newly visible areas) in subsequent frames, avoiding slow densification “from emptiness” for these changing regions. MOGF uses MOG2 for more accurate identification of temporal pixel variations. While direct MOGF visualizations were not primary, its significance and convergence acceleration are shown in our ablation study (Table 2), where MOGF achieves superior metrics at fixed training iterations compared to MAPF.

  4. Regarding edge-aware weighting (R2): There are two possible options for edge-aware weighting: color gradients and depth gradients in this dataset. In endoscopy, color gradients are often unreliable due to texture noise, reflections, and tissue variability. Although not perfect, depth gradients provide a more robust and consistent cue for identifying geometric boundaries.

  5. Regarding experimental setup and training times (R2): EndoNeRF sequences are short, so we used them in full. For StereoMIS, we selected 180–300 frame content-driven segments that capture complete actions. Training times remain comparable, as each scene is trained with a fixed number of iterations, consistent with the baseline’s work.

  6. Regarding λD hand-tuning (R2): λD was tuned on a representative short EndoNeRF video (pulling) through detailed hyperparameter optimization. The selected parameters generalize well to StereoMIS, as both datasets belong to the same endoscopic domain.

  7. Regarding surface fidelity and evaluation discrepancy (R2): In the absence of ground-truth normals, we assess L_normal qualitatively (e.g., Fig. 3), where smoother meshes and fewer artifacts support its effectiveness. The RMSE in Table 2 may not fully capture this, as it excludes tool regions—areas not present in the ground truth but critical for surface quality.

  8. Regarding degenerate geometry and sharp features (R3): We use a depth objective (Eq. 9) and a surface-normal loss (Eq. 8). The depth objective guides reconstruction and triggers densification for sharp features. The surface-normal loss ensures depth continuity, preventing slant-angle issues, while its edge-aware weighting preserves sharpness and avoids oversmoothing.

  9. We will also address minor writing issues. All suggestions will be considered for the final manuscript. (R2)
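As a quick check on the figures quoted in item 2 of the feedback, the relative depth-RMSE reductions and the training-time ratio can be worked out directly. Only the numbers stated in the rebuttal are used; the helper name is illustrative.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before

endonerf = relative_reduction(2.41, 1.64)    # depth RMSE on EndoNeRF
stereomis = relative_reduction(5.20, 3.37)   # depth RMSE on StereoMIS
training_ratio = (12 * 60) / 1.5             # 12 hours vs. 1.5 minutes, in minutes

print(f"EndoNeRF depth RMSE reduction:  {endonerf:.1%}")
print(f"StereoMIS depth RMSE reduction: {stereomis:.1%}")
print(f"Training time ratio: {training_ratio:.0f}x")
```

The depth error falls by roughly a third on both datasets, which is a larger relative change than the near-saturated PSNR and SSIM figures suggest.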




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


