Abstract

Deformable tissue reconstruction in endoscopy is vital for surgery, yet current methods struggle with high-fidelity reconstruction of irreversible tissue deformations. To this end, we present D4Recon, a novel framework for real-time and high-fidelity endoscopic reconstruction, addressing crucial challenges in surgical applications. A Dual-stage Deformation modeling and a Dual-scale Depth guidance (D4) are proposed in a dynamic 3D Gaussian Splatting paradigm along with a lightweight multi-layer perceptron (MLP) to model dynamics in endoscopic scenes. In the dual-stage deformation modeling, we introduce a spatial deformation model to correct multiview inconsistencies, accompanied by a temporal deformation model that accurately represents tissue distortion and dynamic tissue interaction with surgical tools in the reference frames. In the dual-scale depth guidance, we propose to balance local error correction with absolute depth consistency, enabling precise depth refinement while preserving fine-grained color accuracy. D4Recon generates accurate 3D reconstructions with superior PSNR, SSIM, and LPIPS scores, outperforming existing methods in terms of geometric coherence and photorealism with real-time rendering speed, as demonstrated by extensive experiments on diverse endoscopic datasets. Reconstruction videos are in the supplementary file.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1587_paper.pdf

SharedIt Link: https://rdcu.be/eHw06

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05114-1_16

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/1587_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{BasHri_D4Recon_MICCAI2025,
        author = { Basak, Hritam AND Yin, Zhaozheng},
        title = { { D4Recon: Dual-stage Deformation and Dual-scale Depth Guidance for Endoscopic Reconstruction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {159 -- 169}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes an endoscopic reconstruction method based on dynamic 3D Gaussian Splatting. To improve performance and balance local error correction, the paper designs a dual-scale depth guidance as well as a spatial-temporal SDS loss. Experiments demonstrate the improvements obtained with these designs.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

++ The paper tackles a challenging problem on endoscopic reconstruction.

++ Paper is well written. The experiments part is extensive.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

– In the Depth Guidance part, the paper proposes to directly assign a high opacity value to all Gaussians to render a “hard depth”. The motivation behind this operation is unclear. Furthermore, manually assigning parameters might also introduce training instability on the model. For example, different choices of parameter values might directly influence the performance.

– The SDS loss is known to suffer from training instability and lengthy optimization time, even on in-domain image or 3D data. This issue may be further exacerbated on endoscopic data, as pre-trained diffusion models are unlikely to have encountered such data during training. Therefore, directly distilling from pre-trained diffusion models using SDS loss may not be appropriate for the endoscopic domain.

– The proposed designs increase the training time and complexity for endoscopic reconstruction. It is suggested to compare the training time of the proposed method against baselines to completely evaluate the proposed designs on both the training efficiency and performance.

– The improvement of the proposed method is not significant compared to several previous approaches, such as the PSNR, SSIM and LPIPS metrics comparison with EH-SurGS on the EndoNeRF-Cutting and StereoMIS datasets in Table 1.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although the paper tackles a challenging research problem on endoscopic reconstruction, the reviewer has major concerns about the proposed designs of depth guidance and the use of SDS loss to distill from pretrained diffusion models. These designs increase the training time and do not show significant improvements. Therefore, the reviewer is leaning towards weak rejection and encourages the authors to respond to the weaknesses.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

A method for improved Gaussian-Splatting based reconstruction of endoscopic RGB-D videos including instrument masks is presented. The method includes a deformation model, a special depth loss, as well as a diffusion model for deformation modeling. The results are impressive and show superior results in comparison to state-of-the-art methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- well designed method
- impressive results and thorough evaluation
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- parts of the description unclear
- deformation modeling using a diffusion model difficult to understand
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The presented approach seems to be well designed and technically sound, but the exposition should be improved. Overall, the paper is very dense (a problem of the page limit), and in particular Section 2.4 is very hard to understand without knowing prior work well.
- Section 2: Given an input…: where does D_t come from? Is it monoscopic, is it metric, how approximate? Where does M_t come from? In the videos it looks like the camera is static, but probably this is not the case? If not, are camera poses required?
- 2.2: “To address occlusions”: I guess that occlusion due to instruments is meant here, right?
- Equ (2) / refined point cloud: this is unclear to me. The text sounds like the camera is static, otherwise M_0(p) and M_t(p) show different points, so the algorithm doesn’t make sense. Or is p a 3D point? But how is (geometric) occlusion handled then? Much of the text makes more sense to me assuming a static camera, and the video mostly shows this case, but then no colon model as shown in the video would be obtained. So, to avoid confusion, please define precisely what your input is.
- Equ. (4): also confusing. What is the goal of this step? Why “primarily from the Gaussians nearest to the camera center”? In the formula, should ||\rho - \mu_i|| be replaced by G(||\rho - \mu_i||)?
- Equ. 5: again, I think that ||…|| should be replaced by G(||…||)
- 2.4: while I think I understood the ideas of 2.3 (with some doubts), I found 2.4 very hard to read if someone is not deep in the topic and knows prior work in this area well. In particular, it was not clear to me why Equ. (7) solves multiview consistency (obviously, no static camera…). What is the “canonical trajectory distribution”?
The evaluation seems to be thorough and consider rather diverse data sets and algorithms. The results in the video are very nice.

Overall, I think this paper presents a well-designed algorithm for an important problem, but the description is quite lacking and in parts confusing. Maybe the authors can clarify questions in the rebuttal.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The rebuttal has not changed my judgement. I still think the paper shows an interesting system, and despite some deficiencies in exposition and evaluation I think it can be accepted.

Review #3

Please describe the contribution of the paper

This paper introduces another Gaussian splatting method for endoscopic reconstruction, differing from previous approaches by use of MLP to capture dynamic tissue deformation over time. The authors introduce hard and soft depth guidance to improve depth robustness.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- Excellent demonstration of reconstructions of dynamic scenes in supplemental material
- Interesting and novel use of hard and soft depth guidance particularly tailored for Gaussian splatting
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

- Unclear where the proposed method is represented in Table 3.
- Seems like the pretrained 2D diffusion model significantly contributes to performance but model specifics are not given (especially what it is pre-trained on).
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Interesting combination of Gaussian splatting + MLP dynamic scene representation, sufficient novelty in hard/soft depth constraints addressing the major problem of depth estimation in endoscopy, excellent results of dynamic reconstruction in supplemental video.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers (R) for their comments (C). We’re encouraged that they find our work novel, sound & interesting (R1,R2), clear & informative (R1,R2,R3), with impressive results (R1,R2,R3). Here we address their questions:

R1C1:D_t, M_t, camera movement & pose: Dynamic datasets StereoMIS & EndoNeRF provide surgical tool masks M_t; EndoNeRF includes depth D_t, while for StereoMIS, depth is estimated using a stereo-transformer. For static datasets (Simulation, In-vivo, Phantom), we follow GPancake [3] & use RNNSLAM for depth & pose estimation; masks are unnecessary as no surgical tools are present

R1C2:Occlusion in Sec2.2: Yes, occlusions due to surgical instruments

R1C3:Refined Point Cloud (Eq.2), static/dynamic camera position: For surgical datasets, the camera remains fixed across frames, allowing us to generate the refined point cloud using Eq.2. In contrast, for colonoscopy datasets (referred to as “static” scenes in our paper), the camera moves through the anatomy, enabling 3D reconstruction. In this setting, we substitute Sec2.2 with standard 3DGS (stated in Implementation Details), & Eq.2 is not used
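The fixed-camera argument in R1C3 can be made concrete with a small sketch. This is one plausible reading of a mask-based point-cloud refinement, not the paper's Eq.2: the function names `fuse_depth_fixed_camera` and `backproject`, the pinhole intrinsics `fx, fy, cx, cy`, and the "first frame in which the pixel is tool-free wins" rule are all illustrative assumptions, and tissue motion between frames is ignored.

```python
import numpy as np

def fuse_depth_fixed_camera(depths, tool_masks):
    """Fill tool-occluded pixels from the first frame in which they are visible.

    depths:     (T, H, W) depth maps D_t.
    tool_masks: (T, H, W) booleans, True where a surgical tool covers the pixel.
    Assumes a fixed camera, so pixel p lies on the same ray in every frame.
    Returns the fused depth (H, W) and a boolean validity mask (H, W).
    """
    depths = np.asarray(depths, dtype=float)
    visible = ~np.asarray(tool_masks, dtype=bool)
    fused = np.zeros(depths.shape[1:])
    filled = np.zeros(depths.shape[1:], dtype=bool)
    for d_t, v_t in zip(depths, visible):
        take = v_t & ~filled          # tissue visible here and not yet filled
        fused[take] = d_t[take]
        filled |= take
    return fused, filled

def backproject(depth, valid, fx, fy, cx, cy):
    """Lift valid pixels to 3D camera coordinates with a pinhole model."""
    v, u = np.nonzero(valid)          # row (v) and column (u) indices
    z = depth[v, u]
    return np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
```

With a moving camera the same pixel index no longer corresponds to the same ray, so a per-pixel fusion of this kind would not apply, which is consistent with the rebuttal's statement that Eq.2 is not used for the colonoscopy datasets.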

R1C4:Eq.4&5: Eq.4 enforces geometric consistency by anchoring the positions of the Gaussian centers to the absolute depth prior, focusing on the Gaussians nearest to the camera because, under high opacity, these dominate the rendered depth & best reflect the actual surface location. ||ρ − μ_i|| is the true 3D distance from the camera to each Gaussian center; replacing it with G(||ρ − μ_i||) would be incorrect, as G() yields a density, not a depth value
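The claim that high opacity makes the nearest Gaussians dominate the rendered depth can be checked with front-to-back alpha compositing along a single ray. The following is a minimal sketch, not the authors' implementation: the function name `render_depth`, the example distances and opacities, and the override value 0.99 are illustrative assumptions.

```python
import numpy as np

def render_depth(distances, opacities):
    """Alpha-composite per-Gaussian camera distances along one ray.

    distances: ||rho - mu_i|| for the Gaussians hit by the ray (any order).
    opacities: per-Gaussian alpha in [0, 1].
    Returns the expected depth, normalised by the accumulated weight.
    """
    order = np.argsort(distances)
    d = np.asarray(distances, dtype=float)[order]
    a = np.asarray(opacities, dtype=float)[order]
    # Transmittance reaching each Gaussian: product of (1 - alpha) in front of it.
    T = np.concatenate(([1.0], np.cumprod(1.0 - a)[:-1]))
    w = T * a
    return float(np.sum(w * d) / np.sum(w))

distances = [2.0, 2.5, 4.0]
soft = render_depth(distances, [0.3, 0.4, 0.5])     # hypothetical learned opacities
hard = render_depth(distances, [0.99, 0.99, 0.99])  # hypothetical high-opacity override
```

In this toy ray the "hard" depth lands almost exactly on the nearest Gaussian, while the "soft" depth is pulled towards the Gaussians behind it, which matches the rationale given in the rebuttal.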

R1C5:Sec.2.4: Multiview consistency is crucial even in static camera setups, as 4D reconstruction demands spatial coherence across all frames & views to ensure physically plausible geometry. Eq.7 addresses this by optimizing the spatial deformation field using SDS loss. This ensures that the reconstruction is consistent & stable, benefiting both static & dynamic camera scenarios. Due to space limits, please refer to [19] for details of the preliminary SDS formulation
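For readers unfamiliar with SDS, the generic Score Distillation Sampling gradient, as introduced in DreamFusion, can be written as below. This is the standard formulation and not necessarily the paper's Eq.7; the symbols are assumptions with respect to D4Recon's notation: θ denotes the parameters being optimised, x = g(θ) a rendered image, \hat{\epsilon}_{\phi} the pretrained denoiser, y its conditioning, and w(t) a timestep weighting.

```latex
\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x_t = \alpha_t\, x + \sigma_t\, \epsilon,
\quad
\epsilon \sim \mathcal{N}(0, I).
```

The rendered image is noised to x_t, the frozen diffusion model predicts the noise, and the residual between predicted and injected noise is pushed back through the renderer only, without differentiating through the denoiser.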

R2C1:Our method is the last row of Tab.3. Described in Sec.3.2.

R2C2:ArSDM [5] is utilized as the pretrained diffusion network (mentioned in implementation details)

R3C1:Rationale behind hard depth: Assigning high opacity to all Gaussians for hard depth rendering is crucial for enforcing geometric consistency, as it ensures depth supervision acts directly on the nearest Gaussians along each ray, anchoring the reconstructed geometry to the absolute depth prior & mitigating artifacts from depth inaccuracies. This approach is technically justified (Tab.3 & Sec.3.2). The hyperparameters are not set manually but chosen based on validation (this could not be included in the draft due to space limits)

R3C2:Applicability/instability of SDS loss in endoscopy: While SDS loss can be unstable with generic models, we mitigate this using ArSDM [5], an endoscopy-specific diffusion model, for better results. Tab.3 shows that spatial & temporal SDS yield stable optimization & clear reconstruction gains. These adaptations make SDS effective & suitable for endoscopic scenes

R3C3:Training time & complexity: While our method introduces additional components for improved geometric & temporal modeling, we have thoroughly benchmarked training time & efficiency against SoTA. D4Recon trains in 122 seconds per scene & renders in real time at 336 FPS, which is on par with or faster than recent methods such as Deform3DGS (1 min/scene, 338 FPS) & Endo-4DGS (4 min/scene, 100 FPS), while delivering consistently superior reconstruction (Tab.1). Thus it offers a favorable trade-off between efficiency & performance, supporting its practicality
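To compare the figures quoted in R3C3 on a common unit, the short sketch below converts training time to seconds and rendering speed to milliseconds per frame. The numbers are taken from the rebuttal as written; the "1 min" and "4 min" values appear to be rounded there, so the conversion is approximate, and the dictionary layout is purely illustrative.

```python
# Figures as quoted in the rebuttal; training time converted to seconds.
methods = {
    "D4Recon":    {"train_s": 122,    "fps": 336},
    "Deform3DGS": {"train_s": 1 * 60, "fps": 338},
    "Endo-4DGS":  {"train_s": 4 * 60, "fps": 100},
}

for name, m in methods.items():
    ms_per_frame = 1000.0 / m["fps"]  # per-frame rendering latency
    print(f"{name:<11} {m['train_s']:>4d} s/scene  "
          f"{m['fps']:>3d} FPS  {ms_per_frame:.2f} ms/frame")
```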

R3C4:Marginal gain over baseline: D4Recon consistently ranks first or second across all benchmarks, reflecting robustness in both photometric & geometric fidelity. We yield sharper boundaries & greater temporal stability, crucial for intraoperative use (see supplementary videos). Tab.3 confirms the contribution of each component & our real-time speed shows practicality

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A


D4Recon: Dual-stage Deformation and Dual-scale Depth Guidance for Endoscopic Reconstruction

Author(s):