Abstract
Endoluminal endoscopic procedures are essential for diagnosing colorectal cancer and other severe conditions in the digestive tract, urogenital system, and airways. 3D reconstruction and novel-view synthesis from endoscopic images are promising tools for enhancing diagnosis. Moreover, integrating physiological deformations and interaction with the endoscope enables the development of simulation tools from real video data. However, constrained camera trajectories and view-dependent lighting create artifacts, leading to inaccurate or overfitted reconstructions. We present PR-ENDO, a novel 3D reconstruction framework leveraging the unique property of endoscopic imaging, where a single light source is closely aligned with the camera. Our method separates light effects from tissue properties. PR-ENDO enhances 3D Gaussian Splatting with a physically based relightable model. We boost the traditional light transport formulation with a specialized MLP capturing complex light-related effects while ensuring reduced artifacts and better generalization across novel views. PR-ENDO achieves superior reconstruction quality compared to baseline methods on both public and in-house datasets. Unlike existing approaches, PR-ENDO enables tissue modifications while preserving a physically accurate response to light, making it closer to real-world clinical use. Repository: https://github.com/SanoScience/PR-ENDO.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1025_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: https://papers.miccai.org/miccai-2025/supp/1025_supp.zip
Link to the Code Repository
https://github.com/SanoScience/PR-ENDO
Link to the Dataset(s)
https://zenodo.org/records/15732143
BibTex
@InProceedings{KalJoa_PRENDO_MICCAI2025,
author = { Kaleta, Joanna and Smolak-Dyżewska, Weronika and Malarz, Dawid and Dall’Alba, Diego and Korzeniowski, Przemysław and Spurek, Przemysław},
title = { { PR-ENDO: Physically Based Relightable Gaussian Splatting for Endoscopy } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15969},
month = {September},
pages = {390 -- 400}
}
Reviews
Review #1
- Please describe the contribution of the paper
The idea of the paper is to extend Gaussian Splatting reconstruction of endoscopic (colon) videos by a simple BRDF model for Gaussians, assuming a light source at camera position. Evaluation shows superior reconstruction results of RGB test frames.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Simple, straightforward idea
- Good reconstruction results
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Simple, straightforward idea; some parts that are more complicated (diffuse prediction, hash grid) remained unclear to me
- The description is partly imprecise
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The overall idea of the paper is straightforward: starting with a Gaussian reconstruction, add a simple reflection and light source model to better reconstruct RGB frames. However, several parts remained unclear to me:
- Lighting model: according to (4), the material has a diffuse and a specular color. Typically, the diffuse color is the “base color” and the specular color is white. So I guess the base color $a$ is the diffuse color, the specular color of Equ. 4 is white, and the roughness $r$ corresponds to $\alpha$ in GGX. Is this correct? I do not know the “Schlick-Beckmann” model; please provide references. Ideally, one could omit the very general formulas (4) and (5) and just write down the model actually used.
- I am struggling with Equ. 6. First of all, each Gaussian has a diffuse component (according to the later text), so why do we need an MLP to model this again? Somehow, the MLP has to learn the diffuse component again. Or, even worse, it can now also bake the specular part into the diffuse MLP, so I do not see how it can separate these components. I would understand if the MLP somehow models the diffuse color of the Gaussians, but then no $a_i$ would be necessary, and the MLP would get the position as input, and not the normal.
- Equ. 7: here, attenuation is modeled explicitly, so I see no reason why $d_i$ is also passed to the MLP
- The HashGrid (paragraph after Equ. 7): How does it encode the camera position? And how is this fed into the MLP? I have no idea what is done here, and why this is done.
- Fig. 5: the image is decomposed into diffuse and specular, albedo is a side product
- “Optimization”: material and light properties are optimized. Which light properties? And what are the material properties? Does this include the normal? Obviously yes, because it is claimed that the resulting normals are improved. But the Gaussian positions are not improved, right? Why then a depth loss?
- “Diffuse loss”: here, the MLP is trained to reproduce the normal diffuse lighting, why then use an MLP?
- Caption of Fig. 1: “…relighting task”: the table does not show results for relighting tasks; this is only evaluated qualitatively.
- Fig. 6: why not compare the normals with ground truth, this would be more meaningful.
- “Reconstruction”: make clear that you only consider RGB reconstruction, but not geometry.
All in all, although the algorithm seems to follow reasonable ideas and the results are nice, I see quite some open questions, which makes me hesitant to recommend acceptance.
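For readers unfamiliar with the co-located light setup this review is probing, the following Python sketch illustrates the general shape of such a model: a Lambertian diffuse term plus a Cook-Torrance-style specular term built from a GGX normal distribution, Schlick's Fresnel approximation, and a Schlick-Beckmann geometry term, all simplified by the view direction coinciding with the light direction. The attenuation form, parameter values, and function names here are illustrative assumptions, not the paper's actual Equations (4)-(7).

```python
import numpy as np

def attenuation(d, k=1.0):
    # Hypothetical inverse-square-style falloff; the paper's att(d) is not given here.
    return 1.0 / (1.0 + k * d * d)

def ggx_ndf(n_dot_h, alpha):
    # GGX/Trowbridge-Reitz normal distribution; alpha plays the role of roughness.
    a2 = alpha * alpha
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (np.pi * denom * denom)

def fresnel_schlick(cos_theta, f0):
    # Schlick's approximation to the Fresnel reflectance.
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def geometry_schlick_ggx(n_dot_v, alpha):
    # Schlick-Beckmann geometry term (one direction of the Smith shadowing model).
    k = alpha / 2.0
    return n_dot_v / (n_dot_v * (1.0 - k) + k)

def shade_colocated(albedo, roughness, f0, cos_theta, dist, intensity=1.0):
    """Radiance for a point light at the camera: view and light directions
    coincide, the halfway vector equals both, so one cosine suffices."""
    cos_theta = max(cos_theta, 0.0)
    if cos_theta == 0.0:
        return np.zeros(3)
    diffuse = albedo / np.pi * cos_theta
    d_term = ggx_ndf(cos_theta, roughness)
    f_term = fresnel_schlick(cos_theta, f0)
    g_term = geometry_schlick_ggx(cos_theta, roughness) ** 2  # view == light
    specular = d_term * f_term * g_term / (4.0 * cos_theta * cos_theta + 1e-8) * cos_theta
    return (diffuse + specular) * intensity * attenuation(dist)
```

Under this reading, the reviewer's guess holds: the "base color" feeds only the diffuse term, the specular color is effectively white (scalar `f0`), and roughness maps to the GGX `alpha`.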
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
Unfortunately, the rebuttal was only partially helpful. I know what a hash grid is, but I still do not know how and why it is supposed to encode the camera position (according to text and rebuttal). According to Fig. 1, it stores the diffuse color (which makes perfect sense), but this does not correspond to the text. And there are several such small things, which I could not reconstruct (as written down in the review). However, the authors promise to provide the code, so such things can be clarified by looking at the code. Overall, I still do not feel well to recommend acceptance of a paper, if I am not able to understand several components and their importance.
Review #2
- Please describe the contribution of the paper
The paper introduces PR-ENDO, a framework combining physically-based rendering with 3D Gaussian Splatting specifically designed for endoscopic videos. It separates lighting effects from tissue properties using a physically accurate relighting model enhanced by a diffuseMLP, capturing complex illumination interactions and enabling realistic relighting from novel views. The authors demonstrate improved reconstruction quality and reduced visual artifacts compared to existing approaches, effectively handling both viewpoint variations and anatomical modifications.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Proposes a physically-based relightable Gaussian Splatting model tailored for endoscopic scenes.
- Separates lighting effects from tissue properties, enabling adjustments to illumination and anatomical modifications.
- Shows applicability to clinical visualization and simulation scenarios.
- The qualitative comparisons are impressive.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Weak baseline: GaussianPancakes just proposes a normal loss, which is a very weak baseline and may not be suitable for this task. Given this, the quantitative performance gain does not align well with the qualitative results.
- EndoGSLAM is a SLAM system; this should be mentioned.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
The resolution of Fig. 1 is quite low.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Endoluminal endoscopic procedures are indeed important for diagnosis, and the exploration of the pattern where a single light source is closely aligned with the camera is highly encouraged. I am quite fond of this concept, but the limited performance gain leaves me confused. Why is the LPIPS lower?
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
The authors propose a modification of 3D Gaussian Splatting (3DGS) to enhance novel view synthesis in endoluminal endoscopic scenes. Their main contribution is refining the light-transport model in a novel way that better decouples view-dependent appearance from geometry, mitigating common 3DGS issues such as viewpoint overfitting and floating artifacts. Experiments on the C3VD dataset demonstrate superior performance compared to the state-of-the-art, particularly when rendering novel viewpoints with large camera rotations.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The method is clearly described, with a well-defined presentation of the novelties introduced compared to the baseline 3DGS. The evaluation is conducted on the standard C3VD dataset, demonstrating improvements over previous methods both quantitatively and qualitatively.
Incorporating a complex BRDF that encodes both diffuse and specular effects appears to positively impact view synthesis for endoscopy. Previous approaches often disregarded co-located illumination and masked specular highlights, instead encapsulating these effects within the color of each Gaussian. Additionally, the introduction of a tissue consistency loss, while somewhat restrictive, seems appropriate for the endoscopic setting, where albedo and texture are typically similar.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Clinical feasibility could be negatively impacted by (1) errors in depth map estimation and (2) tissue deformation. The latter limits the method to quasi-rigid sections of the colon. Regarding (1), the current approach relies heavily on initial depth maps to reconstruct the correct 3D shape. However, most conventional hardware lacks depth perception capabilities. Additionally, in endoluminal cavities, parallax is limited, meaning that the accuracy of 3DGS reconstruction is largely dependent on the quality of the underlying depth estimation. This could pose a challenge when applying the method in real-world scenarios. The authors claim that the method “optionally” uses depth maps (Fig.1 caption). How does the system perform without them? Where do these depth maps come from? EndoGSLAM? If necessary, specify clearly in Table 1 which methods utilize no depth maps, predicted depth maps, or ground-truth depth maps.
The in-house RotateColon dataset is used for comparison, but there is little information on how it was constructed, and no sample images are provided. This lack of detail makes it difficult to interpret the results in Tables 1 and 2 meaningfully.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Some expressions in the equations are not straightforward to infer from the provided references. I suggest including the explicit formulas for att(d_i) and f_diffuse,i. Additionally, if possible, consider providing the expressions for L_Depth and L_Norm to improve clarity.
Furthermore, the RotateColon dataset appears to be missing, which affects the completeness of reproducibility.
In Equation (7), is “F_i” referring to “F0_i”?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is interesting and relevant to the field. While it may still be far from clinical feasibility due to the challenges outlined as weaknesses, I am inclined to accept it. However, before providing my final rating, I would like the authors to clarify the use of ground-truth depth in the reported metrics.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The results are very convincing. Clearly stating in the paper that they are obtained without depth/normal ground truth would make them stand out even more.
Author Feedback
[R1, R2, R3] We thank the reviewers for recognizing our work’s value and their thoughtful feedback.
[R1] Optimized properties and components: Gaussians are initialized with a SLAM point cloud, then further optimized. We optimize colon geometry via the Gaussians’ positions (xyz), rotations, scales, and opacities, so the depth loss is crucial. Trainable material properties: albedo (a), roughness (r), and Fresnel coefficient (F0). Trainable light properties: intensity and attenuation coefficients. When the tissue is relit, we compute its appearance from the diffuse part (which depends on a, the normal, and light parameters) and the specular part (which depends on the normal, r, F0, view direction, and light). Together, they form the final render (see Fig. 5).
[R1] Schlick-Beckmann and other formulas: The paper with DOI:10.1111/1467-8659.1330233 introduces Schlick’s approximation of the Beckmann distribution. We will add a reference. Yes, our roughness (r) corresponds to alpha in GGX. Note that we use Eqs. 4, 5, and 6 in our code.
[R1] MLP role: We introduce the MLP to predict the diffuse part (not the albedo). Our ablation study without the diffuseMLP (Tab. 3) shows that the MLP achieves superior results to the standard graphics formula. We see the diffuseMLP as a correction module, making the diffuse response more reliable even with partially imperfect geometry (the diffuse term depends strongly on normals). We also hypothesize it may, to some extent, capture more complex light effects that we do not model explicitly; we therefore pass the distance d to the light as an auxiliary input. We design two losses: (a) Ldiffuse keeps MLP predictions close to the standard graphics formula, so predictions remain physically coherent (Tab. 3, w/o Ldiffuse); (b) Ltissue keeps the albedo free of baked-in lighting (Fig. 7).
[R1] HashGrid: The hash grid is a state-of-the-art encoding strategy (an alternative to, e.g., sinusoidal encoding). It encodes the camera position into a trainable vector. It was introduced in InstantNGP, DOI:10.1145/3528223.3530127; see their Fig. 3 for an explanation.
[R1] GT for relighting and normals: C3VD does not provide subsets with different light/tissue settings, so relighting is evaluated qualitatively. We will add GT normals for C3VD.
[R1] Reconstruction: We reconstruct both the 3D colon geometry and RGB.
[R2] Baseline: We are grateful the reviewer notices our strong qualitative results. We chose GaussianPancakes as a baseline since (1) it is currently a state-of-the-art tool for fast colon reconstruction and (2), importantly, it introduces normal regularization, which is crucial for relighting tasks.
[R2] EndoGSLAM: We will emphasize clearly in the paper that EndoGSLAM is a SLAM module.
[R2] LPIPS: We agree LPIPS may be confusing, especially given our strong qualitative results. We include it as a standard metric in the field. The discrepancy may stem from LPIPS relying on a neural network not trained on medical data, as discussed in DOI:10.1007/s10278-025-01462-1. Interestingly, EndoGSLAM, though mainly a SLAM module, achieved the best LPIPS score on C3VD despite significantly worse PSNR and SSIM.
[R3] Depth maps: We acknowledge that the “optionality” of depth maps can be misleading. Our method depends on depth-based regularization to enhance geometric consistency. Some SLAM methods, such as RNNSLAM (used in GaussianPancakes but not open-sourced), estimate both depth and camera poses. Others, like EndoGSLAM (publicly available), require ground-truth (GT) depth. Although the C3VD dataset provides GT depth and camera parameters, making SLAM optional, we use EndoGSLAM to simulate real-world conditions where camera poses are unknown. In practice, RNNSLAM could replace EndoGSLAM by jointly estimating depth and pose. We will add that only EndoGSLAM, GaussianPancakes, and PR-ENDO leverage depth regularization; methods without it often suffer from floating artifacts.
[R3] RotateColon: We will release the RotateColon dataset publicly. It is based on a Blender colon model and is not included in the paper, as we focus on C3VD, a more realistic and widely used dataset.
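As a companion to the rebuttal's pointer to InstantNGP, the following toy NumPy sketch shows what a multiresolution hash grid encoding of a 3D point (such as the camera position) looks like: each level hashes the corners of the enclosing grid cell into a trainable feature table and trilinearly interpolates. Table size, level count, growth factor, and class/function names are illustrative assumptions and do not reflect the paper's implementation.

```python
import numpy as np

# Spatial-hash primes from the InstantNGP paper (Teschner et al. style).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

class HashGridEncoding:
    """Toy multiresolution hash encoding (InstantNGP-style) for a 3D point."""

    def __init__(self, n_levels=4, features=2, table_size=2**14,
                 base_res=16, growth=2.0, seed=0):
        rng = np.random.default_rng(seed)
        # One small trainable table per resolution level.
        self.tables = [rng.normal(0.0, 1e-4, (table_size, features))
                       for _ in range(n_levels)]
        self.res = [int(base_res * growth ** level) for level in range(n_levels)]
        self.table_size = table_size

    def _hash(self, idx):
        # XOR of coordinate-times-prime products, folded into the table.
        h = np.uint64(0)
        for i, prime in enumerate(PRIMES):
            h ^= np.uint64(idx[i]) * prime
        return int(h % np.uint64(self.table_size))

    def __call__(self, x):
        # x: point in [0, 1]^3; returns concatenated per-level features.
        feats = []
        for table, res in zip(self.tables, self.res):
            p = x * res
            lo = np.floor(p).astype(np.int64)   # lower corner of the cell
            w = p - lo                          # trilinear weights
            f = np.zeros(table.shape[1])
            for corner in range(8):
                offset = np.array([(corner >> i) & 1 for i in range(3)])
                weight = np.prod(np.where(offset, w, 1.0 - w))
                f += weight * table[self._hash(lo + offset)]
            feats.append(f)
        return np.concatenate(feats)
```

In the full method, the concatenated feature vector would be fed to the MLP alongside the other per-Gaussian inputs; here the tables are random, whereas in training they are optimized by backpropagation.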
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
From the review comments, the rebuttal leaves key aspects—such as how the hash grid encodes camera position—unclear. Due to unresolved ambiguities and difficulty understanding core components, this paper is difficult to accept.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A