

Abstract

Geometric reconstruction and SLAM with endoscopic images have advanced significantly in recent years. In most medical fields, monocular endoscopes are employed, and the algorithms used are typically adaptations of those designed for external environments, resulting in 3D reconstructions with an unknown scale factor.

For the first time, we propose a method to estimate the real metric scale of a 3D reconstruction from standard monocular endoscopic images, under unknown varying albedo, without relying on application-specific learned priors. Our fully model-based approach leverages the near-light sources embedded in endoscopes, positioned at a small but nonzero baseline from the camera, in combination with the inverse-square law of light attenuation, to accurately recover the metric scale from scratch. This enables the transformation of any endoscope into a metric device, which is crucial for applications such as measuring polyps, stenosis, or assessing the extent of diseased tissue.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1790_paper.pdf

SharedIt Link: https://rdcu.be/eHw3r

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05127-1_18

Supplementary Material: https://papers.miccai.org/miccai-2025/supp/1790_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

EndoMapper dataset: https://doi.org/10.7303/syn26707219

BibTex

@InProceedings{IraRaú_EndoMetric_MICCAI2025,
        author = { Iranzo, Raúl AND Batlle, Víctor M. AND Tardós, Juan D. AND Montiel, José M. M.},
        title = { { EndoMetric: Near-Light Monocular Metric Scale Estimation in Endoscopy } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {180 -- 190}
}

Reviews

Review #1

Please describe the contribution of the paper

The article proposes a method to recover the scale of geometric 3D reconstructions obtained by applying SLAM/SfM to image sequences acquired by a conventional monocular endoscope. The method exploits photometric constraints arising from the fact that endoscopes have one or more punctual near-light sources that cause noticeable illumination decay on surfaces imaged at close range. The camera is calibrated both geometrically and radiometrically (camera response function and vignetting), the intensities of the punctual light sources are constant and their positions with respect to the camera are known, and the scene is assumed to have a single albedo. Given the sparse 3D reconstruction and camera poses, the method recovers the scale, the camera gain at each frame, and an initial constant encoding albedo and illumination intensity. The paper presents ablation studies on simulated endoscopic sequences, as well as experiments on real GI footage where the size of polyps is estimated.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper addresses a largely unsolved problem: finding the true metric scale of 3D reconstructions or inferences obtained from endoscopic video sequences. Determining the size of polyps is one application, but there are many others.
- The idea used to solve the problem is interesting and largely new.
- The experimental section, comprising ablation studies and tests on real images, is sufficiently persuasive that the method is an idea worth exploring further.
- The companion video provides a nice overview of the article
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Missing references that limit the claims: there is relevant prior art on multi-view shape-from-shading in endoscopy under near lighting [a,b] that accomplishes “3D reconstruction with real metric scale from conventional monocular endoscope, solely based in physical principles”. These references should be added, the differences with respect to the submission explained, and some of the claims toned down (e.g. the first sentence of the conclusion).
- Correctness can be improved: the introduction states that “Photometry is scale dependent due to two factors: … when the light source is positioned at a small, but NONZERO, baseline from optical center”. My understanding is that, in general, photometry is scale dependent whenever the light is at a finite distance, so that illumination decays with distance. In principle, it is irrelevant whether the punctual light source is coincident with or displaced from the camera center (e.g. [b] assumes it to be coincident). The need for a baseline b is specific to the formulation of Section 3. The statement is misleading and should be removed.
- Clarity can be improved: Section 3 is also misleading. I understand that the intention is to persuade the reader that near illumination breaks the scale ambiguity, but at first it made me fear that the method was going to assume the strong, unrealistic constraints of a fronto-parallel surface and motion parallel to the surface. I suggest removing the section or making clear upfront that this is not the model considered later in the derivations and experiments.
- Lack of information on limitations and required calibrations: my understanding is that the method only works under constant albedo, that the camera must be geometrically and radiometrically calibrated, and that the position of the light must be known. Although there are some references in the literature [c,d], calibrating the endoscopic camera and determining the position of the light source is not trivial. The paper should be clearer about limitations, requirements, and ways to meet those requirements.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
ADDITIONAL QUESTIONS
- Most papers on near lighting in endoscopy consider an illumination model more complex than a simple punctual model (e.g. [a,b,5]). Why is this not needed here, and to what extent is the vignetting calibration absorbing part of this effect?
MISSING REFERENCES

[a] Wu, C., Narasimhan, S. G. & Jaramaz, B. A Multi-Image Shape-from-Shading Framework for Near-Lighting Perspective Endoscopes. International Journal of Computer Vision 86, 211–228 (2010).
[b] Gonçalves, N., Roxo, D., Barreto, J. P. & Rodrigues, P. Perspective shape from shading for wide-FOV near-lighting endoscopes. Neurocomputing 150, 136–146 (2015).
[c] Rodrigues, P. & Barreto, J. P. Single-Image Estimation of the Camera Response Function in Near-Lighting. in 1–9 (2015).
[d] Stoyanov, D., Elson, D. & Yang, G.-Z. Illumination Position Estimation for 3D Soft-Tissue Reconstruction in Robotic Minimally Invasive Surgery. 2009 IEEE/RSJ Int. Conf. Intell. Robot. Syst. 2628–2633 (2009).
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I think the positive aspects outweigh the negative points, which can be fixed in the camera-ready version. I would like to hear the authors’ thoughts in the rebuttal, as well as a reply on the need (or lack of need) to model the anisotropic nature of the illumination.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper presents an approach for determining the metric scale of 3D reconstructions from monocular endoscopic images. The proposed method is model-based, utilizing the near-light sources embedded within endoscopes to estimate the metric scale. By leveraging this technique, any monocular endoscope can be transformed into a reliable metric measurement device.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The strengths of the paper include: (1) the paper is well-written and easy to follow; (2) it addresses a relevant research topic for the community, offering an elegant and theoretically sound solution; (3) the promising experimental results on both synthetic and real data demonstrate the feasibility of recovering metric scale based on near-light field theory assumptions.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The negative aspects of the paper are:
(1) In Section 4.2, the authors mention, “We assume a calibrated mobile camera with r fixed point light sources at known positions relative to the optical center …”. It would be valuable to include a brief discussion of how these parameters are calibrated and/or to reference relevant literature on calibration methods.
(2) The paper lacks runtime information, which is a significant factor for real-world applications. Additionally, the authors state that they perform an exhaustive search for the scale parameter λ over a logarithmic space Λ. It would be helpful to specify the scale range and the discretization steps used.
(3) The authors note, “In this real-world setting, we use approximately 20 frames with COLMAP to reconstruct the 3D shape and estimate its metric scale.” The selection process for these 20 frames is not explained. Contiguous frames in a video might lack a sufficient baseline for COLMAP to work effectively; further clarification would be welcome.
(4) It is unclear how many point light sources r were used in the EndoMapper experiments in Section 5.5. Additionally, the sensitivity of the formulation to both the number of light sources and the accuracy of their positions should be discussed.
(5) While I acknowledge the experiment in Section 5.5 on real polyp measurements, which shows that the proposed approach provides measurements close to those made by endoscopists, I am curious about two aspects: (a) How does the accuracy in Section 5.5 vary when different sets of 20 frames are used? (b) How precise is the approach in reconstructing the complete metric 3D environment?
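To make point (2) concrete, the sketch below shows what an exhaustive search for λ over a log-spaced grid Λ could look like. It is a toy illustration, not the authors' cost (their eq. (6) and initialization are not reproduced here), and it rests on loudly stated assumptions: isotropic Lambertian point lights, no vignetting, unit camera gain in every frame, and cameras that share one orientation. Per-point albedo is profiled out in closed form by least squares, so only λ is searched. The grid range 0.5 to 8 with 300 steps, the light offsets, and all function names are hypothetical.

```python
import numpy as np


def shading(points_hat, normals, cam_hat, lights, scale):
    """Near-light Lambertian shading S[i, j] of point i seen from frame j.

    Assumptions of this sketch: cameras share the world orientation, lights are
    isotropic points, no vignetting, unit camera gain. cam_hat holds up-to-scale
    camera centers; lights holds metric offsets of the lights from the optical center.
    """
    pts = scale * points_hat                                   # (N, 3) metric
    light_pos = (scale * cam_hat)[:, None, :] + lights[None]   # (F, K, 3) metric
    to_light = light_pos[None] - pts[:, None, None, :]         # (N, F, K, 3)
    dist = np.linalg.norm(to_light, axis=3)                    # (N, F, K)
    cos = np.einsum("nfkc,nc->nfk", to_light, normals) / dist  # incidence cosine
    return (np.clip(cos, 0.0, None) / dist**2).sum(axis=2)     # inverse-square, (N, F)


def scale_cost(intensity, s):
    """Squared residual after fitting each point's albedo by least squares."""
    rho = (intensity * s).sum(axis=1) / (s * s).sum(axis=1)
    return ((intensity - rho[:, None] * s) ** 2).sum()


def search_scale(intensity, points_hat, normals, cam_hat, lights, grid):
    """Exhaustive search: evaluate the cost at every candidate scale in the grid."""
    costs = [scale_cost(intensity, shading(points_hat, normals, cam_hat, lights, lam))
             for lam in grid]
    return grid[int(np.argmin(costs))]


# Synthetic up-to-scale reconstruction: a planar patch seen from four frames.
xs, ys = np.meshgrid(np.linspace(-5, 5, 11), np.linspace(-5, 5, 11))
points_hat = np.column_stack([xs.ravel(), ys.ravel(), np.full(xs.size, 15.0)])
normals = np.tile([0.0, 0.0, -1.0], (points_hat.shape[0], 1))  # facing the cameras
cam_hat = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 3.0], [-1.0, 1.0, 6.0], [0.0, -1.0, 9.0]])
lights = np.array([[3.0, 0.0, 0.0], [-2.0, 1.5, 0.0], [0.5, -3.0, 0.0]])  # hypothetical, mm

true_scale = 2.0
rng = np.random.default_rng(1)
albedo = rng.uniform(0.5, 1.0, points_hat.shape[0])
intensity = albedo[:, None] * shading(points_hat, normals, cam_hat, lights, true_scale)

grid = np.geomspace(0.5, 8.0, 300)  # hypothetical log-spaced Lambda
best = search_scale(intensity, points_hat, normals, cam_hat, lights, grid)
print(f"estimated scale: {best:.3f}")
```

Searching in log space gives the same relative resolution at every scale, which is why a geometric grid is a natural choice here; the cost is one forward evaluation per grid point, so runtime grows linearly with the number of steps.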
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I am optimistic about the paper, which addresses a significant problem - metric scale estimation from monocular endoscopic images - in an elegant and theoretically sound manner. However, I have some concerns, as mentioned earlier, that I would like the authors to address and provide feedback before I provide my final recommendation.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper describes a method to estimate the scale of 3D reconstructions obtained using multiview reconstruction methods like SLAM or SfM by leveraging the light sources embedded in endoscopes at a small, non-zero distance from the camera along with the inverse-square law of light attenuation. The method simultaneously estimates the metric scale of the reconstruction, the scene albedo, and camera gain relative to the first image in the sequence used for 3D reconstruction. Authors also present an initialization technique to avoid local minima and demonstrate their results on both simulation and real data.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The methods and justifications for methodological decisions are explained clearly in the manuscript making the paper very enjoyable to read. The study of the cost function with respect to camera distance from reconstructed surface ties nicely with the need for strong initialization. Finally, the results in both simulation and real datasets are compelling.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

While not a major weakness, it would have been interesting to compare scale estimates from reconstructions obtained with SfM and with some of the newer methods that improve on SfM. Since the authors recognize that the accuracy of the multiview reconstruction affects the estimated scale, an assessment of how much the scale estimate might improve with SOTA reconstruction methods could add value. This could be a topic for future exploration.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- It would be interesting to know both how long the non-linear optimization of Eq. 6 took to converge and how long the exhaustive search for λ during initialization took.
Typos:
- Section 4.3, sentence before Eq. 6: missing “by” - “This can be achieved by minimizing…”
- Section 5.2, last sentence: “The scale error raises..” should be “The scale error rises..”
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(6) Strong Accept — must be accepted due to excellence
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Very well explained and formulated, nice results, and no major weaknesses.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers for their encouraging comments and for pointing out some aspects that need to be clarified.

Constant Albedo (R1):
Albedo is not assumed constant. As shown in eq. (4) and (5), each point has a different albedo \rho_i, that is estimated by the proposed method in eq. (6). We have made clear in the introduction that the method estimates per-point albedo. We have also changed \rho to \rho_i in section 3 to make this clear from the first equation.

Non-zero camera-light baseline (R1): Section 3 shows that a non-zero baseline is required to estimate scale in the simplest case. To make clear that a non-zero baseline is also required in the general case, we have added this explanation after eq. (5): “Note that, if all light-camera baselines $b_j$ were zero, $\lambda^2$ could be extracted from the denominator and all intensities would be proportional to $\rho_i’ / \lambda^2$, producing a fundamental ambiguity: you can multiply the scale by any constant $c$ just by multiplying all albedos by $c^2$. So, also in the general case, a non-zero baseline is needed to break the ambiguity and estimate true scale and per-point albedo.”
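The ambiguity argument above can be checked numerically with a toy model. The sketch below is an illustration, not the authors' implementation: it assumes Lambertian, isotropic point lights with inverse-square falloff, a single camera at the origin, no vignetting and unit gain; the point cloud, albedos and light offsets are hypothetical. It compares the intensities produced by (λ, ρ_i) and by (cλ, c²ρ_i). With the light at the optical center they coincide; with offset lights they do not.

```python
import numpy as np


def shading(points_hat, normals, lights, scale):
    """Sum over point lights of the Lambertian near-light term cos(theta) / d^2.

    points_hat: (N, 3) up-to-scale points in the camera frame (camera at origin).
    normals:    (N, 3) unit surface normals (unchanged by scaling).
    lights:     (K, 3) metric light positions relative to the optical center.
    scale:      candidate metric scale lambda.
    """
    points = scale * points_hat                              # metric 3D points
    to_light = lights[None, :, :] - points[:, None, :]       # (N, K, 3)
    dist = np.linalg.norm(to_light, axis=2)                  # (N, K)
    cos = np.einsum("nkc,nc->nk", to_light, normals) / dist  # incidence cosine
    return (np.clip(cos, 0.0, None) / dist**2).sum(axis=1)   # inverse-square law


rng = np.random.default_rng(0)
n = 50
points_hat = np.column_stack([rng.uniform(-3, 3, n), rng.uniform(-3, 3, n), rng.uniform(8, 15, n)])
normals = np.tile([0.0, 0.0, -1.0], (n, 1))   # surface facing the camera
albedo = rng.uniform(0.5, 1.0, n)             # per-point albedo rho_i
lam, c = 2.0, 1.5                             # true scale and an arbitrary factor c


def intensities(lights, scale, rho):
    return rho * shading(points_hat, normals, lights, scale)


zero_baseline = np.zeros((1, 3))                               # light at the optical center
offset_lights = np.array([[3.0, 0.0, 0.0], [-2.0, 1.5, 0.0]])  # hypothetical baselines (mm)

# Compare (lambda, rho) against (c * lambda, c^2 * rho).
same_zero = np.allclose(intensities(zero_baseline, lam, albedo),
                        intensities(zero_baseline, c * lam, c**2 * albedo))
same_offset = np.allclose(intensities(offset_lights, lam, albedo),
                          intensities(offset_lights, c * lam, c**2 * albedo))
print(same_zero, same_offset)  # True False
```

With zero baseline every distance is exactly λ times its up-to-scale value, so the 1/λ² factor can be absorbed into the albedo; a non-zero offset breaks that proportionality point by point, which is what makes scale and albedo separable.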

Section 3 (R1): As suggested, we have made clear that this is a simple model to show some of the method’s properties, not the final one used by our method.

Camera-Light calibration (R1, R2): We have clarified that we use the calibration from the Endomapper dataset and that: “It uses a calibrated Olympus endoscope with three light sources. We interpret the provided light spread as vignetting, since both were jointly estimated, and we use isotropic light sources positioned according to the manufacturer’s datasheet.”

New references (R1): The references [c, d] suggested are not relevant to this work, as we are not doing camera-light calibration, but using the one available in the EndoMapper dataset.
We have added and discussed references [a, b]: “Near-light shape-from-shading was used in \cite{wu2010multi} to recover a metric-scale reconstruction from a single perspective image, but relying on strong assumptions of known light intensity and a constant, known albedo. Similar methods \cite{goncalves2015perspective,batlle2022photometric} assumed that the light source is located at the optical center, i.e., the baseline is zero. As a result, it becomes impossible to disambiguate scale from albedo. This yielded only up-to-scale reconstructions due to three unknown factors: illumination power, camera gain, and surface albedo.”

Tone down claim (R1): To make our claim clearer, the first sentence of the conclusion now reads: “We have presented, for the first time, a method to obtain 3D reconstructions with real metric scale from a conventional monocular endoscope, under unknown varying albedo, solely based on physical principles.” The same change has been made in the abstract.

Selection of 20 keyframes (R2): We have clarified that: “we use four seconds of video at 5 FPS, yielding a total of 20 non-contiguous frames, and process them with COLMAP to reconstruct the 3D shape, whose scale is computed by our method.”

Running time (R2, R3): We have added the requested information: “On an i7 10700K 3.8 GHz CPU, the running time is 45 s for COLMAP reconstruction, 15 s for the initial scale estimation (Python prototype) and 0.4 s for minimizing eq. (6) with Ceres. In future work, we will replace COLMAP with a real-time SLAM method and optimize the initial scale estimation in C++.”

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A


EndoMetric: Near-Light Monocular Metric Scale Estimation in Endoscopy

Author(s):