Abstract

Forecasting surgical instrument trajectories and predicting the next surgical action recently started to attract attention from the research community. Both these tasks are crucial for automation and assistance in endoscopy surgery. Given the safety-critical nature of these tasks, reliable uncertainty quantification is essential. Conformal prediction is a fast-growing and widely recognized framework for uncertainty estimation in machine learning and computer vision, offering distribution-free, theoretically valid prediction intervals. In this work, we explore the application of standard conformal prediction and conformalized quantile regression to estimate uncertainty in forecasting surgical instrument motion, i.e., predicting direction and magnitude of surgical instruments’ future motion. We analyze and compare their coverage and interval sizes, assessing the impact of multiple hypothesis testing and correction methods. Additionally, we show how these techniques can be employed to produce useful uncertainty heatmaps. To the best of our knowledge, this is the first study applying conformal prediction to surgical guidance, marking an initial step toward constructing principled prediction intervals with formal coverage guarantees in this domain.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0260_paper.pdf

SharedIt Link: https://rdcu.be/eHw02

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05114-1_12

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/salusanga/conformal_instrument_trajectory

Link to the Dataset(s)

N/A

BibTex

@InProceedings{SanSar_Conformal_MICCAI2025,
        author = { Sangalli, Sara AND Sarwin, Gary AND Erdil, Ertunc AND Serra, Carlo AND Carretta, Alessandro AND Staartjes, Victor AND Konukoglu, Ender},
        title = { { Conformal forecasting for surgical instrument trajectory } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15968},
        month = {September},
        pages = {117 -- 127}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper explores the application of two machine learning techniques, conformal prediction (CP) and conformalized quantile regression (CQR), to quantify uncertainty when forecasting surgical instrument trajectories in endoscopic surgery. The authors adapt CP-based uncertainty quantification techniques, originally designed for general-purpose regression, to the domain of medical robotics, specifically to forecasting the direction and magnitude of surgical tool motion. Notably, this study pioneers the application of conformal techniques within this specific context, yielding distribution-free, statistically valid prediction intervals (PIs) for real-time surgical guidance. Furthermore, the paper investigates multiple testing correction methods (e.g., Bonferroni, Sidak, Max-Rank) to address the limitations of independently estimated intervals. Through experiments on a pituitary surgery dataset, the authors demonstrate that CQR generally yields narrower and more stable PIs than standard CP and can produce uncertainty heatmaps that may be useful for surgical navigation systems.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This work applies standard conformal prediction and conformalized quantile regression to estimate uncertainty in forecasting surgical instrument motion, analyzing and comparing their coverage and interval sizes and assessing the impact of multiple hypothesis testing and correction methods. It also demonstrates that independently constructed PIs fail to maintain valid joint coverage, as expected from hypothesis testing theory, and addresses this with multiple-testing corrections. The authors systematically compare split CP and conformalized quantile regression on a pituitary surgery dataset, analyzing coverage and PI widths, and demonstrate how these PIs enable uncertainty heatmaps with statistical guarantees. Notably, this is the first study applying conformal prediction to surgical guidance.
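
The joint-coverage point made above can be illustrated with a short sketch. It is not taken from the paper: the function names are hypothetical, it assumes two components (angle and magnitude), and the Sidak formula and the joint-coverage helper additionally assume that the two miscoverage events are independent. The Max-Rank correction is not shown.

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-component miscoverage level under the Bonferroni correction."""
    return alpha / m


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-component miscoverage level under the Sidak correction.

    Exact only when the m miscoverage events are independent (an assumption).
    """
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def joint_coverage_if_independent(alpha_per_component: float, m: int) -> float:
    """Joint coverage of m intervals that each cover with probability
    1 - alpha_per_component, assuming independence between components."""
    return (1.0 - alpha_per_component) ** m
```

Under these assumptions, two uncorrected 90% intervals jointly cover only 81% of the time, whereas the Sidak level restores 90% joint coverage. Bonferroni is slightly more conservative, and its guarantee follows from the union bound without any independence assumption.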
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Despite its practical relevance, this work may lack novelty, as it largely implements existing conformal prediction techniques directly from prior work (cited as [17] in the main submission), without proposing any algorithmic extensions, theoretical insights, or surgical-domain-specific adaptations. In the experiments, although uncertainty heatmaps are proposed for interpretability, the paper does not include any qualitative assessment, user study, or clinical feedback to validate their usefulness in real-world scenarios, which leaves the motivation unclear. Moreover, the study omits comparisons with other widely used uncertainty estimation baselines, which makes the work less convincing.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work investigates the application of standard conformal prediction (CP) and conformalized quantile regression (CQR) to estimate uncertainty in forecasting surgical instrument motion. It provides a comparative analysis of their coverage properties and interval widths and further examines the role of multiple hypothesis testing corrections (e.g., Bonferroni, Sidak, Max-rank) to improve joint coverage validity. The authors systematically evaluate both split CP and CQR on a pituitary surgery dataset and demonstrate how the resulting prediction intervals can be visualized as uncertainty heatmaps with statistical coverage guarantees. Despite it being the first conformal prediction work in the surgical field, the work lacks methodological novelty. It largely reuses existing conformal prediction methods previously established in the literature (notably in [17]) without introducing new algorithmic innovations, theoretical insights, or domain-specific adaptations tailored to surgical settings. While the proposed uncertainty heatmaps are an interesting direction for interpretability, the paper does not provide any qualitative analysis, user study, or clinical feedback to validate their real-world usefulness, making the claimed motivation less convincing. In addition, the absence of comparisons with other widely adopted uncertainty estimation approaches weakens the empirical evaluation and limits the positioning of this study within the broader uncertainty quantification literature.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

I appreciate the contribution of this work, which is the first to bring conformal forecasting to the surgical field. Its application value is clear, but the single-scenario setting and the lack of human evaluation greatly limit its practical value and significantly weaken its integrity as an application study.

Review #2

Please describe the contribution of the paper

The paper “Conformal Forecasting for Surgical Instrument Trajectory” proposes an uncertainty estimation technique for forecasting the trajectory of surgical instruments in endoscopic video frames. Specifically, the authors model uncertainty in the movement of the instrument’s bounding box center using polar coordinates. They address joint coverage issues through multiple-testing corrections, compare different conformal methods on a surgical dataset, and introduce uncertainty heatmaps with statistical guarantees.
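
As a rough illustration of the polar parameterisation mentioned above, the sketch below converts the displacement of a bounding-box centre between two frames into an angle and a magnitude, and back. The function names, the use of radians in (-pi, pi], and the image-coordinate convention are assumptions made for this example; the paper's exact convention is not stated on this page.

```python
import math


def to_polar(cx_now, cy_now, cx_next, cy_next):
    """Return (angle, magnitude) of the centre displacement between two frames."""
    dx = cx_next - cx_now
    dy = cy_next - cy_now
    magnitude = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)  # radians, in (-pi, pi]
    return angle, magnitude


def from_polar(cx_now, cy_now, angle, magnitude):
    """Return the future centre implied by an angle and a magnitude."""
    return (cx_now + magnitude * math.cos(angle),
            cy_now + magnitude * math.sin(angle))
```

With this decomposition, a prediction interval can be attached to each component separately, which is what makes the joint-coverage question arise.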
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper clearly formulates and describes the data and methodology. It ensures proper data stratification to prevent patient-level leakage across object detection, trajectory forecasting, and calibration tasks. An extensive ablation study is conducted to evaluate different variations of the proposed method.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The paper lacks a visualization of the network architecture, and the process for generating uncertainty heatmaps is not clearly explained. Additionally, there is no comparison with other existing uncertainty estimation methods. Please refer to the detailed comments.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Although the calculation of the angle and length prediction intervals (PIs) is clearly explained, the process of converting these into a heatmap is not entirely clear. I assume the PIs (regardless of the method) are computed for different alpha values and then aggregated? If so, the yellow regions in the heatmap might correspond to lower coverage, since higher target coverage (indicating more certainty) increases the size of the PI. Please clarify the process of heatmap generation, especially as this visualization is one of the main contributions of the paper. In addition, please include a color bar in Figure 2 with annotations indicating the uncertainty values corresponding to each color.
- Since the primary contribution of the paper is uncertainty estimation, it is important to clarify why the authors did not include baseline comparisons with other established uncertainty estimation methods. Although temperature scaling, Monte Carlo Dropout, and Deep Ensembles are mentioned in the introduction as general approaches, it remains unclear why these methods are not applicable or were not evaluated for this task.
- In Figure 1, could you clarify why the ground truth (GT) vector in panel (a) differs from those in panels (b) and (c)? It might be clearer to separate the examples and show two distinct cases: one where the GT lies within the PI without correction (red region), and another showing panels (b) and (c), where correction is necessary.
- For Figure 2, consider adding instrument bounding boxes to both the current and next frames to improve visibility. Also, it may be more accurate to avoid using the term “ground truth” for the future frame on the left side—perhaps reserve that label solely for the trajectory vector.
- Although space is limited, including a small visualization of the network architecture would greatly enhance clarity and readability.
- From a practical standpoint, experienced surgeons tend to perform smoother operations. It would be interesting to analyze whether operator seniority affects model performance. If such metadata is available, stratifying the dataset based on surgeon experience could offer valuable insights—for future work.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Please refer to the detailed comments
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have adequately addressed my concerns in the rebuttal. Provided that they incorporate the proposed changes and include the clarifications outlined in their response, I believe the paper would be suitable for acceptance.

Review #3

Please describe the contribution of the paper

This paper introduces a pioneering application of conformal prediction techniques to forecast surgical instrument trajectories during endoscopic procedures. The authors present the first study applying conformal prediction to surgical guidance, creating principled prediction intervals with formal coverage guarantees in this safety-critical domain. By separately analyzing uncertainty in both the angle and magnitude of predicted motion vectors, they ensure ground truth falls within prediction intervals with user-specified probability. The research demonstrates that independently constructed prediction intervals fail to maintain valid joint coverage as expected from hypothesis testing theory, and successfully implements multiple correction methods (Bonferroni, Sidak, and Max-Rank) to address this issue. Through systematic comparison of split conformal prediction (CP) and conformalized quantile regression (CQR) on a pituitary surgery dataset, they find that CQR generally produces more precise prediction intervals while maintaining target coverage. The work culminates in the development of uncertainty heatmaps with statistical guarantees, which could significantly enhance real-time surgical guidance by helping surgeons assess risk during procedures.
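
For readers unfamiliar with CQR, the calibration step for a single scalar target can be sketched as follows. This is a generic illustration of the standard CQR recipe, not the authors' implementation: the function names are hypothetical, and the lower and upper quantile predictions are assumed to come from some already trained quantile regressor.

```python
import math

import numpy as np


def cqr_calibrate(q_lo_cal, q_hi_cal, y_cal, alpha):
    """Return the CQR correction term computed on a held-out calibration set."""
    # Conformity score: how far the target falls outside the predicted band
    # (negative when it lies strictly inside).
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
    n = scores.shape[0]
    # Finite-sample rank used by split conformal methods.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])


def cqr_interval(q_lo, q_hi, q_hat):
    """Widen (or shrink, if q_hat < 0) the predicted quantile band by q_hat."""
    return q_lo - q_hat, q_hi + q_hat
```

Because the band width comes from the quantile regressor and only the offset is calibrated, the resulting intervals can adapt to the input, which is one common explanation for CQR producing tighter intervals than a fixed-width split CP band.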
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

First, the application of conformal prediction to surgical guidance represents a significant innovation. While conformal prediction has gained traction in medical imaging, this is the first study to apply these techniques specifically to surgical trajectory forecasting, addressing the critical need for reliable uncertainty quantification in safety-critical surgical applications. The novelty lies in adapting theoretical conformal prediction frameworks to the practical challenges of real-time surgical decision-making. Second, the paper presents a sophisticated methodological approach by separately analyzing uncertainty in both angle and magnitude of predicted motion vectors. This decomposition allows for more interpretable and actionable uncertainty quantification, as surgeons can better understand the reliability of predicted directions versus distances in a surgical context. The work then systematically addresses the challenge of maintaining valid joint coverage through careful application of multiple hypothesis testing corrections. Third, the comparison between split conformal prediction (CP) and conformalized quantile regression (CQR) provides valuable practical insights for implementation. The finding that CQR generally produces more precise prediction intervals while maintaining target coverage has important implications for developing systems that balance accuracy with computational efficiency in real-time surgical applications. Fourth, the development of uncertainty heatmaps with statistical guarantees represents a particularly strong contribution. These visualizations transform abstract statistical concepts into intuitive guidance tools that could be integrated into surgical navigation systems. The visualization approach effectively communicates different confidence levels (from 10% to 80%) using color gradients, potentially enhancing surgeon decision-making during procedures. 
Finally, the experimental validation using 144 pituitary surgery videos collected over a decade across multiple centers demonstrates robust evaluation. This extensive dataset, encompassing varied equipment and surgical techniques, strengthens the generalizability of their findings. The authors’ careful splitting of data for training, validation, calibration, and testing respects the statistical requirements of conformal prediction while addressing the practical challenges of working with real-world surgical data.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

First, the evaluation is limited to a single surgical procedure type (pituitary surgery), raising questions about how well these conformal prediction techniques would generalize across different surgical specialties with varying instrument movements and anatomical constraints. The authors acknowledge this implicitly but don’t provide cross-procedure validation. Second, the clinical feasibility demonstration is insufficient. The paper doesn’t include evaluation by surgeons to assess whether the generated uncertainty heatmaps actually improve decision-making or confidence during procedures. Without this user evaluation, the practical utility of the approach remains theoretical rather than proven. Third, the computational requirements for real-time implementation aren’t thoroughly addressed. Though the authors briefly mention that conformal prediction adds minimal computational overhead, they don’t provide benchmarks or evidence that their system can operate at the frame rates necessary for live surgical guidance. Besides, the paper doesn’t adequately address the challenge of calibration drift over time. Surgical procedures can last hours, during which instrument appearance, lighting conditions, and operative field characteristics may change substantially. The authors don’t discuss how their method would maintain valid coverage guarantees throughout lengthy procedures. Finally, the joint coverage problem, while identified and addressed through multiple testing corrections, results in substantially wider prediction intervals. The authors don’t fully explore the practical implications of these wider intervals for surgical guidance, where excessive uncertainty might render the system less useful for precision tasks.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a novel application of conformal prediction techniques to surgical instrument trajectory forecasting, making several significant contributions to the field of computer-assisted interventions. The primary strength of this work is its pioneering application of conformal prediction methods to provide statistically guaranteed uncertainty quantification in surgical guidance. While conformal prediction has gained traction in medical imaging applications, this paper represents the first application specifically to surgical trajectory forecasting—a domain where reliable uncertainty estimation is critical for patient safety. The methodological approach is particularly impressive. By decomposing trajectory predictions into angle and magnitude components, the authors create more interpretable uncertainty representations. Their systematic handling of multiple hypothesis testing through various correction methods (Bonferroni, Sidak, and Max-Rank) shows mathematical rigor and attention to statistical validity. The comparison between split conformal prediction and conformalized quantile regression provides valuable practical insights, demonstrating that CQR generally yields more precise prediction intervals while maintaining target coverage. This finding has important implications for systems that must balance accuracy with computational efficiency during real-time surgical applications. Perhaps most compelling is the translation of these theoretical concepts into practical visualization tools through uncertainty heatmaps with formal statistical guarantees. These visualizations could meaningfully enhance surgical navigation systems by providing surgeons with intuitive representations of prediction reliability during procedures. The experimental validation on a substantial dataset of 144 pituitary surgery videos collected over a decade across multiple centers further strengthens their findings. 
The authors demonstrate careful consideration of statistical requirements while addressing the practical challenges of working with real-world surgical data. While the paper has limitations, including the focus on a single procedure type and limited evaluation of clinical feasibility, these represent opportunities for future work rather than fatal flaws. The core contribution of bringing rigorous uncertainty quantification to surgical guidance represents an important step forward that could significantly impact surgical safety and outcomes. In summary, this paper makes original and significant contributions at the intersection of machine learning, computer vision, and surgical guidance, with clear potential for clinical translation. It deserves publication and will likely stimulate further research in this important domain.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We thank the reviewers for their valuable time and feedback, and we address their comments and questions below.

R1 & R4: Baseline comparison.

We appreciate the reviewers’ suggestion to include additional uncertainty baselines. This would indeed be insightful. A few considerations led us to exclude them from this initial analysis. Specifically, our ultimate goal of enabling surgical guidance imposes strict low-latency requirements. Popular methods for uncertainty estimation such as ensembles and dropout introduce significant latency due to the need for multiple forward passes: the former across multiple models, the latter through repeated stochastic sampling. Moreover, temperature scaling, while effective in classification, is not typically applied to regression.

In contrast, conformal methods are post-hoc and efficient, involving only simple, vectorised operations on the output of a single forward pass. CQR similarly introduces minimal overhead, which makes it well-suited for low-latency tasks requiring reliable uncertainty estimates.
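
To make the "simple, vectorised operations" concrete, a minimal split conformal sketch for one scalar output is given below. It is an illustration under stated assumptions rather than the released code: the function names are hypothetical, the score is the plain absolute residual, and any angle-specific handling such as wrap-around is ignored.

```python
import math

import numpy as np


def split_cp_quantile(pred_cal, y_cal, alpha):
    """Calibrate once, offline: return the residual quantile q_hat."""
    scores = np.abs(y_cal - pred_cal)  # absolute-residual conformity scores
    n = scores.shape[0]
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # calibration set too small for this alpha
    return float(np.sort(scores)[k - 1])


def split_cp_interval(pred, q_hat):
    """At test time: one subtraction and one addition per prediction."""
    return pred - q_hat, pred + q_hat
```

The calibration quantile is computed once on held-out data, so the only per-frame cost at inference is the interval arithmetic on the output of a single forward pass.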

R4: Novelty.

We acknowledge that the paper does not introduce a new uncertainty quantification method. At this stage, our aim was to explore the feasibility of existing methods and to adapt them for surgical guidance, where intuitive and interpretable uncertainty communication is critical. In doing so, we could not naively apply standard techniques. We worked with clinicians to formulate heatmap-based visualisations that define statistically guaranteed “safe” regions for movement. These provide spatial, easy-to-understand cues beyond conventional numerical outputs. The manuscript prioritises these aspects. There were, however, multiple hurdles to overcome to reach the presented results, such as the granularity of the forecasting task, the vector parameterisation, and how to make the CQR network effectively learn the challenging angle quantiles. Due to space restrictions, we did not emphasise our work on these challenges. We believe that, while the statistical tools are established, their careful adaptation to this context, especially with guarantees, represents a meaningful and practical contribution.

R4: Qualitative assessment.

We agree that a qualitative evaluation of the produced uncertainty heatmaps would be valuable and is an essential future step toward the safe deployment of these methods in the surgical setting. Our goal in this work was to spark interest and foster discussion within the community on how to practically use and assess these heatmaps, as this is the first study to propose such visualisations in this context. The need for this discussion is further underscored by our findings: while the proposed methods are feasible to implement in this challenging setting, the resulting prediction intervals are not especially tight—and, importantly, defining what constitutes “tight enough” remains an open question. The reviewers’ interest and suggestions for future directions are encouraging and reinforce our belief that this discussion would be valuable at a venue such as MICCAI.

R1: Heatmaps and Figures.

We thank the reviewer for the constructive input on the figures. As correctly noted, the heatmaps in Fig. 2 are created by overlaying coverage regions for different alpha values. We will clarify this in the paper. The current caption of Fig. 2 only indicates the coverage range; we will add the colour map in the final figure alongside the other suggestions.
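
One way to picture the overlay described above is sketched here. It is a guess at the general idea, not the procedure used for Fig. 2: the function names are hypothetical, each coverage level is assumed to give one angle interval (in radians) and one length interval (in pixels) around the current instrument centre, and the heat value is simply the number of regions containing a pixel.

```python
import numpy as np


def sector_mask(height, width, centre, angle_interval, length_interval):
    """Boolean mask of pixels whose displacement from `centre` lies in both intervals."""
    cx, cy = centre
    ys, xs = np.mgrid[0:height, 0:width]
    dx, dy = xs - cx, ys - cy
    length = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)
    a_lo, a_hi = angle_interval
    # Measure the angle relative to a_lo, wrapped into [0, 2*pi), so that
    # intervals crossing the +/-pi boundary are handled correctly.
    offset = np.mod(angle - a_lo, 2 * np.pi)
    in_angle = offset <= (a_hi - a_lo)
    l_lo, l_hi = length_interval
    in_length = (length >= l_lo) & (length <= l_hi)
    return in_angle & in_length


def overlay_heatmap(height, width, centre, regions):
    """Count, per pixel, how many coverage regions contain it."""
    heat = np.zeros((height, width), dtype=int)
    for angle_interval, length_interval in regions:
        heat += sector_mask(height, width, centre, angle_interval, length_interval)
    return heat
```

Since a higher target coverage yields a larger region, pixels with the highest counts in this sketch are those already inside the smallest, lowest-coverage regions, which is consistent with the reviewer's reading of the figure.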

Additionally, regarding Fig. 1(a): We see the cause of confusion and will revise the final manuscript accordingly.

We agree that visualising the architecture is beneficial, but due to space constraints, we prioritised other figures.

R1: Reproducibility.

We would like to note that the code will be released upon acceptance (p1).

R3: Future work.

We thank the reviewer for the suggestions, which provide valuable insights and point to promising directions for future work and evaluation.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

Passes the threshold following the strong rebuttal

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Conformal forecasting for surgical instrument trajectory

Author(s):