Abstract

Uncertainty estimation has been widely studied in medical image segmentation as a tool to provide reliability, particularly in deep learning approaches. However, previous methods generally lack effective supervision in uncertainty estimation, leading to low interpretability and robustness of the predictions. In this work, we propose a self-supervised approach to guide the learning of uncertainty. Specifically, we introduce three principles about the relationships between the uncertainty and the image gradients around boundaries and noise. Based on these principles, two uncertainty supervision losses are designed. These losses enhance the alignment between model predictions and human interpretation. Accordingly, we introduce novel quantitative metrics for evaluating the interpretability and robustness of uncertainty. Experimental results demonstrate that compared to state-of-the-art approaches, the proposed method can achieve competitive segmentation performance and superior results in out-of-distribution (OOD) scenarios while significantly improving the interpretability and robustness of uncertainty estimation. Code is available via https://github.com/suiannaius/SURE.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3770_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/suiannaius/SURE

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiYuz_UncertaintySupervised_MICCAI2025,
        author = { Li, Yuzhu and Sui, An and Wu, Fuping and Zhuang, Xiahai},
        title = { { Uncertainty-Supervised Interpretable and Robust Evidential Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
        pages = {660 -- 670}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work presents an uncertainty supervision learning framework for image segmentation. It introduces three losses to the evidential deep learning approach to increase uncertainty interpretability and robustness against noise. The approach is evaluated in cardiac MRI and fundus photography segmentation tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Three loss functions are introduced to the evidential deep learning framework with the goal of increasing uncertainty interpretability and robustness to noise. The losses are motivated by assumptions about the relationship between image gradients and uncertainty and between noise and uncertainty at the object boundary.

    The uncertainty supervision learning framework is tested against five alternative methods on two disparate image segmentation tasks: cardiac MRI and fundus photography.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Although the proposed method performs well with respect to alternative methods, there is no obvious top performer across most of the performance metrics, particularly for the cardiac segmentation task (Table 1). Qualitatively, it is not obvious from Fig. 3 why the uncertainty maps of the proposed method are superior to those of other methods like DEviS.

    The newly proposed evaluation metrics (uncertainty correlation coefficient and uncertainty ratio) are formulated to capture the same relationships as the proposed loss terms. So it is not surprising that the proposed method would perform well with respect to these metrics.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While this was an interesting adaptation of the evidential deep learning framework, it is not very clear that the proposed loss terms dramatically improve upon previous methods.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    While the uncertainty supervision learning framework is interesting and broadly relevant to medical image segmentation, I still feel that the comparison with existing uncertainty methods (Table 1, Figure 3) lacks strength. It seems that the proposed evaluation metrics are somewhat reverse-engineered forms of the proposed principles and loss functions.



Review #2

  • Please describe the contribution of the paper

    This paper introduces three principles based on gradients and noise within images for the quantification of uncertainty. Two new metrics are introduced to illustrate the applicability of these principles.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel formulation of uncertainty in terms of gradient and noise
    • Evaluation against various other uncertainty estimation methods
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Principles assume known presence of boundary
    • Evaluation metrics appear reverse-engineered from the proposed Principle definitions, without direct comparison with human intuition
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. In Section 2.2, the gradient in the provided equation is computed on a Gaussian-smoothed image for stability. However, this condition may not hold for real-life images. The appropriateness of this assumption should be clarified.

    2. In Section 2.3, Principles 2 and 3 require knowledge of the boundary (d_0) in Equation (3). However, per Principle 1, boundaries may not be clear. It should be clarified how these conflicting definitions/assumptions are resolved.

    3. Principle 3 appears not to hold towards the limits where noise is maximized (i.e. the image becomes random noise). This might be clarified.

    4. In Section 3.2, it is claimed that conventional metrics such as ECE and UEO cannot be used for quantitative evaluation of uncertainty, and require new metrics (UCC and UR). In particular, the UCC and UR formulations (Equations 5 and 6) appear to incorporate the metrics (image gradients and distance from boundary) as defined for Equations 2 and 3, i.e. are tailored towards the formulated Principles. More detailed justification should be provided.

    5. In Figure 3, the yellow background for UDrop uncertainty-gradient contrast might be explained.

    6. In Figure 3, the definition of “entirely noisy image” should be clarified - what was the magnitude of noise added? If possible, an example of such an image should be provided.
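The uncertainty-gradient relationship questioned in point 4 can be made concrete with a small sketch. This is not the paper's Eq. 5; it is a hypothetical correlation between predicted uncertainty and the gradient magnitude of a Gaussian-smoothed image, restricted to boundary pixels (all function and variable names are placeholders):

```python
import numpy as np
from scipy import ndimage

def uncertainty_gradient_correlation(image, uncertainty, boundary_mask, sigma=1.0):
    """Pearson correlation between predicted uncertainty and image-gradient
    magnitude over boundary pixels. A rough sketch of the idea behind the
    paper's UCC metric; the published formulation (Eq. 5) may differ."""
    # Gradient computed on a Gaussian-smoothed image, as described in Sec. 2.2.
    smoothed = ndimage.gaussian_filter(image, sigma=sigma)
    gy, gx = np.gradient(smoothed)
    grad_mag = np.hypot(gx, gy)
    # Restrict both quantities to the (assumed known) boundary region.
    g = grad_mag[boundary_mask]
    u = uncertainty[boundary_mask]
    # Principle 1 predicts a negative value: strong gradients, low uncertainty.
    return float(np.corrcoef(g, u)[0, 1])
```

Under Principle 1 a well-calibrated model should yield a negative value here, which is consistent with the required UCC sign discussed in the rebuttal.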

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The recommendation reflects appreciation of the novel uncertainty formulation, and was made to allow authors to address the raised concerns.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Authors have largely addressed our previous concerns.



Review #3

  • Please describe the contribution of the paper

    Proposes a self-supervised approach for learning uncertainty by introducing relationships between uncertainty and the image gradients around boundaries and noise. The authors formulate two loss functions based on these relationships that enhance the interpretability of the model, similar to a human's reasoning pattern. The paper also proposes two novel metrics for evaluating the robustness and interpretability of the model's predictions.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper builds on a sound intuition about how humans interpret certain errors and is theoretically motivated. The paper proposes two losses for learning the uncertainty:

    1. a gradient-based loss to account for the high uncertainty near boundaries. The loss is grounded in the sound intuition that sharp boundaries with higher image gradients lead to low uncertainty, while fuzzier boundaries with low gradients lead to higher uncertainty.
    2. a noise-based supervision loss, under the assumption that uncertainty would be higher near the boundary in the presence of large noise, and vice versa, whereas if a pixel is far from the boundary, its uncertainty should be negligible irrespective of the noise. Thus they propose to pay more attention to hard pixels in this setup, i.e., the locations for which the uncertainty after adding noise is less than the uncertainty without noise. The paper is easy to follow and well written.
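    The two supervision terms described above could be sketched roughly as follows. These are hypothetical forms written from the reviewer's summary, not the paper's actual Eqs. 2-4; all names and normalizations are assumptions:

```python
import numpy as np

def gradient_uncertainty_loss(uncertainty, grad_mag, boundary_mask):
    """Sketch of a gradient-based term: on boundary pixels, push uncertainty
    toward (1 - normalized gradient magnitude), so sharp edges get low
    uncertainty and fuzzy edges get high uncertainty. Hypothetical form."""
    g = grad_mag[boundary_mask]
    g = (g - g.min()) / (g.max() - g.min() + 1e-8)  # normalize to [0, 1]
    u = uncertainty[boundary_mask]
    return float(np.mean((u - (1.0 - g)) ** 2))

def noise_uncertainty_loss(unc_clean, unc_noisy, near_boundary_mask):
    """Sketch of a noise-based term: near the boundary, added noise should
    not *decrease* uncertainty, so penalize the 'hard pixels' where
    unc_noisy < unc_clean; far from the boundary, uncertainty should stay
    stable under noise. Hypothetical form."""
    diff = unc_noisy - unc_clean
    near = np.maximum(0.0, -diff[near_boundary_mask]).mean()  # drops near boundary
    far = np.abs(diff[~near_boundary_mask]).mean()            # any change far away
    return float(near + far)
```

Both terms are zero when the predicted uncertainty already follows the stated intuitions, which is the sense in which the supervision is "self-supervised": no ground-truth uncertainty labels are involved.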
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The proposed method performs well on the proposed metrics; however, on previous metrics such as UEO, its scores lag behind DEviS and PU on REFUGE (0.275 vs. 0.359/0.384). The authors should provide a discussion of this. Moreover, even on one of the proposed metrics (UR), the method seems to perform poorly on the ACDC dataset compared to UDrop and PU (0.585 vs. 0.818/0.801); the authors should provide more discussion here, since the proposed method performs inconsistently across the various reported metrics.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I like how the authors build on sound intuition and theory for an important problem: uncertainty estimation and explainability in medical imaging. Although their method's performance is inconsistent, the method is simple and sound, and I think the community should know about this.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The rebuttal answers the concerns I had.




Author Feedback

We sincerely appreciate all the reviewers’ insightful comments and are committed to improving the manuscript accordingly. Below, we provide detailed responses to each concern.

●R1&R2: Motivations behind Human-Aligned Metrics (UCC/UR) and Loss Functions
The motivation of our principles is that aligning uncertainty estimation with human cognitive logic can jointly enhance model interpretability and robustness. To quantify adherence to three human-inspired intuition patterns: 1) gradient-uncertainty inversion, 2) boundary-proximity noise sensitivity, and 3) stable non-boundary responses, we propose the UCC/UR metrics, which assess relative relationships rather than absolute magnitudes. By incorporating these principles through straightforward yet effective loss functions, we achieve both noise-robust performance (Fig. 2) and interpretable uncertainty prediction (Fig. 3). While conceptually related to the training losses, these metrics crucially measure how well models internalize the proposed principles.

●R1&R2: Explanation of Uncertainty Map (Fig. 3)
We appreciate the suggestion to clarify Fig. 3, which showcases how our method produces interpretable uncertainty aligned with human intuition (detailed per row):
Row 2 verifies Principle 1 via gradient-uncertainty overlap. Our method exhibits lower uncertainty where gradients are stronger (e.g., the clear upper-right Myo-BG junction), and vice versa (e.g., the ambiguous mid-inferior Myo-BG junction). Competitors show gradient insensitivity (UDrop’s yellow background indicates suboptimal uncertainty estimation; DEviS shows irregular correlations).
Row 3 examines patch-based noise response (Principles 2 & 3) via an uncertainty difference map. Our method selectively activates uncertainty for near-boundary noise (e.g., patches near the myocardium’s outer boundary) while remaining stable to noise far from the boundary (e.g., the central LV area). Others either under-detect (w/o sup, PU, UDrop, EU) or overactivate (DEviS, TTA) in such areas.
Row 4 tests the uncertainty difference after adding Gaussian noise (μ=0.5, σ=0.3) to the entire image. Our method preserves boundary sensitivity (Principle 2) and non-boundary stability (Principle 3), in contrast with: i) BG false positives (w/o sup, top-left corner); ii) central LV overactivation (DEviS, EU, TTA); iii) non-responsiveness (PU, UDrop).
In revision, we will annotate key comparisons with arrows and zoomed regions to highlight our advantages, alongside the explanations above.

●R2&R3: Discussion of Performance Consistency (Table 1)
While prioritizing uncertainty interpretability and robustness over segmentation performance, our method achieves the top DSC across datasets. The UEO metric evaluates how well uncertainty correlates with prediction errors, but we argue that human-like uncertainty should arise from the image content, not from access to ground-truth labels. Thus, while our method may sacrifice performance on UEO, it better captures the interpretable aspects of uncertainty. Notably, our method shows consistent UCC/UR scores across datasets, while others exhibit at least one incorrect UCC sign (marked with “X” in Table 1). For instance, though UDrop/PU outperform us in UR, they violate the UCC sign requirement (+0.016/+0.162 vs. the required negative sign) on REFUGE or ACDC.

●R1: Clarification of Implementation Details
1. Reasons for Gaussian smoothing: in gradient calculation, it mimics human visual processing by suppressing artifacts while maintaining smooth activation patterns (Thielscher & Neumann, texture boundary detection and visual search, 2005).
2. Potential conflict across principles: to avoid this conflict, our adaptive thresholding (d_0) defines error-tolerant boundary regions covering pixels within the threshold as candidates (Principles 2 & 3). For ambiguous boundaries (Principle 1), the GT boundary usually resides within this neighborhood, which ensures alignment among the principles.
3. Clarification for Principle 3: we apply bounded noise during training to avoid images degrading into random noise.
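The bounded-noise perturbation mentioned in point 3 could look something like the sketch below. The (μ, σ) defaults mirror the values quoted in the rebuttal for Fig. 3, Row 4, but the clipping-based procedure and all names are assumptions, not the authors' implementation:

```python
import numpy as np

def add_bounded_gaussian_noise(image, mu=0.5, sigma=0.3, seed=None):
    """Perturb a [0, 1]-normalized image with Gaussian noise and clip the
    result back to [0, 1], so the image is degraded but never reduced to
    pure random noise. Hypothetical sketch of the rebuttal's 'bounded noise'."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(mu, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```

Keeping the perturbed image inside the valid intensity range is one simple way to stop it from degenerating into the random-noise limit that Review #2's point 3 raises against Principle 3.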




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This work propose a novel method for uncertainty estimation/supervision based on some human-intuition-inspired principles. The method is novel and distinctive from existing uncertainty estimation methods. The results generally validated the effectiveness of the proposed method.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have clearly clarified the issues raised by the reviewers. The explanations in the rebuttal look reasonable and correct to me.


