Abstract

Image segmentation is a challenging task influenced by multiple sources of uncertainty, such as the data labeling process or the sampling of training data. In this paper, we focus on binary segmentation and address these challenges using conformal prediction, a family of model- and data-agnostic methods for uncertainty quantification that provide finite-sample theoretical guarantees and are applicable to any pretrained predictor. Our approach involves computing nonconformity scores, a type of prediction residual, on held-out calibration data not used during training. We use dilation, one of the fundamental operations in mathematical morphology, to construct a margin added to the borders of predicted segmentation masks. At inference, the predicted set formed by the mask and its margin contains the ground-truth mask with high probability, at a confidence level specified by the user. The size of the margin serves as an indicator of predictive uncertainty for a given model and dataset. We work in a regime of minimal information as we do not require any feedback from the predictor: only the predicted masks are needed for computing the prediction sets. Hence, our method is applicable to any segmentation model, including those based on deep learning; we evaluate our approach on several medical imaging applications. Our code is available at https://github.com/deel-ai-papers/consema.
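As an illustration of the calibration step described in the abstract, the following is a minimal sketch of a standard split conformal quantile. It assumes each calibration score is the number of dilations needed for a predicted mask to cover its ground truth. The function name `conformal_quantile` and the score values are hypothetical and are not taken from the paper or its repository.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: no finite margin qualifies.
        return math.inf
    return sorted(scores)[rank - 1]


# Hypothetical calibration scores (one per held-out image).
calib_scores = [0, 1, 1, 2, 2, 3, 3, 4, 5, 9]
lambda_hat = conformal_quantile(calib_scores, alpha=0.2)
```

With n = 10 scores and alpha = 0.2, the rank is ceil(11 x 0.8) = 9, so the margin is the 9th smallest score. For alpha = 0.05 the rank exceeds n, which signals that more calibration data would be needed at that confidence level.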

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3902_paper.pdf

SharedIt Link: https://rdcu.be/eHaVq

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04965-0_8

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/deel-ai-papers/consema

Link to the Dataset(s)

N/A

BibTex

@InProceedings{MosLuc_Conformal_MICCAI2025,
        author = { Mossina, Luca AND Friedrich, Corentin},
        title = { { Conformal Prediction for Image Segmentation Using Morphological Prediction Sets } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15963},
        month = {September},
        pages = {78 -- 88}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper proposes to apply conformal prediction for binary image segmentation, by exploiting directly the ground truth and predicted segmentation masks, without the need for intermediate pseudo-confidence maps. The authors propose to conformalise the number of dilation iterations needed to guarantee with high probability the desired coverage. In this way the size of the margin will be representative of the predictive uncertainty.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method proposes a new way to apply CP to binary segmentation masks. Using it, one only needs access to the predicted mask, at test time, without requiring access to sigmoid-like maps.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- What is the number of samples used for calibration? More information about the dataset splits could be useful to have an idea of the setup.
- Fig. 2 shows qualitative results from the state-of-the-art methods vs. the proposed one. It would have been beneficial to report more quantitative ones.
- Why do you think that sub-figure (e) in Fig. 2 reports such a behaviour? Did the authors see that this happens consistently? This is also connected to my previous point.
- Similarly, I think it would be beneficial to report some more qualitative results of the proposed method, especially to be able to visualise the influence of the stretch on segmentation.
- Typos: “we randomly shuffles” -> “shuffled”, “It is important that one rely”-> “relies”.
- For clarity, it could be stressed a bit more that what the authors are trying to conformalize is the number of dilation iterations, e.g. when you list the steps of the conformalization algorithm.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes a new way of applying conformal prediction to the task of image segmentation. I believe that some of the already performed experiments should be reported more extensively, in terms of both quantitative and qualitative results, to better understand the real advantage of the proposed solution over state-of-the-art methods.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper proposes to use conformal prediction to extend a segmentation produced by any black box model by a margin such that the margin contains the true segmentation with certain guarantees. A key contribution is to employ nested prediction sets that are obtained using successive dilation of the original mask. The paper is evaluated using three different base models, each with a different dataset. It is shown that coverage behaves as expected and that the proposed approach outperforms using nested prediction sets obtained by thresholding the sigmoid output.
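To make the nested-set construction concrete, here is a minimal NumPy sketch of successive dilations, assuming the 3x3 cross-shaped kernel mentioned in the author feedback and zero-padded image borders. The helper names `dilate_cross` and `prediction_set` are hypothetical and do not come from the paper's code.

```python
import numpy as np


def dilate_cross(mask):
    """One binary dilation with a 3x3 cross (4-connectivity), zero-padded borders."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]   # propagate foreground one pixel down
    out[:-1, :] |= mask[1:, :]   # propagate foreground one pixel up
    out[:, 1:] |= mask[:, :-1]   # propagate foreground one pixel right
    out[:, :-1] |= mask[:, 1:]   # propagate foreground one pixel left
    return out


def prediction_set(pred_mask, lam):
    """Apply `lam` successive cross dilations to the predicted mask."""
    out = pred_mask.astype(bool)
    for _ in range(lam):
        out = dilate_cross(out)
    return out


# Toy prediction: a single foreground pixel in a 7x7 image.
pred = np.zeros((7, 7), dtype=bool)
pred[3, 3] = True
sets = [prediction_set(pred, lam) for lam in range(4)]
sizes = [int(s.sum()) for s in sets]  # diamonds of growing radius
```

Each set contains the previous one because the cross kernel includes its own center, so the margin can only grow with the number of iterations.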
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is very well written, with precise mathematical notation and is easy to follow.
- The idea of nested prediction sets constructed by consecutive dilations is simple and effective.
- The method is comprehensively evaluated with multiple base models and datasets, and is shown to substantially improve the “tightness” of the margins over the more conventional approach of building segmentation prediction sets by thresholding the model confidence.
- Minor, but using latex boxes to visualise connectivities and segmentation labels is a nice touch that improved readability.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The contribution of the paper with respect to prior work is not clearly stated. While related work is covered it is not discussed how the present paper actually relates to it. The contribution paragraph only contains a high-level view of the contributions. The paper could be improved by stating clearly how the present work differs from prior work and what contributions are truly novel. (To my understanding this is only the dilation component. Please correct me if I am wrong).
- It is mentioned in the caption of Tab. 2 that the baseline method is equivalent to reference [4]. For clarity, it would be nice if the baseline was more clearly described and attributed in the main text.
- It is unclear to me from the text if [4] is the strongest baseline that could have been chosen. Are there innovations in references [27] or [13] that could have produced a stronger baseline for the current experiments?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents a simple extension of existing techniques for producing segmentation margins using conformal prediction. The contribution is comprehensively evaluated and is shown to produce tighter margins compared to an established baseline. Overall, I believe this is a strong paper that will be of interest to the MICCAI community.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The submission proposes a conformal risk control method to guarantee that a binary segmentation model covers the unknown ground-truth mask at a pre-specified level. Unlike existing methods, the submission proposes to use morphological dilations, which do not require access to the logits of the predictor. Extensive experiments support the validity and behavior of the proposed method in comparison with existing alternatives.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is clear and well-written
- Calibration of segmentation models is an important active field of research
- Using morphological tools that do not require access to the underlying model is a neat idea
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Novelty is limited to the use of morphological tools
- Existing alternatives can control non-monotone risks such as the F1 score
Overall, the paper is well-written and the contribution is valuable to the community. I have a few questions and I am looking forward to discussing with the authors!
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Algorithm: How is Eq. (3) implemented in practice? I assume you start with a dilation with a large value of lambda and backtrack at each iteration until risk is controlled. It may be helpful to clarify for unfamiliar readers since the procedure is introduced with iterative applications of the same dilation, but this is not implemented in practice?
- Non-monotone risks: CRC methods for segmentation models usually control the F1 score. Controlling coverage is a reasonable task. I am curious whether the authors could expand on whether they have considered how to use their method for non-monotone risks?
- Comparison with thresholding method: Does the thresholding method used control coverage or F1 score? I was somewhat surprised to see Fig 2.e, as the conformalized predictor seems to predict the negative of what one would expect. Is this a cherry-picked example? Do the segmentation logits actually increase going further away from the polyp?
- Localization: As mentioned by the authors, I would be curious to hear more about their ideas on adaptively choosing lambda depending on the input, similarly to [9].
Minor comments:
- Typo in “shuffles and partitioned”
- Typo in “in this a sense”
- Tables: it might increase readability to separate models and datasets with horizontal lines, and vertically align model and dataset names
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Aside from the clarification on the comparison with the thresholding method, the submission is well-written and the contribution is valuable to the community.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We would like to thank the reviewers for their time and their feedback.

In light of your questions and recommendations, we will do our best to modify the text to be clearer, within the limit of half a page, as imposed by the conference.

The main contribution is that any morphological operator ensuring a non-decreasing margin (with respect to lambda) can be used. This operator must be chosen by the user, for instance: custom morphological kernels (e.g. tall or wide kernels if the predictor is biased), chains of operators (e.g. a square dilation + a cross dilation), and other morphology tools (e.g. skeletons, geodesic reconstructions, etc.).
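A chain of operators of the kind listed above can be sketched as follows. This is an illustrative NumPy version under stated assumptions: a hand-written binary dilation with zero-padded borders, and one "step" defined as a 3x3 square dilation followed by a 3x3 cross dilation. The names `dilate`, `chain_step` and `margin_set` are hypothetical and not part of the authors' repository.

```python
import numpy as np

SQUARE = np.ones((3, 3), dtype=bool)
CROSS = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)


def dilate(mask, kernel):
    """Binary dilation of `mask` by an odd-sized binary `kernel` (zero padding)."""
    h, w = mask.shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(mask.astype(bool), ((ph, ph), (pw, pw)))
    out = np.zeros((h, w), dtype=bool)
    for i in range(kh):
        for j in range(kw):
            if kernel[i, j]:
                # Pixel (y, x) is set when mask[y - (i - ph), x - (j - pw)] is set.
                out |= padded[kh - 1 - i:kh - 1 - i + h, kw - 1 - j:kw - 1 - j + w]
    return out


def chain_step(mask):
    """One step of the chain: a square dilation followed by a cross dilation."""
    return dilate(dilate(mask, SQUARE), CROSS)


def margin_set(pred_mask, lam):
    """Apply the chained operator `lam` times to the predicted mask."""
    out = pred_mask.astype(bool)
    for _ in range(lam):
        out = chain_step(out)
    return out


pred = np.zeros((9, 9), dtype=bool)
pred[4, 4] = True
sets = [margin_set(pred, lam) for lam in range(3)]
sizes = [int(s.sum()) for s in sets]
```

Because both kernels contain their center pixel, every step returns a superset of its input, which is what makes the margin non-decreasing in lambda. A tall or wide kernel could be passed to `dilate` in the same way.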

We agree with the general observation that more results would have helped. However, due to the limited number of pages and the absence of supplemental materials, we favored simplicity of exposition, using iterative dilation by a cross-shaped 3x3 kernel. We also limited the results to 3 different datasets and models.

All our experiments are reproducible with the code provided, which allows testing other morphological configurations, datasets, or predictors. The reader will find additional results in the notebooks complementing the figures of the main paper.

Rev 1

“related/prior”: an important contribution is that our method is applicable to predictors that only provide segmentation maps (e.g. no need for sigmoid scores). Mathematical morphology is the tool we chose to build the uncertainty margins, together with the nested sets property. Following your remark, we will make this more explicit.

“strongest …”: it is not possible, a priori, to definitively determine a best baseline method; as for thresholding, it depends entirely on the quality of the sigmoid scores, which could be systematically bad, as shown in Tab. 2.

As for [27], their example of using a discretized euclidean distance is a special case of ours, equivalent to applying a dilation with a disk of radius lambda.
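The equivalence stated above can be checked numerically on a toy mask. The sketch below assumes that the "discretized euclidean distance" is the distance between pixel centers to the nearest foreground pixel, and that the disk of radius lambda is the set of integer offsets (dy, dx) with dy^2 + dx^2 <= lambda^2. Both function names are hypothetical, and the brute-force distance is only meant for small images.

```python
import numpy as np


def distance_to_mask(mask):
    """Brute-force Euclidean distance from each pixel to the nearest foreground pixel."""
    ys, xs = np.nonzero(mask)
    gy, gx = np.indices(mask.shape)
    d2 = (gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2
    return np.sqrt(d2.min(axis=-1))


def dilate_disk(mask, radius):
    """Binary dilation by the discrete disk {(dy, dx): dy^2 + dx^2 <= radius^2}."""
    h, w = mask.shape
    r = int(np.floor(radius))
    padded = np.pad(mask.astype(bool), r)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= radius * radius:
                # Pixel (y, x) is set when mask[y + dy, x + dx] is set.
                out |= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out


# Toy L-shaped mask in a 9x9 image.
mask = np.zeros((9, 9), dtype=bool)
mask[3:6, 3] = True
mask[5, 3:6] = True

dist = distance_to_mask(mask)
agree = [np.array_equal(dist <= lam, dilate_disk(mask, lam)) for lam in range(4)]
```

Thresholding the distance map at lambda and dilating once by a disk of radius lambda select the same pixels here, which is the sense in which the distance-based margin is a special case of a morphological one.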

As for [13], they do thresholding on the softmax of a multiclass case, hence it is not directly applicable.

Rev. 2

Points 1/5/6: thanks for the remarks, we will add (1), rectify (5) and clarify (6) in the paper.

Points 2/4: we agree but due to page limits, we chose to provide the online compendium giving an easy way to test other methods/configurations and generate results.

Point 3: Yes, the behaviour in Fig. 2 is systematic. We included this use case because it has been extensively used in several highly influential papers in conformal prediction and related methods [4,6,9,13] where they usually use alpha=0.1. Surprisingly, lowering alpha reveals that the underlying sigmoid scores behave counterintuitively. In Fig. 2 of [13], due to the loss used to train PraNet, the scores are highly skewed to be very low around the object. As a result, the conformalized mask does not “extend” Y_Hat, but adds pixels from outside the actual mask region.

Rev. 3

Thank you for your additional comments: some of these observations are being considered for future papers.

With respect to CRC [4], our prediction set formulation (with nested sets [18]) is directly applicable to that approach, hence one can easily control a variety of losses.

“Algorithm”: We implemented a version where we iterate 1, then 1 + 1, then 1 + 1 + 1 dilations, etc., until the stopping condition is met. We are aware that faster implementations exist (e.g. dichotomic search) but in our case this was not a bottleneck.
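The incremental search described in this answer can be sketched as below. This is a simplified illustration, not the authors' implementation: the stopping condition is assumed to be full coverage of the ground-truth mask, whereas the actual criterion is the one in Eq. (3) of the paper, which is not reproduced here. The names `dilate_cross`, `nonconformity_score` and the `max_iter` budget are hypothetical.

```python
import numpy as np


def dilate_cross(mask):
    """One binary dilation with a 3x3 cross, zero-padded borders."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out


def nonconformity_score(pred_mask, true_mask, max_iter=100):
    """Smallest number of dilations after which the prediction covers the truth."""
    current = pred_mask.astype(bool)
    truth = true_mask.astype(bool)
    for lam in range(max_iter + 1):
        if not (truth & ~current).any():  # assumed stopping rule: no missed pixel
            return lam
        current = dilate_cross(current)  # 1, then 1 + 1, then 1 + 1 + 1, ...
    return max_iter + 1  # hypothetical sentinel: budget exhausted


pred = np.zeros((6, 6), dtype=bool)
pred[2, 2] = True
truth = pred.copy()
truth[2, 4] = True
truth[3, 3] = True
score = nonconformity_score(pred, truth)
```

Since the dilated sets are nested, coverage is monotone in the number of iterations, so a dichotomic (binary) search over lambda would return the same score with fewer coverage checks when lambda is large.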

“non-monotone”: see above

“comparison with”: for thresholding we use the same criterion as for morphology, so we “extend” the predicted mask according to the underlying sigmoid scores until the criterion in Eq. (3) is satisfied. The same functions for the metrics are applied; the only difference is the shape of the conformal masks.
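One possible reading of this thresholding baseline is sketched below. It is an assumption-laden illustration: the predicted mask is taken as the pixels with sigmoid score at least 0.5, the mask is extended by lowering a threshold over a hypothetical decreasing grid, and full coverage of the ground truth stands in for the criterion of Eq. (3). The function names and all numeric values are made up for the example.

```python
import numpy as np


def threshold_set(scores, pred_mask, t):
    """Predicted mask extended with every pixel whose sigmoid score is at least t."""
    return pred_mask | (scores >= t)


def threshold_score(scores, pred_mask, true_mask, grid):
    """Index of the first threshold in the decreasing `grid` that covers the truth."""
    for k, t in enumerate(grid):
        missed = true_mask & ~threshold_set(scores, pred_mask, t)
        if not missed.any():
            return k
    return len(grid)  # hypothetical sentinel: never covered on this grid


# Toy 1x5 "image" with hypothetical sigmoid scores.
scores = np.array([[0.9, 0.6, 0.3, 0.1, 0.05]])
pred = scores >= 0.5
truth = np.array([[True, True, True, False, False]])
grid = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

k = threshold_score(scores, pred, truth, grid)
```

Nothing in this construction forces the added pixels to be adjacent to the predicted mask: they are selected purely by score, so the shape of the resulting set depends on how the scores are distributed around the object.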

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

All three reviewers liked the paper and were willing to accept the paper. It would be great if, even if early acceptance is given, the authors could still clarify some of the questions asked by R2 and R3.


Conformal Prediction for Image Segmentation Using Morphological Prediction Sets

Author(s):