Abstract

Image segmentation relies heavily on neural networks, which are known to be overconfident, especially when making predictions on out-of-distribution (OOD) images. This is a common scenario in the medical domain due to variations in equipment, acquisition sites, or image corruptions. This work addresses the challenge of OOD detection by proposing Laplacian Segmentation Networks (LSN): methods that jointly model epistemic (model) and aleatoric (data) uncertainty for OOD detection. In doing so, we propose the first Laplace approximation of the weight posterior that scales to large neural networks with skip connections and high-dimensional outputs. We demonstrate on three datasets that the LSN-modeled parameter distributions, in combination with suitable uncertainty measures, give superior OOD detection.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1339_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1339_supp.pdf

Link to the Code Repository

https://github.com/kilianzepf/laplacian_segmentation

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zep_Laplacian_MICCAI2024,
        author = { Zepf, Kilian and Wanna, Selma and Miani, Marco and Moore, Juston and Frellsen, Jes and Hauberg, Søren and Warburg, Frederik and Feragen, Aasa},
        title = { { Laplacian Segmentation Networks Improve Epistemic Uncertainty Quantification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a deep segmentation network whose parameter posterior distribution is approximated using the Laplace model.

    The estimated posterior distribution allows the authors to analyze the epistemic uncertainty of the model and use it for Out-of-Distribution (OoD) detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    OoD detection is a very important topic which is here addressed using a nice approximation of the posterior of the model parameters. Some interesting experiments are carried out using the recently proposed ValUES framework.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The modelling of the observation process is taken from

    [25]. Monteiro, M., et al.: Stochastic segmentation networks: Modelling spatially correlated aleatoric uncertainty. Advances in Neural Information Processing Systems 33, 12756–12767 (2020).

    Then the Laplace approximation is already somehow present in

    [24]. Miani, M., Warburg, F., Moreno-Muñoz, P., Detlefsen, N.S., Hauberg, S.: Laplacian autoencoders for learning stochastic representations. In: Advances in Neural Information Processing Systems (2022).

    The model in the MICCAI submission already appeared in arxiv in March 2023. see *** (Masked by the program chair to maintain anonymity).

    In the 2023 arXiv paper the model is much more clearly explained and the experimental validation is very similar. The only difference is the OoD figures of merit (variance-based in the arXiv version, entropy-based in the MICCAI submission).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In addition to the comments already made in this review, I believe the submission would benefit from a joint OoD-Active Learning analysis. To me, some of the criteria used are more AL criteria than OoD ones. Furthermore, I think the distribution of the features should also play a role in OoD detection.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe that very similar results have already been around for more than a year.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors propose a Laplacian Segmentation Network (LSN) to address the challenge of out-of-distribution detection, and develop a fast Hessian approximation that scales linearly with the number of pixels. The experiments and results show that the proposed method performs better out-of-distribution detection than ensemble and dropout approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors apply the Laplace approximation to the image segmentation task, and address challenges in out-of-distribution detection.

    2. The authors develop a fast Hessian approximation algorithm, which scales linearly with the number of pixels and is important for efficient computation.

    3. The authors compare the proposed method with ensemble and dropout approaches and show that the proposed method achieves better out-of-distribution detection.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Many details in the method should be fixed.

    2. Result visualization can be improved.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors propose a Laplacian Segmentation Network (LSN) to address the challenge of out-of-distribution detection, and develop a fast Hessian approximation that scales linearly with the number of pixels. The experiments and results show that the proposed method performs better out-of-distribution detection than ensemble and dropout approaches. Overall, the manuscript is well written. However, many details need to be fixed to make the manuscript more readable.

    Major comments:

    1. In Eq. 2, the notations for Shannon entropy and mutual information are not standard. Shannon entropy should be a function of a random variable, not a function of a probability density function. Mutual information should be a function of two random variables, not a function of a joint probability density function.
    2. In Eq. 4, is y binary? This should be made clearer.
    3. In Eq. 9, the formatting (e.g., of the sum) should be fixed. What is s in this equation?
    4. In Eq. 11, \theta is a vector of dimension 2T-t (Eq. 8), and \theta_{MAP} is a vector of dimension T (Eq. 10). How can the mean of \theta be \theta_{MAP} when their dimensions do not match?
    5. In Fig. 1, \theta_1, \theta_2, and \theta_3 seem to contradict the previous definition, or are not defined.
    6. In Fig. 1, what does “D, P deterministic as part of \theta_{MAP}” mean? If they are fixed, why do the authors sample all \theta values? This needs more clarification.
    7. In Eq. 11, the definition of H* does not make sense because log p(\theta* | D) is not a function of \theta, and its Hessian w.r.t. \theta is 0. Do the authors mean log p(\theta | D)?
    8. In Eq. 12, what does [\cdot]_l mean? Isn't it clearer to write \nabla_{\theta_l}?
    9. In Eq. 12, can the authors write out explicitly what H^(L) is? What is the difference between L and l?
    10. In Fig. 2 left, would it be clearer to separate mutual information, EPKL, and pixel variance into different subplots, and in each subplot compare the different approaches (Laplace, ensemble, dropout)?
    11. For uncertainty normalization, how is the respective in-distribution test data chosen?
    12. In Table 2, why do the authors choose 5% and 10% as thresholds? How can this be robust? Isn't it better to directly show the relative difference and see which one is higher?
    13. In supplementary, when calculating the Jacobian J_{\theta}, is the order of the composition of maps f_{\theta} wrong? Moreover, J_{\theta} is not properly defined. What is the difference between J_{\theta} and \nabla_{\theta}?
    14. In supplementary, when calculating Hessian of the loss, there are two terms. Why is the second term omitted in Eq. 12 in the main manuscript?

    Minor comments:

    1. In the introduction, “Laplace approximations” should be abbreviated as LA from its second occurrence onward.
    2. There are notation inconsistencies between the main manuscript and the supplementary material, such as the diagonal operator D and the Hessian H.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript is generally well written, but many details need to be fixed for the manuscript to be more readable.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most comments from reviewers.



Review #3

  • Please describe the contribution of the paper

    The authors investigate out-of-distribution detection in medical segmentation tasks. They propose an approach combining the aleatoric logit distribution of Stochastic Segmentation Networks (SSN) with Laplace approximations for epistemic uncertainty quantification and demonstrate that it allows better OOD detection than some commonly used uncertainty estimation methods. The achieved advances in OOD detection are demonstrated on three datasets with medical images. The authors also propose a Hessian approximation that scales linearly instead of quadratically in both model parameters and output resolution. In addition, they show that epistemic uncertainty is better estimated by Pixel Variance and Expected Pairwise KL than by Mutual information.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors propose an original approach to scale Laplace approximations to segmentation tasks, based on a trace-preserving diagonal Hessian approximation for models with skip connections. Their approach scales linearly with the number of image pixels, unlike existing methods with quadratic complexity, which makes the method applicable to high-resolution medical images.
    • The proposed method jointly estimates the aleatoric uncertainty as the logit distribution of Stochastic Segmentation Networks (SSN) and the epistemic uncertainty via Laplace approximations. Separating these uncertainties is not a novel idea; however, the approach itself seems to be new.
    • They also compare some commonly used ways to estimate epistemic uncertainty, such as Pixel Variance, Expected Pairwise KL, and Mutual Information.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I would recommend considering more uncertainty estimation methods (at least in the “Background” Section, ideally in the experiments), such as Markov chain Monte Carlo (MCMC), test-time-augmentation-based approaches, etc. Currently the proposed approach is compared only with MC Dropout and Ensemble methods; however, there are other state-of-the-art uncertainty estimation methods.
    • I would recommend being more specific with the results. For example, in the sentence “Our results add insight to the recent discussion which calls into question the suitability of common uncertainty measures” I would add more details about this insight. Furthermore, the abstract contains the phrase “gives superior OOD detection”; in my opinion, it is better to provide some quantitative support and mention which methods it is compared with.
    • It is a good idea to mention the method's limitations. The experiments are conducted only with U-Net; what about other architectures? The paper describes uncertainty estimation only in the mean function of a mean-variance network; is it planned to extend it to entire models?
    • In Section 4.1 it is worth explaining how the ID/OOD classification is conducted. Is there any threshold based on estimated uncertainty? If so, how is it chosen?
    • I would also recommend adding some absolute values supporting the achieved results, for example, the factor by which the error rate decreases for OOD classification. In addition, the authors provide relative values for uncertainties such as 5% and 10%; however, the significance of their results is not evident when we do not have absolute values.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The structure is mostly logical and the language is generally clear. Nevertheless, the correlation between the title and the abstract is unclear (see more details after item 8). Furthermore, the results obtained mostly support the abstract rather than the title, so I would recommend renaming the paper. There are also some inaccuracies in formulation. The authors write: “…Current Laplace approximations [8] scale quadratically with the output dimension of the neural network, which prevent their usage in segmentation. We develop…” It seems from the text that the authors propose a novel solution for this problem. However, as mentioned in the preprint, techniques to apply Laplace approximations to large models despite their computational complexity exist (Botev, 2020), and frameworks for them are available (Detlefsen et al., 2021; Daxberger et al., 2021). Moreover, there is a lack of logical connection between the cited sentence and the two following ones, where the authors state that their method allows estimating both epistemic and aleatoric uncertainty (it is unclear from this part of the text why they mention quadratic complexity).
    It is also worth noting that the “Discussion and Conclusion” Section does not provide any discussion; it merely enumerates the results. I would also recommend adding a frontal illustration at the beginning of the paper that visually explains the main contribution and the achieved results, and I would consider adding more illustrations of how LSN works.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    On the one hand, uncertainty estimation is a timely topic and the proposed approach is a novel way to jointly estimate aleatoric and epistemic uncertainty. Furthermore, the authors propose an approach that reduces computational complexity in medical image segmentation. In addition, they compare three commonly used ways to estimate epistemic uncertainty.

    Nevertheless, it is unclear why the title emphasises improvement in epistemic uncertainty estimation while the paper itself is devoted to the advances in OOD detection that the method provides by jointly using aleatoric and epistemic uncertainties. Moreover, the title does not reflect that a significant part of the paper is devoted to optimization. There are some other issues that could be improved (listed under the items above); however, if the given recommendations are taken into account, I would recommend accepting the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thank you for your time and feedback, which we will incorporate in the final version.

= Preprints = Preprint papers, including arXiv, should not be considered by reviewers when assessing the novelty of submissions, and we refrain from commenting on this point raised by R1/R4. We ask the AC to take this into account.

= Reproducibility = We will release our codebase upon publication, which contains many of the requested details that did not fit in the paper, e.g. numbers for Fig. 2-3 and Table 3.

= Contribution to Laplace approximation (LA) (R1, R4) = Indeed, the Hessian approximation we use exists in a form that scales to larger images, but it did not apply to models with skip connections. We solve this by presenting a suitable block-diagonal structure for the Jacobian.
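For intuition, the kind of structure involved can be sketched as a diagonal generalized Gauss-Newton (GGN) approximation, H ≈ diag(J^T H^(L) J); the NumPy snippet below is purely illustrative (all shapes and values are made up) and is not the paper's actual implementation:

```python
import numpy as np

# Stand-ins: a network Jacobian J (outputs x parameters) and the
# diagonal of the loss Hessian H^(L), one entry per output pixel.
n_out, n_params = 16, 50
J = np.random.randn(n_out, n_params)
h_L = np.random.rand(n_out)

# Diagonal of J^T diag(h_L) J without forming the full
# n_params x n_params matrix: diag_p = sum_o J[o, p]^2 * h_L[o].
h_diag = np.einsum("op,o,op->p", J, h_L, J)

# Sanity check against the dense computation.
assert np.allclose(h_diag, np.diag(J.T @ (h_L[:, None] * J)))
```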

= R1 = Indeed, Active Learning (AL), next to out-of-distribution detection (OOD), is an important application of epistemic uncertainty. However, AL requires more from the uncertainty estimates than OOD: to gather informative new samples, you don't just want high epistemic uncertainty; you also want low aleatoric uncertainty. AL is thus a natural second, but not first, validation step.

= R3 = 1) The notation for entropy and mutual information from [32] was used to save space and avoid additional notation. The formulations are equivalent. 2) Yes, y is binary. 3) S is the number of pixels.
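For concreteness, measures like those in Eq. 2 are typically estimated by Monte Carlo over posterior samples; the sketch below shows the standard decomposition (predictive entropy = expected per-sample entropy + mutual information) for binary per-pixel predictions, with made-up values. It is a generic illustration, not necessarily the exact implementation in our codebase:

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a Bernoulli variable with parameter p."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log1p(-p))

# Hypothetical per-pixel foreground probabilities from M posterior
# samples, shape (M, H, W).
probs = np.random.rand(8, 4, 4)

total_entropy = binary_entropy(probs.mean(axis=0))     # entropy of mean prediction
expected_entropy = binary_entropy(probs).mean(axis=0)  # mean per-sample entropy
mutual_info = total_entropy - expected_entropy         # epistemic component
```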

4-7) Please note that “mean” in Eq. 10 refers to the mean network: at this step we discard the variance head of the mean-variance network, going from t + 2(T-t) to t + (T-t) = T parameters, and indicate thetas in this space with a ‘*’. Eq. 11 operates in this space, and samples from the distribution can be seen as mean networks; see the different gray shades in Fig. 1. Sampling T-parameter mean networks requires taking the remaining parameters of the variance head from the MAP estimate vector (deterministic). The definition of H below Eq. 11 should be with \theta* for the nabla operators.
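A schematic sketch of this parameter bookkeeping, with made-up counts and an assumed layout of the parameter vector (shared trunk first, then mean head, then variance head):

```python
import numpy as np

t, T = 1000, 1500                # t trunk parameters; each head has T - t = 500
n_full = t + 2 * (T - t)         # full mean-variance network: 2000 parameters
n_mean = t + (T - t)             # mean network only: T = 1500 parameters

theta_map = np.random.randn(n_full)   # stand-in for the MAP estimate
theta_star_map = theta_map[:n_mean]   # "theta*": trunk + mean head
var_head = theta_map[n_mean:]         # variance head, kept deterministic

# Draw one mean-network sample from the Laplace posterior (identity
# covariance used purely as a placeholder) and reattach the fixed
# variance head to obtain a full set of network weights.
theta_star_sample = theta_star_map + np.random.randn(n_mean)
theta_sample = np.concatenate([theta_star_sample, var_head])
assert theta_sample.shape == theta_map.shape
```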

8) In Eq. 12, [\cdot]_l and \nabla_{\theta_l} are indeed the same. 9) In Eq. 12, H^(L) denotes the Hessian of the loss function w.r.t. the neural network output. Explicitly, given an output v = f_\theta(x), we have H^(L) = diag_i(\frac{e^{v_i}}{1+e^{v_i}} - (\frac{e^{v_i}}{1+e^{v_i}})^2). 11) The ID test set used for normalization is given by the data split.
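To make the formula above concrete, here is a minimal NumPy sketch of the diagonal of H^(L); the helper names are ours, not from the codebase:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def loss_hessian_diag(logits):
    # Diagonal of H^(L): for each pixel logit v_i, the second derivative
    # of the binary cross-entropy loss w.r.t. that logit, i.e.
    # sigma(v_i) - sigma(v_i)^2 = sigma(v_i) * (1 - sigma(v_i)).
    s = sigmoid(logits)
    return s - s**2

# Toy usage on a random 4x4 logit map.
logits = np.random.randn(4, 4)
print(loss_hessian_diag(logits))
```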

12) The 5% and 10% thresholds provide intuition on how effective each method is at assigning epistemic uncertainty, via its relative sensitivity to OOD data. In future work, statistical tests could provide more meaningful thresholds. 13) The functions f are indeed in the wrong order; there is no difference between J_{\theta} and \nabla_{\theta}.

All requested details will be incorporated!

= R4 = Epistemic uncertainty vs OOD: Since epistemic uncertainty should be high for OOD samples, following [15], we utilize OOD as a downstream task for validating epistemic uncertainty. Here, the role of aleatoric uncertainty, captured by the SSN, is to improve the estimated epistemic uncertainty. We will clarify this.

Baselines: a) The LA depends on the aleatoric component only via the loss function. Thus, an alternative aleatoric component via test time augmentation wouldn’t change the results. b) Implementing the Hessian approximation for the SSN loss function of the mean-variance network is a promising next step, but we emphasize that our simpler version is faster as it circumvents the sampling to evaluate the SSN loss. c) MCMC on the mean network could be an alternative epistemic component. While this is interesting, it would be a novel method, not an existing baseline.

ID/OOD classification: Following [15], AUROCs were calculated with sklearn using per-image ground-truth binary labels (0 = ID, 1 = OoD) versus uncertainty scores as target predictions.
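A minimal sketch of this evaluation protocol, with hypothetical per-image scores and labels:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-image uncertainty scores and ground-truth ID/OOD
# labels (0 = ID, 1 = OoD), mirroring the protocol described above.
labels = np.array([0, 0, 0, 1, 1, 1])
uncertainty = np.array([0.10, 0.30, 0.20, 0.80, 0.60, 0.90])

print(f"OOD-detection AUROC: {roc_auc_score(labels, uncertainty):.3f}")
```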

We also note that our conclusions are unlikely to change much with architecture, if the uncertainty methods and measures are identically applied. We will include a discussion of these limitations and possibilities in the final version.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Two reviewers have provided positive comments on this paper. The arXiv manuscript mentioned by Reviewer 1 should not be considered a valid reason for rejection. Therefore, I recommend accepting this paper.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper has sufficient merit and the majority of the reviewers’ comments have been satisfactorily addressed.



