Abstract

In this work, we present a novel approach to calibrate segmentation networks that accounts for the inherent challenges posed by different categories and object regions. In particular, our formulation integrates class- and region-wise constraints into the learning objective, with multiple penalty weights to account for class and region differences. Finding the optimal penalty weights manually, however, may be infeasible and can hinder the optimization process. To overcome this limitation, we propose an approach based on Class and Region-Adaptive constraints (CRaC), which learns the class- and region-wise penalty weights during training. CRaC is based on a general Augmented Lagrangian method, a well-established technique in constrained optimization. Experimental results on two popular segmentation benchmarks and two well-known segmentation networks demonstrate the superiority of CRaC over existing approaches. The code is available at: https://github.com/Bala93/CRac/
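
To make the mechanism concrete, here is a minimal, self-contained PyTorch sketch of the general Augmented Lagrangian recipe the abstract refers to: one multiplier per (class, region) pair, an inner minimization over the network weights, and a multiplier-ascent update between outer iterations. It uses the classical quadratic (PHR-style) penalty and toy placeholder losses; it is an illustration under assumed names (theta, residuals), not the CRaC implementation, which is available in the repository above.

```python
import torch

torch.manual_seed(0)
K, R = 4, 2                      # number of classes and regions (e.g., inner/outer)
rho = 1.0                        # ALM penalty parameter
lam = torch.ones(K, R)           # one multiplier per (class, region) pair
theta = torch.randn(K, R, requires_grad=True)   # stand-in for network weights
opt = torch.optim.SGD([theta], lr=0.1)

def alm_loss(seg_loss, residuals, lam, rho):
    # residuals[k, r] >= 0 measures how much the (k, r) calibration
    # constraint is violated; the ALM term is lam*d + (rho/2)*d^2.
    return seg_loss + (lam * residuals + 0.5 * rho * residuals ** 2).sum()

for outer in range(5):                        # outer ALM iterations
    for inner in range(50):                   # inner minimization over theta
        seg_loss = (theta ** 2).mean()        # toy surrogate for the task loss
        residuals = torch.relu(theta - 0.1)   # toy surrogate for constraint violations
        opt.zero_grad()
        alm_loss(seg_loss, residuals, lam, rho).backward()
        opt.step()
    # Multiplier ascent: constraints that remain violated automatically
    # receive a larger weight, replacing hand-tuned fixed penalties.
    with torch.no_grad():
        lam = torch.clamp(lam + rho * torch.relu(theta - 0.1), min=0.0)
```

The point of the update in the last line is that each (class, region) weight grows only while its constraint remains violated, which is what removes the need to tune the penalty weights by hand.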

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1527_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1527_supp.pdf

Link to the Code Repository

https://github.com/Bala93/CRac/

Link to the Dataset(s)

ACDC: https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html
FLARE: https://flare.grand-challenge.org/

BibTex

@InProceedings{Mur_Class_MICCAI2024,
        author = { Murugesan, Balamurali and Silva-Rodriguez, Julio and Ben Ayed, Ismail and Dolz, Jose},
        title = { { Class and Region-Adaptive Constraints for Network Calibration } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper addresses the important and often neglected issue of deep learning calibration. Previous works that address calibration mainly use post-processing or entropy-optimization strategies to achieve network calibration. However, these approaches do not harness the inherent relationships between pixels in medical image segmentation. Other works have applied SVLS with a fixed penalty weight to improve calibration in segmentation. The current work introduces two main methodological advances that build on these previous approaches. First, they propose a novel penalty weighting strategy that changes depending on class and region. Second, they use an Augmented Lagrangian method (ALM) to adaptively learn the optimal value for each weight instead of presetting them. They evaluate their method by showing that it can improve calibration on the ACDC and FLARE datasets using U-Net or nnUNet.
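
To make the first contribution concrete, the toy sketch below contrasts a single fixed penalty weight (as in reference 22) with a weight looked up per pixel by (class, region). This is one plausible reading of the description above, not the paper's exact loss; lam, region, and per_pixel_penalty are hypothetical names.

```python
import torch

torch.manual_seed(0)
K, R = 4, 2                            # classes; regions (0 = inner patch, 1 = outer)
lam = torch.rand(K, R)                 # per-(class, region) weights (learned via ALM in the paper)
labels = torch.randint(0, K, (8, 8))   # per-pixel class label
region = torch.randint(0, R, (8, 8))   # per-pixel region assignment
per_pixel_penalty = torch.rand(8, 8)   # e.g., deviation from a soft neighbourhood label

# Fixed-weight predecessor: one scalar weight for every pixel.
fixed_term = 0.5 * per_pixel_penalty.sum()

# Class- and region-adaptive version: index the weight by each
# pixel's class and region before aggregating.
adaptive_term = (lam[labels, region] * per_pixel_penalty).sum()
```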

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Their method appears to be fairly straightforward to add to any deep learning segmentation task or network. This increases the potential usefulness of their approach. They provide the code anonymously to help justify their claims.
    2. The key advantage over the previous work is that the new penalty terms are adaptive and learnable. This greatly increases the flexibility and adaptability of the approach.
    3. The paper is easy and clear to read. Their intuitive format aids in the understanding of calibration and optimization approaches.
    4. They validate their approach on two relevant medical datasets and two powerful medical segmentation backbones. It is helpful to assess their method knowing that it can apply to multiple different setups and still perform well.
    5. Tables 1-2 evaluate the results using a combination of common segmentation and calibration metrics. The results show that improved calibration comes with better segmentation performance as well. This is an advantage because better calibration often comes with a slight tradeoff on the in-distribution test set.
    6. Fig. 1 is nice when it comes to highlighting their novelty.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper is similar to reference 22: Murugesan, B., Adiga Vasudeva, S., Liu, B., Lombaert, H., Ben Ayed, I., Dolz, J.: Trust your neighbours: Penalty-based constraints for model calibration. In: MICCAI. pp. 572–581 (2023). This paper seems like a direct improvement of reference 22, as there are even similarities in the text and equations. In general, this work seems like an adaptive version of the reference 22 method. The new work does offer advantages over reference 22, but I have some concerns relating to this previous paper (see constructive comments).

    Reference 22 actually seems to do better on the FLARE dataset with nnUNet (Table 2) compared to the current improved method. I think this warrants more discussion considering that the methods are very similar.

    2. The paper suggests that it offers “class and region-wise penalties”. It is currently not clear how their two regions of “inner patch” and “outer patch” truly impact the model. The paper suggests that a pixel occurs as either “inner” or “outer”, so in theory the regional component should already have been implicitly included. More clarity would help with this.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?
    1. They provide their code in the abstract via an anonymous GitHub link. It looks like a normal GitHub repository, but it cannot be traced back to the owner. The code is legible and has instructions for use.
    2. They provide the parameters that they use for their comparison algorithms so that the experiments can be replicated.
    3. The datasets that are used in this paper are available for public download from their respective challenges – ACDC (https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html) and FLARE (https://flare.grand-challenge.org/).
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. As I noted in the weaknesses, I am confused as to the impact of the region-wise component of the penalty term. I would be interested in a comparison of the “class and region-wise penalty term” from the new paper with the “class-only term” from ref 22, without the ALM learning component. This would help validate whether both contributions from the current paper truly improve the method.
    2. Reference 22 seems to be better on the FLARE dataset, especially in nnUNet’s results in Table 2, and particularly for calibration. Would the authors show the Friedman score and Final score calculated separately by dataset?
    3. They call reference 22’s method “NACL”. I don’t see anything that defines “NACL” as an acronym. I assume it stands for “neighbor aware calibration loss”? I read reference 22 to help me review this, but I couldn’t find the acronym defined.
    4. Figure 1 is helpful in showing the advantages of the current method compared to reference 22. I noticed it seems to match Table 1, not Table 2. Could the authors clarify this in the Figure 1 caption? Is there an existing figure matching nnUNet (Table 2)? I am okay without the extra figure if space is an issue.
    5. I am interested in why the authors chose to process the 3D datasets as 2D slices. Do the authors have plans to implement this in 3D in the future? (I am not asking for this in the current work.)
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an interesting topic that warrants discussion at MICCAI. I rated it as weak accept because of concerns about its novelty compared to reference 22. It could use some further clarification during the review period, as space allows.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents an approach to calibrate segmentation models that starts from the inherent difficulties of different classes and regions. Experiments demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper starts from the specificities of each category and of different regions. The overall idea is clear and well presented.
    2. This paper provides an elegant solution for choosing the penalty weights via the Augmented Lagrangian Multiplier method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The term l_k in Equation 2 needs explanation, as it is important to help the reader understand the unequal importance of each category and of different regions.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The term l_k in Equation 2 needs explanation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall idea is reasonable and easy to follow.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    Submission 1527 proposes a novel methodology for calibrating segmentation models and demonstrates superior performance in comparison to the current state of the art. The relevance of calibration, or, conversely, the dangers of miscalibration in safety-critical scenarios, is highlighted clearly. The proposed method integrates independent class- and region-wise penalty weights (previous methods employed uniform weights). An iterative Augmented Lagrangian approach is employed to adaptively learn these weights. Comprehensive experiments are carried out on two datasets, in comparison to a wide range of benchmarks, and demonstrate superior performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the presentation is very clear, including the relevance of the work and the contributions (as well as related work).
    • the methodology is novel and the results surpass the existing state of the art.
    • the selection of very current state-of-the-art benchmarks and datasets is suitable.
    • the analysis of the experiments is convincing and thorough; hyperparameters are documented clearly.
    • a more detailed comparison with the most competitive of the SotA methods (NACL) is provided.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Only very minor presentation questions are raised.
    • The performance gains are somewhat incremental, and in some cases the SotA performs better.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The Conclusion could be its own section (instead of a subsection within the Experiments section) for further clarity. The first sentence of the Conclusion is also broader than (beyond the scope of) the Experiments section.

    Very minor comments:

    • The authors may want to consider using the U-Net spelling from the original paper (U-Net vs UNet).
    • Is there enough space in the results tables for arrows? I see they are currently in the captions and assume this might be due to space limitations (which is fine).
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The relevance of the topic and the contributions are clearly outlined in the paper. The comprehensive and thorough Experiments support the claims convincingly. The selection of SotA methods is suitable, as are the datasets employed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


