Abstract

Medical image classification is an important task in many different medical applications. The past years have witnessed the success of Deep Neural Networks (DNNs) in medical image classification. However, traditional softmax outputs produced by DNNs fail to estimate uncertainty in medical image predictions. Contrasting with conventional uncertainty estimation approaches, conformal prediction (CP) stands out as a model-agnostic and distribution-free methodology that constructs statistically rigorous uncertainty sets for model predictions. However, existing exact full conformal methods involve retraining the underlying DNN model for each test instance with each possible label, demanding substantial computational resources. Additionally, existing works fail to uncover the root causes of medical prediction uncertainty, making it difficult for doctors to interpret the estimated uncertainties associated with medical diagnoses. To address these challenges, in this paper, we first propose an efficient approximate full CP method, which involves tracking the gradient updates contributed by these samples during training. Subsequently, we design an interpretation method that uses these updates to identify the top-k most influential training samples that significantly impact models’ uncertainties. Extensive experiments on real-world medical image datasets are conducted to verify the effectiveness of the proposed methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1623_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1623_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Che_Modeling_MICCAI2024,
        author = { Chen, Aobo and Li, Yangyi and Qian, Wei and Morse, Kathryn and Miao, Chenglin and Huai, Mengdi},
        title = { { Modeling and Understanding Uncertainty in Medical Image Classification } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors presented an optimization of a method for assessing uncertainty based on the full conformal prediction technique. In addition, the authors proposed an innovative approach to estimate the k samples from the training set that contributed most to the uncertainty estimation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors proposed a simple but quite effective optimization of the solution using a Taylor expansion applied to the SGD update sequence. This trick allowed the authors to calculate the weights of the neural network much faster when adding and removing samples during additional training at the stage of forming a conformal set. The authors also showed that such an approximation of the full conformal prediction method allows one to achieve comparable efficiency in terms of the size of the prediction set. Moreover, the introduced optimization allowed the authors to make the problem independent of the number of parameters in the neural network.

    In addition, the authors proposed an innovative approach to explaining the causes of uncertainty. They illustrated this by identifying k samples from the training set that, if removed from the dataset, would cause the true label to disappear from the conformal set. To solve this problem, the authors developed a new optimization problem. They showed that it can be solved via the introduced optimization.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the paper is the lack of sufficient experiments and analysis for the proposed uncertainty understanding problem.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors report plots of the running time of the proposed solution, but they provide insufficient information about the hardware setup used for the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors claim that “existing research fails to elucidate the origins of prediction uncertainties, a gap particularly critical in the medical domain”. However, they do not support this statement with references to prior work. Were there any other approaches to understanding uncertainty? If there were none, the authors should state that explicitly.

    The authors claim that “This insight is vital for medical practitioners and researchers aiming to refine the model, whether by enriching the dataset with more diverse medical image samples, fine-tuning the model’s parameters to better capture the nuances of complex medical conditions, or employing advanced training strategies to bolster the model’s predictive accuracy and reliability”. However, this statement is at odds with the idea of the k most influential training samples. It is unclear how this idea will help with the claimed enhancements.

    The authors fail to provide any numerical metrics to evaluate how stably the proposed method determines the k most influential training samples. The few examples provided are not enough to demonstrate that this concept works.

    The authors try to convey two separate messages in the paper, one regarding the new optimization and one regarding the understanding problem. However, the understanding part requires additional research to support the claimed messages. Uncertainty understanding is a very interesting topic and deserves its own separate work.

    All in all, this paper is carefully written and can provide a sufficient contribution to the optimization of conformal prediction uncertainty estimation, as it is shown to be superior to the 2023 baseline.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The optimization part of the article can be quite influential for the field. The uncertainty understanding part needs to be reinforced with additional discussion and experiments.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes an approximate conformal prediction method by approximating the parameter optimization trajectory of the underlying model by its first order Taylor expansion. This approximation reduces the computational cost associated with exact full conformal prediction or higher order approximation methods. The paper also proposes to explain the model uncertainty by finding the top-k most influential training data for each test point.
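
    As a rough illustration of the idea summarized above (not the authors' implementation), the sketch below runs full conformal prediction on a linear softmax classifier and replaces per-label retraining with a single first-order step of the form W_bar = W* - (eta * E / b) * gradient. The linear model, the nonconformity score (one minus the probability of the assigned label), and the names first_order_add, conformal_set, eta, epochs, and batch are all assumptions made for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_ce(W, x, y):
    """Gradient of the cross-entropy loss of one sample w.r.t. W (d x c)."""
    p = softmax(x[None, :] @ W)[0]
    p[y] -= 1.0
    return np.outer(x, p)

def first_order_add(W_star, x, y, eta, epochs, batch):
    """Hypothetical first-order stand-in for retraining with (x, y) added."""
    return W_star - (eta * epochs / batch) * grad_ce(W_star, x, y)

def conformal_set(W_star, X_train, y_train, x_test, n_classes, eps,
                  eta=0.1, epochs=1, batch=8):
    """Return the labels whose conformal p-value exceeds eps."""
    pred_set = []
    X_aug = np.vstack([X_train, x_test])
    for y in range(n_classes):
        # Approximate the model trained on the data plus (x_test, y).
        W_bar = first_order_add(W_star, x_test, y, eta, epochs, batch)
        y_aug = np.append(y_train, y)
        probs = softmax(X_aug @ W_bar)
        # Nonconformity: one minus the probability of the assigned label.
        scores = 1.0 - probs[np.arange(len(y_aug)), y_aug]
        # Fraction of points at least as nonconforming as the test point.
        p_value = np.mean(scores >= scores[-1])
        if p_value > eps:
            pred_set.append(y)
    return pred_set
```

    Because the test point is always counted in its own p-value, every p-value is at least 1/(n+1), so eps = 0 returns all labels and eps = 1 returns the empty set. How closely W_bar tracks the fully retrained weights is exactly the question raised in the weaknesses listed in this review.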

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed approximated method is computationally more efficient than the exact full conformal and ACP method as is shown in experiments.
    • The proposed method can be used to find the samples that influence the model's prediction the most. (However, the mechanism for doing so is not clearly explained: the objective in Eq. 7 is hard to optimize, and the proposed empirical method is not well explained.)
    • The authors perform multiple experiments that empirically show the superior efficiency of the proposed approximate full conformal method in comparison to one other approximate method, ACP.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There can be a huge gap between the first-order approximations of the parameters and the true parameters learned by full retraining of the model. No experiments explicitly show the validity of the approximations (e.g. by a histogram showing the distribution of distances between the approximated and true parameters for all deleted/added points).
    • Eq. 6 is proposed to reverse the effect of a single point in Eq. 5 after ignoring all higher-order terms. However, for a large number of training epochs, adding a point to the training data using the proposed first-order gradient update (\bar{W} \leftarrow W^* - \frac{\eta E}{b} \nabla \ell(X^*; W^*)) can lead to over-fitting on the added sample and, consequently, to invalid conformal sets. (None of the experiments evaluates the proposed method when trained in such a scenario, i.e. E \gg b.)
    • The paper only compares the efficiency with ACP and ECP. What about other approximate methods (e.g. the ones cited in ACP paper)?
    • Efficiency is only evaluated at a single value of \epsilon. A comprehensive figure could show the prediction set size w.r.t. different values of \epsilon over an interval discretized to very small steps along with the AUC comparisons with other exact and approximate conformal prediction methods.
    • Line 6 of page 6: claims to add the most influential points to the empty set. The “Influence of a point” is not defined. The explainability procedure is not clear.
    • In the experiments, the authors claim the top-k samples are in “close proximity” to the test sample but provide no explanation of how they measure proximity.
    • The efficiency of a conformal prediction method depends on the underlying model. The paper doesn’t provide a comparison in terms of efficiency between different model architectures. In the supplementary materials, the authors only compare two architectures in terms of accuracy and exclude efficiency.
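
    The efficiency curve requested in the list above can be sketched as follows. This is a minimal example under stated assumptions: the conformal p-values for every test point and candidate label are taken as already computed, and the helper names mean_set_size and area_under_curve are hypothetical, not taken from the paper or its code.

```python
import numpy as np

def mean_set_size(p_values, eps_grid):
    """Average prediction-set size for each significance level in eps_grid.

    p_values has shape (n_test, n_classes); a label is kept when p > eps.
    """
    p = np.asarray(p_values, dtype=float)
    return np.array([(p > eps).sum(axis=1).mean() for eps in eps_grid])

def area_under_curve(eps_grid, sizes):
    """Trapezoidal area under the set-size curve (smaller means more efficient)."""
    eps = np.asarray(eps_grid, dtype=float)
    s = np.asarray(sizes, dtype=float)
    return float(np.sum((eps[1:] - eps[:-1]) * (s[1:] + s[:-1]) / 2.0))

# Toy p-values for two test points and two labels (illustrative only).
p_values = [[0.05, 0.5], [0.15, 0.9]]
eps_grid = [0.0, 0.1, 0.2]
sizes = mean_set_size(p_values, eps_grid)
auc = area_under_curve(eps_grid, sizes)
```

    Evaluating each method on the same fine grid of \epsilon and comparing the resulting areas would give a single-number efficiency summary alongside the full curve.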
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors provided the source code in supplementary materials.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Add a figure showing how the prediction set size changes over a fine grid of \epsilon values, at least in [0.0, 0.2], instead of only 3 values.
    • In addition to comparing the accuracy of the 2 model architectures, compare the proposed method’s efficiency for multiple architectures.
    • Write the explainability part of the proposed method clearly and explain/prove how the empirical top-k samples will lie “in close proximity” to the given sample.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper may be accepted since it provides means for faster conformal prediction by approximating the exact full conformal method by first order Taylor expansion of the parameters’ training trajectory and enabling empirical explainability. I don’t strongly suggest acceptance as the proposed approximations are not shown theoretically or empirically to be close to the true values and the experiments are not fully validating the superiority of the proposed method in comparison to the related previously proposed approximate full conformal methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces a novel method for uncertainty estimation in medical image classification using conformal prediction (CP), reducing computational demands.

    Furthermore, the paper introduces an Uncertainty Explanation method (UnEX) that identifies the most influential training samples affecting the model’s uncertainty.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper avoids the computationally intensive need to retrain deep neural networks (DNNs) for each test instance, which is a common requirement in traditional full CP methods. Instead, TAFCP uses a Taylor series expansion to approximate the impact of adding or removing specific training samples on the pre-trained model, enabling rapid uncertainty estimation.

    In addition, this paper also develops a novel Uncertainty Explanation method (UnEX) to identify the top-k most influential training samples impacting the model’s uncertainty.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the paper has numerous strengths, there are several areas where it could potentially be improved or where the limitations need to be acknowledged.

    1. The evaluation of the proposed uncertainty estimation methods appears incomplete. For a comprehensive assessment in the context of medical image classification, it would be beneficial to include metrics such as confidence calibration, uncertainty calibration, Brier score, and out-of-distribution (OOD) detection. These experiments are crucial for demonstrating the model’s capability in reliable uncertainty quantification.
    2. The selection of baseline methods is limited to conformal prediction (CP)-based approaches. To provide a more robust evaluation, it would be advantageous to include comparisons with simpler and more traditional uncertainty estimation methods such as Monte Carlo (MC) Dropout (Gal and Ghahramani, 2016) and Ensembles (Lakshminarayanan et al., 2017).
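
    Two of the calibration metrics named above can be computed from softmax outputs as in the sketch below. It assumes the multiclass form of the Brier score (summed over classes) and an equal-width binning for expected calibration error; the function names and the default of 10 bins are choices made for this example only.

```python
import numpy as np

def brier_score(probs, labels):
    """Multiclass Brier score: mean squared distance to the one-hot label."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(labels)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted gap between accuracy and confidence over equal-width bins."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # mask.mean() is the fraction of samples that fall in this bin.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)
```

    Both functions take an (n_samples, n_classes) probability array and integer labels, so the same calls could be applied to softmax, MC Dropout, or ensemble outputs for a like-for-like comparison.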
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors could consider expanding their experimental section to include additional comparisons of uncertainty estimation performance.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the experiments section seems incomplete, the novelty and mathematical foundation of this paper appear strong. Therefore, I suggest a ‘weak accept’.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

N/A




Meta-Review

Meta-review not available, early accepted paper.


