Abstract

The fundamental problem with ultrasound-guided diagnosis is that the acquired images are often 2-D cross-sections of 3-D anatomy, potentially missing important anatomical details. This limitation leads to challenges in ultrasound echocardiography, such as poor visualization of heart valves or foreshortening of ventricles. Clinicians must interpret these images with inherent uncertainty, a nuance absent in machine learning’s one-hot labels. We propose Re-Training for Uncertainty (RT4U), a data-centric method that introduces uncertainty to weakly informative inputs in the training set. This simple approach can be incorporated into existing state-of-the-art aortic stenosis classification methods to further improve their accuracy. When combined with conformal prediction techniques, RT4U yields adaptively sized prediction sets that are guaranteed to contain the ground-truth class with high accuracy. We validate the effectiveness of RT4U on three diverse datasets: a public AS dataset (TMED-2), a private AS dataset, and a CIFAR-10-derived toy dataset. Results show improvements on all three datasets. Our source code is publicly available at: https://github.com/an-michaelg/RT4U
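As an illustration of the prediction-set mechanism the abstract refers to, the following is a minimal split-conformal sketch. It is not the paper's exact procedure; the function name, the choice of nonconformity score, and the array shapes are our own assumptions.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction over softmax outputs.

    cal_probs:  (n, num_classes) softmax scores on a held-out calibration set
    cal_labels: (n,) integer ground-truth labels for the calibration set
    test_probs: (m, num_classes) softmax scores on the test set
    Returns a list of m index arrays, each an adaptively sized prediction set.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - softmax probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")
    # Include every class whose score falls below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]
```

Confident test inputs yield singleton sets, while ambiguous inputs yield larger sets, which is what makes the set size adaptive.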

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2346_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2346_supp.pdf

Link to the Code Repository

https://github.com/an-michaelg/RT4U

Link to the Dataset(s)

https://tmed.cs.tufts.edu/tmed_v2.html
https://www.cs.toronto.edu/~kriz/cifar.html

BibTex

@InProceedings{Gu_Reliable_MICCAI2024,
        author = { Gu, Ang Nan and Tsang, Michael and Vaseli, Hooman and Tsang, Teresa and Abolmaesumi, Purang},
        title = { { Reliable Multi-View Learning with Conformal Prediction for Aortic Stenosis Classification in Echocardiography } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces the Re-Training for Uncertainty (RT4U) method, which aims to improve aortic stenosis (AS) classification in echocardiography. By incorporating uncertainty into weakly informative inputs in the training set and using conformal prediction techniques, RT4U generates adaptively sized prediction sets that contain the ground truth class with high accuracy. The effectiveness of RT4U is validated on three diverse datasets, showing improvement across all of them. Additionally, the paper discusses the application of deep learning in AS classification and provides background information on the topic along with relevant research work.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper lie in the introduction of the Re-Training for Uncertainty (RT4U) method, which aims to improve the classification of aortic stenosis (AS), a common heart valve disease, using echocardiography. The paper’s novelty is demonstrated through the innovative approach of incorporating uncertainty into weakly informative inputs in the training set, particularly focusing on addressing the inherent limitations in echocardiographic diagnosis of AS. By leveraging conformal prediction techniques, the paper presents an original way to generate adaptively sized prediction sets, leading to high-accuracy inclusion of the ground-truth class. The application of the RT4U method is clinically feasible and shows promise for enhancing the accuracy of AS classification, an area critical for early diagnosis and effective treatment of this life-threatening condition.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The article lacks a comprehensive description of the overall network architecture; the description of the main innovative points is insufficient.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The article lacks a comprehensive description of the overall network architecture; the description of the main innovative points is insufficient. The article does not mention whether there is class imbalance in the dataset, nor does it provide a description of the categories related to private data. The article does not explain how data from different sources in the image and video domains are handled.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The article lacks a comprehensive description of the overall network architecture; the description of the main innovative points is insufficient.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal addressed my doubts, and I agree to accept the paper.



Review #2

  • Please describe the contribution of the paper

    The study developed a training approach called Re-Training for Uncertainty (RT4U), which utilizes pseudo-labels to address overfitting caused by noisy inputs. RT4U is a model-independent method that doesn’t require hyperparameters and can be seamlessly integrated into existing approaches for aortic stenosis (AS) classification with minimal additional complexity. The results demonstrate that RT4U enhances both top-1 accuracy and mitigates prediction overconfidence.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Aortic stenosis represents a prevalent clinical issue. For patients eligible for severe AS treatment, a novel approach called transcatheter aortic valve replacement (TAVR) is gaining prominence as a minimally invasive option. However, accurately quantifying AS severity through ultrasound remains a significant clinical challenge due to issues with echocardiography image quality. In response, this study introduces innovative methods and evaluates them across diverse datasets and echocardiographic scanners. The proposed method is novel and shows promising results upon evaluation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Patient demographics and the severity of AS should be provided but are lacking.
    2. The performance of RT4U should be given in the abstract.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Intra- and inter-observer reproducibility should be provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Aortic stenosis represents a prevalent clinical issue. For patients eligible for severe AS treatment, a novel approach called transcatheter aortic valve replacement (TAVR) is gaining prominence as a minimally invasive option. However, accurately quantifying AS severity through ultrasound remains a significant clinical challenge due to issues with echocardiography image quality. In response, this study introduces innovative methods and evaluates them across diverse datasets and echocardiographic scanners. The proposed method is novel and shows promising results upon evaluation.

    Improvements:

    1. Patient demographics and the severity of AS should be provided but are lacking.
    2. The performance of RT4U should be given in the abstract.
    3. Intra- and inter-observer reproducibility should be provided.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel. Evaluation of the proposed method on different datasets is satisfactory.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors’ rebuttal addressed some of my doubts.



Review #3

  • Please describe the contribution of the paper

    The authors propose a conformal-prediction-based learning method to cope with noisy inputs, applied to aortic stenosis classification in echocardiography and to a toy dataset based on CIFAR. The main contribution is the addition of a Re-Training for Uncertainty module: a retraining step performed on the average predicted scores of each training example from the initial training run.
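Based on this description, the retraining targets can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function names and array shapes are assumptions.

```python
import numpy as np

def rt4u_soft_labels(epoch_probs):
    """epoch_probs: (num_epochs, num_examples, num_classes) softmax outputs
    recorded for each training example during the initial training run.
    The hard one-hot labels are replaced by the per-example average, so
    weakly informative inputs retain their ambiguity in the retraining."""
    return epoch_probs.mean(axis=0)

def soft_cross_entropy(logits, soft_targets):
    # Retraining loss: cross-entropy against the softened targets.
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -(soft_targets * log_probs).sum(axis=-1).mean()
```

An example the model flip-flops on across epochs ends up with a near-uniform target, while a consistently well-classified example keeps a near-one-hot target.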

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The main strength of this paper lies in addressing the fundamental problem of noisy inputs, which is particularly relevant in medical imaging. While several approaches for noisy labels have been explored, the problem of noisy inputs has been less explored.
    • A strong validation has also been performed by the authors, with a toy dataset and two validations on aortic stenosis (image-based and video-based, on two datasets, and with both video- and study-level validations).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • While the work is quite interesting, the RT4U method is in truth quite simple and not a particularly novel advance: adapting the labels according to model predictions has been proposed in several instances (though not for this purpose or in this exact way). However, I believe the work remains quite interesting.
    • While a thorough validation with several methods is performed, only the mean of each metric is shown, which is insufficient. The authors mention “100 random trials”, so I assume they have the data needed to report the mean and standard deviation and to perform statistical testing, which would be excellent.
    • Reporting ECE is an excellent addition to the study, but I wonder why the results for the TMED-2 data are not shown. ECE for CIFAR-Q is already given as supplementary material, so surely there is no reason not to add these results as well?
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Authors use a private dataset which is not available but the two other datasets are publicly available so I am not concerned with reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • I believe the evaluation of ECE should be stated directly on section 4.3 to “prepare” the reader for the results which will be shown. I also missed an explicit reference that ECE results for CIFAR-Q are available as supplemental material (particularly as the ECE for ANL is mentioned).
    • In eq. 2 I was left wondering if there is a divisor missing to perform an average of labels across epochs? Or should the softmax be outside the sum? Am I mistaken in interpreting this formula?
    • The final sentences of the results section discuss the ordinality of the predictions and it is mentioned that 98% of prediction sets satisfy ordinality “for every method”. Is this both for methods with and without RT4U? Please make it explicit in the manuscript to avoid confusion
    • Change “under-perform” to “underperform”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a very interesting study, working with concepts that are often disregarded but deserve the attention of the community. While the solution is somewhat simple, I believe the conference would be more interesting with studies such as this.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Not all of my concerns have been addressed, particularly regarding the lack of an ECE plot for TMED-2. I find it odd that the authors have decided not to show it, particularly when the same plot for CIFAR-Q is already given in the supplementary material. Nevertheless, my opinion remains that the work is very interesting and would enrich the conference.




Author Feedback

We would like to thank the reviewers for their insightful comments. Reviewers agreed that the concepts introduced in the paper are innovative and address the fundamental problem of noisy inputs, in contrast to prior art that mainly focuses on noisy labels. We will clarify this in the final version of the paper.

R1 stated that the paper lacks a comprehensive description of the “overall network architecture”. As R2 correctly points out, “RT4U is a model-independent method that doesn’t require hyperparameters and can be seamlessly integrated into existing approaches for aortic stenosis (AS) classification with minimal additional complexity.” We would like to emphasize that RT4U is network-architecture agnostic. We have demonstrated its efficacy with three network architectures (ResNet-18, R(2+1)D, and ProtoASNet). Our only requirement is that the model produces prediction confidences over a set number of training epochs, which is the case for almost all deep learning approaches. If accepted, we will further emphasize the architecture-agnostic nature of our proposed contribution.

Regarding disease severity and demographic information of the AS Private dataset: There are 1088/575/909 studies of normal/mild/significant AS cases, respectively. For TMED-2, there are 126/171/301 studies of normal/mild/significant AS, respectively. During training, we used weighted random sampling based on the inverse of the class proportion to account for class imbalance.
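The inverse-class-proportion weighting described above can be sketched as follows. This is an illustrative NumPy version, not the authors' code; in a PyTorch pipeline one would typically pass such per-sample weights to torch.utils.data.WeightedRandomSampler.

```python
import numpy as np

def inverse_frequency_weights(labels):
    # One weight per sample: the inverse of its class's frequency,
    # normalized so the weights form a sampling distribution in which
    # each class is drawn with roughly equal total probability.
    counts = np.bincount(labels)
    w = 1.0 / counts[labels]
    return w / w.sum()

# Example with the private dataset's class counts (1088/575/909 studies
# of normal/mild/significant AS):
labels = np.repeat([0, 1, 2], [1088, 575, 909])
p = inverse_frequency_weights(labels)
# Sampling training indices with probabilities p yields ~balanced batches.
```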

A description of the categories in the private data is included in Section 4.2, but to reiterate, the dataset was initially categorized into four classes: normal/mild/moderate/severe using American College of Cardiology (ACC) guidelines. To make our labels consistent with TMED-2, we combined moderate and severe together into the “significant” category. The ACC guidelines determine severity based on Doppler measurements obtained as part of the echo exam.

Regarding the inter- and intra-observer variability of the AS severity label: the severity label depends purely on the Doppler measurements; as such, there is no inter-observer variability in the traditional sense, since Doppler measurements are quantitative. However, there is potentially significant inter-observer variability in the acquisition of Doppler ultrasound data [1]. Running an independent study to verify the observations of [1] is outside the scope of our paper. [1] reports coefficients of variation as high as 28% for the aortic valve area, one of the Doppler measurements.

Regarding patient demographics: due to the data deidentification process that is part of our study protocol, most of the demographic information (such as sex and race) is not available to our group. However, we downloaded data with no selection bias in terms of demographic information; therefore, the patient population should reflect the general patient admissions at our local hospital.

R6 requested clarifications for eqn. 2 and parts of Section 4, as well as the addition of standard deviation for conformal prediction results. We will make these improvements in the final submission.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    In terms of methodological novelty, this paper does not add much to the table. However, it was submitted for the clinical application track. The authors conducted numerous experiments to demonstrate the marginal improvements of their method. Since all reviewers voted for acceptance, I will not be overly critical of the paper’s quality.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    In terms of methodological novelty, this paper does not add much to the table. However, it was submitted for the clinical application track. The authors conducted numerous experiments to demonstrate the marginal improvements of their method. Since all reviewers voted for acceptance, I will not be overly critical of the paper’s quality.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers do still have some reservations. However, the novelty of the method seems to have outweighed other factors in recommending this paper for acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The reviewers do still have some reservations. However, the novelty of the method seems to have outweighed other factors in recommending this paper for acceptance.


