Abstract

Accurate segmentation of echocardiographic images is essential for the diagnosis of cardiovascular diseases. Recent advances in deep learning have opened up the possibility of automated cardiac image segmentation. However, data-driven echocardiography segmentation schemes suffer from domain shift, since ultrasound image characteristics are strongly affected by measurement conditions determined by device and probe specifications. To overcome this problem, we propose a domain generalization method that uses a generative model for data augmentation. An acoustic content- and style-aware diffusion probabilistic model is proposed to synthesize echocardiography images with diverse cardiac anatomy and measurement conditions. In addition, a meta-learning-based spatial weighting scheme is introduced to prevent the network from training on unreliable pixels of synthetic images, thereby achieving precise image segmentation. The proposed framework is thoroughly evaluated using both in-distribution and out-of-distribution echocardiography datasets and demonstrates outstanding performance compared to state-of-the-art methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0708_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0708_supp.pdf

Link to the Code Repository

https://github.com/Seokhwan-Oh/MLSW

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Oh_Uncertaintyaware_MICCAI2024,
        author = { Oh, Seok-Hwan and Jung, Guil and Kim, Sang-Yun and Kim, Myeong-Gee and Kim, Young-Min and Lee, Hyeon-Jik and Kwon, Hyuk-Sool and Bae, Hyeon-Min},
        title = { { Uncertainty-aware meta-weighted optimization framework for domain-generalized medical image segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors introduce a content- and style-aware diffusion model for synthesizing echocardiographic images, coupled with a meta-learning strategy for improved image segmentation using both synthetic and real data. The synthetic images exhibit realistic qualities and achieve a commendable FID score, outperforming related works. However, the segmentation outcomes do not match current state-of-the-art standards, and the model’s applicability across different phases of the cardiac cycle remains uncertain.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Interesting and relevant approach for generating synthetic data, with versatile control of content and style.
    • Promising results for synthetic data, with significant overlap with other datasets in a t-SNE analysis and strong FID score.
    • Comprehensive explanation of the meta-learning spatial weighting optimization used in segmentation.
    • Thoroughly structured experimental results, including model and dataset ablations, and evaluations on both in-distribution and out-of-distribution datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Fig. 1 is too small for the amount of detail it presents, with text becoming unreadable even when zoomed in over 250%.
    • The method for estimating Nakagami parameters from various datasets is not intuitive. Furthermore, the claim of proving that the proposed Nakagami parameter is a proper method for representing a wide range of US-style features needs more elaboration.
    • The temporal aspect of ultrasound imaging, particularly how generated images correspond to different phases of the cardiac cycle, is not addressed. It appears that all synthetic images represent the heart with a closed mitral valve, suggesting a lack of diversity in cardiac phases portrayed (e.g. only ED/ES and systolic frames).
    • Section 3 suffers from unclear notation, where ‘theta’ is ambiguously described as a network, a loss function, and a weight optimization scheme, complicating comprehension.
    • Segmentation performance, as potentially indicated by MIOU values in Table 4, falls short of existing benchmarks without a satisfactory comparative analysis or justification for why this approach might still be valuable. The original CAMUS paper presents better results, and methods based on the nnU-Net framework surpass this by a significant margin.
    • The significantly weaker results on left ventricular wall and left atrium segmentation compared to in-distribution data warrant more discussion.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • There is a discrepancy in the reported number of echo images in the CAMUS dataset. The dataset actually comprises 2000 echo images, accounting for both end-diastole and end-systole phases across 500 patients in both four- and two-chamber views.
    • I propose to change the “Region of Interest” name to ultrasound sector (or similar) to avoid confusion. “Region of Interest” is often used for specific structures of the heart for measurements in echocardiography.
    • The details of Fig. 4 are difficult to see. Consider merging the CAMUS and private dataset results into one output image, with segmentation classes in different colours, to reduce the number of images and increase the effective resolution.
    • In Table 1, the authors should name what the values stand for.
    • In the introduction, data augmentation is misrepresented as a recent proposal to enhance generalizability of neural networks. I would suggest rephrasing this, as data augmentation is a well-established method in machine learning to mitigate overfitting.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the relevant aspects of the proposed model and the realistic appearance of synthetic images, the paper’s shortcomings, particularly the segmentation performance and lack of clarity in methodological exposition, are significant. The model’s applicability across the cardiac cycle is also a critical gap, given the importance of temporal dynamics in cardiac imaging. These issues suggest that further refinement is necessary before the work can meet the publication standards of a MICCAI conference paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors will address most of the comments in the revised version of the paper, but the concern regarding the temporal aspect was not addressed in the rebuttal.



Review #2

  • Please describe the contribution of the paper

    The authors present a domain-generalizing echo segmentation model that, during training, incorporates both diffusion-generated echo augmentations and a meta-learning spatial weighting that downweights their contributions in areas of doubt. The authors show that using their diffusion-generated augmentation and meta-learning scheme significantly improves the segmentation accuracy both in and out of distribution.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - MLSW with EDM data appears to provide a clear advantage in training the Seg-net tested.
    - Ablation study and extensive comparisons with other augmentation methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - Inference is not clearly described. Is it the output of Theta_seg without Meta-net?
    - Unclear how Table 1 leads to the MIOU stats quoted.
    - Why are the CAMUS data and their anisotropic pixels not standardized?
    - Why does Theta’_seg per eq 3 take both an image and Theta_meta, and other times (Algo 1, eq 4) Theta_seg takes just an image? Should that extra parameter be W_HM?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It could be that the paper length restriction is a limiting factor in this reviewer’s confidence in reproducibility

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    - “generative model helps improving the versatility”: should be “improve”.
    - Fig 1.d.1 does not include axis labels.
    - R^2 of Nakagami distribution is never explained.
    - In Figure 2, why do the generated US images go from dark to bright? There seems to be no discernible difference in style content parameters going left to right. Further in the same direction, Y_EDM is admitted to be imprecise, but none are shown (how imprecise?).
    - “with three public Echo images.”: should be “with the three public Echo datasets.”
    - Unclear why DSU is singled out in the comparison study.
    - Fig 3 description is lacking any detail.
    - The reported FID between generated and real echo is vastly improved over prior work, though no explanation is provided. Also, other augmentation data seem to lead to better in-distribution results without the MLSW. Also, FID is assumed known and not cited.
    - How many more parameters does the MLSW add to the overall training?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The segmentation model training is not clearly described, nor why it might lead to better results. However, the diffusion generative component is very interesting and the reported results represent significant improvement.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Moving from weak accept to accept due to the rebuttal, promising figure changes, additional clarity, and code.



Review #3

  • Please describe the contribution of the paper

    The authors address the issue of domain shift in deep-learning-based segmentation of cardiac structures from 2D echocardiography images. The domain shift is caused by the variety of used ultrasound devices as well as parameter settings on the transmit and receive side, which have a significant impact on the properties of the image data. The contribution is two-fold: A diffusion model to synthesize realistic echocardiography images as well as a meta-learning strategy that addresses the limitations of noisy labels caused by the synthesized training data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Two contributions (image synthesis and meta-learning) that are combined to improve the segmentation of echocardiography images, in particular, of out-of-distribution images
    • Detailed experimental evaluation including ablation study and comparison with several state-of-the-art methods
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The novelty of the contribution and the contribution itself are unclear from the presentation of the authors
    • A state-of-the-art base-line model such as nnU-Net is missing in the experimental evaluation
    • Claims are sometimes exaggerated
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the paper is well written, the proposed method is interesting, and the experimental results including ablation study and comparison are promising. However, the paper has a few limitations.

    My main concerns are the two short paragraphs on the contribution of the paper (end of Sect. 1). It is not clear what exactly is the novelty and what is the closest related work. For example, the authors state that they “proposeD a spatial uncertainty-aware meta-optimization method for generalized medical image segmentation” (sounds like previous work) and that they now “propose to extend the meta-learning-based noisy-label training techniques into the spatial domain”, but how spatial domain differs from previous work is not explained. Regarding the proposed diffusion model, the authors do not state what the extensions are to the cited work in [4].

    A bit of background on the echo data would be helpful for readers not familiar with the modality. For example, the term “B-mode” is used in Fig. 1-c but never explained. I like the idea of using partitions of same travel time to compute the parameters of the Nakagami distribution (is this a novel idea?). Strictly speaking, the sketched travel time is only exact for a point source and not for typical linear US transducers, and also depends on the transmit scheme as well as on the used compounding scheme.

    I think the term “universal” in “we propose a universal Echo generative model” is not justified since the EDM only considers ROI, labels, and Nakagami parameters of cardiac US images (it is not universal in the sense that it does not cover, for example, different applications or common ultrasound artifacts such as reverberations, side- and grating lobes).

    The term “defective sub-pixels” might be misleading since only “uncertain pixels” are considered.

    Abstract and conclusion claim “outstanding performance in comparison to the state-of-the-art DG methods”, however, the mean IOU is only improved by 0.022 and no statistical test (e.g., paired t-test) is provided that shows that the difference is significant. There is also no comparison of training time, amount of used training data, and inference time for the different approaches. As such, “outstanding” is not justified.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Major factor on the positive side is the proposed EDM and its use in conjunction with meta-learning as well as the experimental validation.

    Major factor on the negative side is that the contribution and novelty are not well described by the authors.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their thoughtful comments. The primary concerns are about the description of the contribution and discussion of the experiments. We agree. We address the comments below and will incorporate all feedback in the final version.

[R4] The contribution of MLSW is unclear

The prior research on noisy label training [15] employs meta-learning to discern correct from incorrect data. The contribution of the MLSW primarily lies in its extension to the spatial domain, allowing for pixel-level assessment of the fidelity. Unlike previous applications focused on tasks like image classification, our adaptation significantly enhances performance in semantic segmentation tasks, as evidenced in our experiments.
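To make the idea of pixel-level weighting concrete, the following is a minimal sketch of a spatially weighted cross-entropy loss. It is not the authors' implementation: in the paper the weight map is produced by Meta-net through meta-learning, whereas here it is simply passed in as an input, and the function name `weighted_pixel_cross_entropy` and the nested-list data layout are assumptions made for illustration.

```python
import math


def weighted_pixel_cross_entropy(probs, labels, weights, eps=1e-12):
    """Cross-entropy averaged over pixels, with one weight per pixel.

    probs   : H x W x C nested lists of predicted class probabilities
    labels  : H x W nested lists of integer class indices
    weights : H x W nested lists of non-negative pixel weights
              (hypothetical stand-in for a learned spatial weight map)
    """
    total, norm = 0.0, 0.0
    for prob_row, label_row, weight_row in zip(probs, labels, weights):
        for pixel_probs, label, weight in zip(prob_row, label_row, weight_row):
            # A pixel with weight 0 contributes nothing to the loss.
            total += -weight * math.log(max(pixel_probs[label], eps))
            norm += weight
    # Normalize by the total weight so the loss scale does not depend
    # on how many pixels were down-weighted.
    return total / norm if norm > 0 else 0.0


# One-row, two-pixel image: the second pixel carries an unreliable label.
probs = [[[0.9, 0.1], [0.2, 0.8]]]
labels = [[0, 0]]
trusted_only = weighted_pixel_cross_entropy(probs, labels, [[1.0, 0.0]])
uniform = weighted_pixel_cross_entropy(probs, labels, [[1.0, 1.0]])
```

With the second pixel masked out, only the confident first pixel drives the loss; with uniform weights, the mismatch between the prediction and the doubtful label at the second pixel dominates.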

[R1] During the inference process, is Meta-net employed?

Meta-net is used exclusively during training and is not used during inference.

[R1,4] How much computational cost does MLSW require?

The computational cost of MLSW is minimal, requiring only 31 MB of additional parameters during training. Notably, this computation is needed only during training, with no extra requirements during inference.

[R3,4] Limited comparison with accuracy reported in the literature, including nnUNet (employing the CAMUS test dataset)

We recognize the significance of comparative analysis with published results and have revisited our evaluation, this time employing the Dice coefficient (Dice) metric. Our NN exhibits Dice scores of 0.94/0.86/0.91 (LV blood pool/LV wall/LA) on the CAMUS test set.

In contrast, conventional UNet and the SoTA pretraining algorithm SimCLR report 0.91 and 0.92 Dice for the LV blood pool, respectively (reported in [C1]). Additionally, the widely adopted segmentation scheme MedSAM [C2] yields scores of 0.87/0.82/0.90 when fine-tuned with the CAMUS dataset. Ling et al. [C3] reported that nnUNet achieves a Dice of 0.94 for the LV blood pool when using the CAMUS dataset for both training and testing. However, it is important to note that this high performance is due to the model being specifically tuned to the CAMUS features, making it susceptible to domain shift problems. This vulnerability is evident in nnUNet’s performance on the Echonet and HMC-QU test sets, where the Dice drops to 0.77 and 0.61, respectively. In contrast, our NN with MLSW demonstrates high generalizability, achieving Dice of 0.95 and 0.94 on these datasets.

For a rigorous evaluation of nnUNet, we re-implemented nnUNet training with all three public datasets presented in the paper. nnUNet shows enhanced generalization, achieving 0.90 and 0.86 Dice on HMC-QU and Echonet, respectively. However, its performance on the CAMUS dataset is limited to 0.91 for the LV blood pool.

We argue that our NN offers comparable accuracy to SoTA models, exhibiting high domain generalization.
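Because the paper reports mean IoU while this response reports Dice, it may help to see how the two overlap metrics relate. The sketch below computes both for a single pair of binary masks; the function name `dice_and_iou` and the flat-list mask layout are assumptions for illustration, and dataset-level averages in the paper may be aggregated differently.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection
    if union == 0:
        # Both masks are empty: treat as perfect agreement.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dice, iou


# Two 4-pixel masks that share exactly one foreground pixel.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. This is worth keeping in mind when comparing Dice values from one source against IoU values from another.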

[R1,3,4] How does the Nakagami parameter (NAKA) affect the generated Echo image?

We have found that NAKA omega governs US scatterer dispersion, while NAKA mu is related to the intensity of the US signal (brightness of the B-mode image).

[R4] Statistical significance is not investigated.

We have performed a paired t-test and found that statistical significance is achieved (p < 0.001) on both in- and out-of-distribution data.
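For readers unfamiliar with the test, a paired t-test compares two methods on the same test cases by examining the per-case score differences. The following is a minimal sketch using only the Python standard library; the scores are made-up illustrative values, not results from the paper, and the function name `paired_t_statistic` is an assumption. Converting the statistic to a p-value requires the Student t distribution, which a statistics package such as SciPy (`scipy.stats.ttest_rel`) provides.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical per-image scores for two methods on the same four images.
method_a = [0.9, 0.8, 0.7, 0.6]
method_b = [0.8, 0.6, 0.6, 0.4]
t_stat, dof = paired_t_statistic(method_a, method_b)
```

Pairing matters because per-image difficulty varies widely in echocardiography; working on differences removes that shared variation from the comparison.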

[R4] Regarding the idea of using travel time to compute the NAKA, is the idea novel?

Yes, it is an original concept proposed by our team. Traditionally, the NAKA is obtained by analyzing local patches of the image. However, we propose that performing statistical analysis of the US signal based on travel time provides a more accurate interpretation of US features.
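The difference between patch-based and travel-time-based estimation can be illustrated with a rough sketch. It rests on several assumptions that are not taken from the paper: travel time is approximated by Euclidean distance from an assumed probe apex in pixel coordinates (constant speed of sound, point-source geometry), the input is treated as envelope amplitudes, and the parameters are estimated with the standard moment-based estimators (shape = E[R²]² / Var(R²), scale = E[R²]). The function names and bin width are hypothetical.

```python
import math
from collections import defaultdict


def nakagami_moments(envelope):
    """Moment-based Nakagami estimates (shape, scale) from amplitudes."""
    squared = [r * r for r in envelope]
    scale = sum(squared) / len(squared)  # E[R^2]
    variance = sum((s - scale) ** 2 for s in squared) / len(squared)
    shape = scale * scale / variance if variance > 0 else math.inf
    return shape, scale


def nakagami_by_travel_time(image, origin, bin_width):
    """Estimate Nakagami parameters per equal-travel-time partition.

    image     : H x W nested lists of envelope amplitudes
    origin    : (row, col) of the assumed probe apex
    bin_width : radial width of each partition, in pixels
    """
    groups = defaultdict(list)
    for row_idx, row in enumerate(image):
        for col_idx, value in enumerate(row):
            distance = math.hypot(row_idx - origin[0], col_idx - origin[1])
            groups[int(distance // bin_width)].append(value)
    # Skip partitions with fewer than two samples.
    return {
        bin_idx: nakagami_moments(values)
        for bin_idx, values in sorted(groups.items())
        if len(values) >= 2
    }


shape, scale = nakagami_moments([1.0, 1.0, 3.0, 3.0])
per_bin = nakagami_by_travel_time([[1.0, 3.0], [1.0, 3.0]], (0, 0), 1.0)
```

Each partition collects pixels at a similar depth from the probe, so depth-dependent effects such as attenuation are shared within a partition rather than mixed, as they can be in a square local patch.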

[R1,3,4] Concerns with the figure, notation, and description of the formula. Concerns on reproducibility.

We will certainly improve the clarity of the figure, notation, and related explanations in the final version. Furthermore, we will provide the code to facilitate the reproducibility of our findings.

C1. Saeed, MIUA, 2022
C2. Ma, Nat. Commun., 2023
C3. Ling, FIMH, 2023




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The idea of diffusion-generated augmentations with travel time and shape prior is interesting and novel. The meta-learning spatial weighting is well designed. The experiments are sufficient to demonstrate the efficacy. Despite some clarity issues, the contributions are valuable, and the rebuttal is successful. R3’s concern about the temporal aspect beyond ED/ES is, to me, not an issue, as the evaluation is already sufficient for the technical contribution.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



