Abstract

Ejection Fraction (EF) regression faces a critical challenge of severe data imbalance: samples in the normal EF range significantly outnumber those in the abnormal range. This imbalance biases existing EF regression methods toward the normal population, undermining health equity. Furthermore, current imbalanced regression methods struggle with the head-tail performance trade-off, leading to increased prediction errors for the normal population. In this paper, we introduce EchoMEN, a multi-expert model designed to improve EF regression with balanced performance. EchoMEN adopts a two-stage decoupled training strategy. The first stage proposes a Label-Distance Weighted Supervised Contrastive Loss to enhance representation learning. This loss accounts for the label relationships among negative sample pairs, encouraging samples that are further apart in label space to be further apart in feature space. The second stage trains multiple regression experts independently with variably re-weighted settings, each focusing on a different part of the target range. Their predictions are then combined using a weighted scheme to learn an unbiased ensemble regressor. Extensive experiments on the EchoNet-Dynamic dataset demonstrate that EchoMEN outperforms state-of-the-art algorithms and achieves well-balanced performance across all heart failure categories. Code: https://github.com/laisong-22004009/EchoMEN.
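As a rough aid to readers, the following PyTorch sketch illustrates the label-distance weighting idea described above: in an otherwise standard supervised contrastive loss, each negative's contribution to the denominator is scaled by a factor that grows with its label distance from the anchor, so clips with very different EF values are pushed further apart in feature space. The positive definition (clips sharing the same EF value), the linear weighting function, and all hyperparameters here are illustrative assumptions rather than the paper's implementation; the actual method is in the linked code repository.

import torch
import torch.nn.functional as F

def ldw_supcon_loss(features, ef_labels, temperature=0.1, scale=5.0):
    """Illustrative label-distance weighted supervised contrastive loss.

    features: (N, D) clip embeddings; ef_labels: (N,) continuous EF values.
    Positives are clips sharing the same EF value; each negative's term in
    the denominator is scaled by a factor that grows with its label distance
    to the anchor (an assumed weighting, not the paper's exact formula).
    """
    features = F.normalize(features, dim=1)
    ef_labels = torch.as_tensor(ef_labels, dtype=torch.float32, device=features.device)
    n = features.size(0)

    sim = features @ features.T / temperature                     # pairwise similarities
    label_dist = (ef_labels[:, None] - ef_labels[None, :]).abs()  # pairwise label distances
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (label_dist < 1e-6) & ~self_mask                   # same EF value -> positive

    # Assumed weighting: negatives farther away in label space count more in
    # the denominator, hence are repelled more strongly.
    neg_weight = 1.0 + label_dist / scale
    weight = torch.where(pos_mask, torch.ones_like(neg_weight), neg_weight)
    exp_sim = torch.exp(sim) * weight * (~self_mask).float()
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)

    pos_count = pos_mask.float().sum(dim=1)
    per_anchor = -(pos_mask.float() * log_prob).sum(dim=1) / pos_count.clamp(min=1)
    return per_anchor[pos_count > 0].mean()                       # skip anchors without positives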

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1328_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1328_supp.pdf

Link to the Code Repository

https://github.com/laisong-22004009/EchoMEN

Link to the Dataset(s)

https://echonet.github.io/dynamic/index.html#dataset

BibTex

@InProceedings{Lai_EchoMEN_MICCAI2024,
        author = { Lai, Song and Zhao, Mingyang and Zhao, Zhe and Chang, Shi and Yuan, Xiaohua and Liu, Hongbin and Zhang, Qingfu and Meng, Gaofeng},
        title = { { EchoMEN: Combating Data Imbalance in Ejection Fraction Regression via Multi-Expert Network } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose to tackle the data imbalance regarding the ejection fraction values, in a dataset of echocardiography images. For this, they propose a supervised contrastive loss.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this paper is the attempt to tackle data imbalance by giving more relevance/weight to the data points that are underrepresented in the training dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of this paper are the poor coverage of the SOTA in ejection fraction estimation, and the limited novelty related to this task, i.e., the clinical translation of the improvement the method brings (EF estimation) is not novel.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) The authors present their work on developing an ejection fraction regression method. However, they never mention what it means in clinical practice, nor why or how it is measured. In chapter 1, the SOTA review is really poor and should be further improved, including some of the more recent methods used to estimate it. Since this task has been addressed so many times over the years, I think this method does not bring enough novelty to the field.
    2) In figure 1, graph c) seems to show a result of the research in a figure included in the introductory section. Please separate these graphs.
    3) In chapter 1, describe what DIR is. The acronym is not explained, nor is the method briefly described. Why do you compare your method with this one?
    4) In chapter 1, the paragraph where the authors write “Consequently, it fails to effectively (…)” should be in the discussion chapter, not in the introduction.
    5) There is a typo in the last word of the figure 2 caption.
    6) When the authors mention their contributions, the reader should be aware of what the current SOTA is when it comes to predicting EF. Is there actually a strong need to address this topic?
    7) Remove contribution #4 (the other three contributions lead to this result). This is a conclusion, not a contribution.
    8) In chapter 2, the dimensions of the images’ space are TxWxHxC. I assume C is the channels and therefore the data is 2D + time, but make this clear for the reader.
    9) In chapter 2.1, the paragraph starting with “Specifically, let I denote the sample indices (…)” should be rephrased, as it is confusing as written.
    10) In the same chapter 2.1, 2nd paragraph, the sentence introducing the losses is also confusing because it is too long and needs rephrasing. Perhaps something like “Our proposed (…) contrastive loss [6] Lsupcon, pushing all negative samples equally, while Lldw-supcon considers the inherent continuity underlying (…)”.
    11) In chapter 2.2, where the authors write “However, experimental results show (…)”, there should be references to these experimental results.
    12) Either in equation 3 (y^i) or the sentence starting right after it (y^m), there is a typo. Please fix it.
    13) What is the weighting criterion used by the experts? Is it data-driven?
    14) In chapter 3.1, provide the reference to the EchoNet paper, where the dataset is presented.
    15) In tables 1 and 2, add the measurement units.
    16) The authors describe table 1 as showing a quantitative comparison on the EchoNet data. A comparison of what?
    17) The differences shown in table 1 are so small that the authors should perform further statistical analysis to evaluate their significance. Such an evaluation is also needed to support the claims made in chapter 3.3.
    18) If the model actually predicts the ejection fraction, how can the mean values be so low?
    19) In chapter 3.3, the paragraph starting with “Moreover, we present (…)” should specify what the predicted values refer to.
    20) More statistical results/analysis are necessary to support the novelty/strength of this method (a simple regression line fit is not enough). Maybe some correlation?
    21) What is the actual accuracy of the model on the test set? How well does it perform in real clinical practice?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper doesn’t bring enough novelty to the field. Also, the results are not explored in a sufficient manner to show/support the authors’ claims.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I still struggle to find the novelty in this work, but considering the other reviewers' comments and my limited expertise with ejection fraction, the manuscript could now be accepted.



Review #2

  • Please describe the contribution of the paper

    This work proposes a method to overcome data imbalance in the ejection fraction (EF) distribution using two stages, where the first stage proposes a weighted supervised contrastive loss to improve learned representations and the second stage trains multiple regression experts to focus on different parts of the target region. These predictions are then combined with an ensemble regressor. The method is evaluated on the benchmark EchoNet-Dynamic dataset and compared to two baseline approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clearly motivated problem with good related work.
    • Well written
    • I like the idea of combining two stages to overcome the data imbalance problem.
    • The method is compared to many baselines and methods.
    • Ablation study for the effect of different stages.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Generalizability concern of the proposed work to other datasets.
    • See questions below.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors use the publicly available EchoNet dataset and will make their code publicly available. This should make the paper easily reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • There are also some other works that use a contrastive loss to solve the EF prediction problem, for example [1]. Please consider adding some references in the CL direction as well.
    • Figure 2: please also explain (a)-(d) in the figure caption. In Sec. 2.1: what are the positive and negative samples for each anchor? How do you extract them? Are they samples from the same category/EF, or echos from the same video?
    • Also regarding the previous point: there are different ways to extract the positive and negative samples for the contrastive loss. Did the authors experiment with different options? For example, samples from the same category could be positive pairs and samples from different categories negative pairs. Another option would be for samples from the same patient's echo to be positive pairs and all other patients' echos negative pairs, etc.
    • What is the effect of the regression loss in stage 1? Can one also neglect it and only take the L_{LDW-SupCon}?
    • It is not certain whether the proposed model can generalize effectively to other datasets. For example, Chen et al. [2] utilized a 3D UNet with 19 million parameters [3] to predict EF, trained on the CAMUS dataset [4], and demonstrated strong generalization on the EchoNet test fold. Thus, it would be beneficial for the authors to conduct additional analysis to assess the generalization performance of their model on different datasets.
    • Can you also report RMSE and R2 as error metrics? This was standard for the original EchoNet paper as well.

    [1] Ozkan et al. “M(otion)-Mode Based Prediction of Ejection Fraction Using Echocardiograms.” DAGM GCPR 2023 (Springer, 2024). https://doi.org/10.1007/978-3-031-54605-1_20
    [2] Chen, Yida, et al. “Assessing the generalizability of temporally coherent echocardiography video segmentation.” Medical Imaging 2021: Image Processing. Vol. 11596. SPIE, 2021.
    [3] Çiçek, Özgün, et al. “3D U-Net: Learning dense volumetric segmentation from sparse annotation.” MICCAI 2016, Proceedings, Part II. Springer International Publishing, 2016.
    [4] Leclerc, Sarah, et al. “Deep learning for segmentation using an open large-scale dataset in 2D echocardiography.” IEEE Transactions on Medical Imaging 38.9 (2019): 2198-2210.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall it is a well motivated and well written paper and I have some questions and clarifications needed which can improve the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I appreciate the authors’ comments in their rebuttal. EF prediction is a well-studied task in the medical imaging field for both supervised and contrastive learning. The low performance difference is not a significant issue for me. I find the idea and methodology of the paper to be sound.

    However, evaluating the generalizability of the method is crucial, as also demonstrated by the original EchoNet paper with an external test set. I still believe that the paper would benefit from this additional experiment, which would help readers understand the method better.



Review #3

  • Please describe the contribution of the paper

    The authors propose an EF regression scheme that is more balanced in its error profile across EF classifications compared to baseline and another imbalanced-data regressor. The principal contributions are a contrastive loss forcing class separation in a representation learner, paired with an ensemble or aggregator model that separately learns to regress on the different classes in training (re-weighted regression loss) and uses the most appropriate output at test time.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Extensive experimentation on a large open dataset showing SOTA results compared to several other models. Includes an ablation study as well.
    • Clearly written, well justified.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Limited background narrative on echo regression.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    • “HFrEF but increase the error of HFpEF”: “increase” should be “increases”.
    • “boost representation learning in imbalanced dataset”: “dataset” should be “data”.
    • “We introduces the concept of ensemble learning”: “introduces” should be “introduce”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    SOTA results on an important medical imaging problem, clear and well justified model, extensive experimentation including comparison to prior work and ablation study.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Strong Accept — must be accepted due to excellence (6)

  • [Post rebuttal] Please justify your decision

    Given all reviewer comments and author rebuttal, this reviewer still believes this is a strong paper in the direction of improving error profiles in auto-estimating EF. Could justify inclusion in the health equity direction as well, understood expansively.




Author Feedback

We thank all reviewers for their valuable comments and positive feedback, such as the thorough experiments (R5&R6) and promising results (R6).

Q. Poor SOTA Comparison (R3) We compare EchoMEN with leading EF regression methods, including EchoGNN [11] (MICCAI 2022) and EchoCoTr [12] (MICCAI 2022), which are recognized as SOTA on the EchoNet-Dynamic dataset according to a recent 2024 review [21]. We also extend SOTA imbalanced regression algorithms, such as RankSim [2] (ICML 2022), to EF regression, demonstrating that RankSim outperforms most existing EF regression methods in the challenging HFrEF and HFmrEF categories. (Table 1) [21] Sanjeevi, G., et al. “Deep learning supported echocardiogram analysis: A comprehensive review.” Artificial Intelligence in Medicine (2024): 102866.

Q. Low Performance Difference (R3) Considering the scale of the EchoNet-Dynamic dataset (10,030 videos, over 1.5 million clips in the beat-to-beat pipeline), the absolute improvements achieved by EchoMEN are substantial. EchoMEN outperforms the baseline and all competing methods in overall MAE, with a 2.96% relative improvement compared to 2.22% for the best existing method EchoCoTr [12]. On the overall GM metric, which better reflects algorithmic fairness, EchoMEN reduces prediction error by 3%, while the best imbalanced regression method RankSim [2] only achieves a 0.07% reduction. Furthermore, existing EF regression methods underperform the baseline on GM, indicating insufficient fairness. By following the experimental setup in [11-13], we ensure a rigorous comparison and demonstrate the superiority of EchoMEN. (Table 1)

Q. Novelty and Significance (R3) As stated in the Introduction and shown in Fig.1 (c), data imbalance in EF regression undermines health equity. Unlike previous works that focus on overall accuracy [11-13], EchoMEN enhances performance across abnormal EF ranges without compromising the accuracy for the majority of normal samples, ensuring fairness for all populations.

Q. Low Mean Values (R3) MAE and GM are both error metrics [2,20], where lower values indicate better performance. Detailed formulas are available in Supplementary Materials.

Q. Positive Sample Selection (R5) EchoMEN defines positive samples as clips from the same category (P(i) and Q(i) in Section 2.2). This covers both choices you mentioned (inter-patient and intra-patient similarities). For samples in a batch, we apply data augmentation, which is common in CL, generating positive samples from the same patient. Additionally, if a batch contains clips from different patients but with the same EF value, they are also considered positive samples. We appreciate your suggestion and plan to further investigate the impact of selection strategies in future work.
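For concreteness, the tiny sketch below shows one way this positive-pair definition could be turned into a batch mask; it is an assumption for illustration, not the paper's code. `clip_ids` (identifying augmented views of the same source clip) and the exact EF tolerance are hypothetical names and choices.

import torch

def positive_mask(ef_values, clip_ids):
    """Sketch: two samples are positives if they are augmented views of the
    same clip (same clip_id) or share the same EF value."""
    ef = torch.as_tensor(ef_values, dtype=torch.float32)
    cid = torch.as_tensor(clip_ids)
    same_ef = (ef[:, None] - ef[None, :]).abs() < 1e-6
    same_clip = cid[:, None] == cid[None, :]
    mask = (same_ef | same_clip).float()
    mask.fill_diagonal_(0.0)  # an anchor is not its own positive
    return mask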

Q. Additional Datasets (R3&R5) The EchoNet-Dynamic dataset (10,030 videos) is substantially larger than CAMUS (500 videos) and has been the sole benchmark for recent EF regression methods [5,11,12,15] (MICCAI). Therefore, we focus on EchoNet-Dynamic for comparative experiments and plan to conduct experiments on CAMUS in future work.

Q: Additional Evaluation Metrics (R3&R5) As our focus is tackling data imbalance, we adopt MAE and GM, which are the most widely used metrics in the imbalanced regression literature [2,20]. These two metrics provide a comprehensive evaluation of our method’s effectiveness in improving both accuracy and fairness. Moreover, we visualize the RMSE distribution in Fig.1 (c) to illustrate the superiority of EchoMEN.
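To make the metrics discussion concrete, the snippet below sketches how MAE and GM (the geometric mean of absolute errors, as commonly defined in the imbalanced regression literature [2,20]) can be computed, together with the RMSE and R2 requested by the reviewers. The eps guard and the lack of per-category aggregation are simplifications; the paper's exact evaluation protocol may differ.

import numpy as np

def regression_metrics(y_true, y_pred, eps=1e-6):
    """Sketch of common EF-regression error metrics over a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = np.abs(y_true - y_pred)
    mae = err.mean()
    gm = np.exp(np.log(err + eps).mean())            # geometric mean of L1 errors
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "GM": gm, "RMSE": rmse, "R2": r2}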

Q: Effect of Regression Loss (R5) While using only L_LDW-SupCon in the first stage is possible, we found that incorporating the regression loss accelerates convergence. The regression loss provides strong guidance for learning meaningful features, while L_LDW-SupCon encourages feature separability. Combining both losses enables more effective representation learning.
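A minimal sketch of such a combined first-stage objective is shown below, reusing the illustrative ldw_supcon_loss given after the abstract. The use of an MSE regression term and the trade-off weight lambda_con are assumptions for illustration, not the paper's exact choices.

import torch.nn.functional as F

def stage1_objective(pred_ef, target_ef, features, lambda_con=1.0):
    # Regression term gives strong guidance; the contrastive term encourages
    # label-aware feature separability. lambda_con balances the two (assumed value).
    return F.mse_loss(pred_ef, target_ef) + lambda_con * ldw_supcon_loss(features, target_ef)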

Q: Language (R3&R6) We will carefully revise any typos and notation errors to enhance readability.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper is well-written, the methodology clearly explained and the experimental results, including baselines and ablation studies are convincing. I would recommend taking care of the feedback of the reviewers (especially R3) and incorporating the small suggested changes to the camera-ready version to enhance the clarity of the paper and fix any typographical errors.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


