Abstract

Electrocardiogram (ECG) is a widely used tool for assessing cardiac function due to its low cost and accessibility. Emerging research shows that ECGs can help predict key outcomes traditionally derived from more complex modalities such as echocardiograms (ECHO), enabling the use of ECGs as a more accessible method to predict broader measurements of cardiac function. ECHOs, in particular, are of great importance because they play a key role in clinical cardiac assessment yet require considerable hospital resources. To aid this use case, we introduce EchoingECG, a probabilistic student-teacher model that leverages uncertainty-aware ECG embeddings and ECHO supervision to improve ECG-based cardiac function prediction. Our approach integrates Probabilistic Cross-Modal Embeddings (PCME++), a probabilistic contrastive framework, with ECHO-CLIP, a vision-language pre-trained model trained on ECHO-text pairs, to distill ECHO knowledge into ECG representations. Through experiments and external validation, we showed that EchoingECG outperforms state-of-the-art foundation ECG models in zero-shot, few-shot, and fine-tuning settings for ECHO predictions based on ECG. We also highlighted that variance estimation (enabled through our method) enhanced our understanding of model performance by identifying underlying regions of uncertainty within ECGs. The code is available at https://github.com/mcintoshML/EchoingECG.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0267_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/mcintoshML/EchoingECG

Link to the Dataset(s)

MIMIC-ECG: https://physionet.org/content/mimic-iv-ecg/1.0/

MIMIC-ECHO: https://physionet.org/content/mimic-iv-echo/0.1/

MIMIC: https://physionet.org/content/mimiciv/3.1/

MUSIC ECG: https://physionet.org/content/music-sudden-cardiac-death/1.0.1/

BibTex

@InProceedings{GaoYua_EchoingECG_MICCAI2025,
        author = { Gao, Yuan and Kim, Sangwook and McIntosh, Chris},
        title = { { EchoingECG: An Electrocardiogram Cross-Modal Model for Echocardiogram Tasks } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {175--185}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces EchoingECG, a probabilistic multimodal framework that models uncertainty in ECG signal embeddings by leveraging the PCME++ framework. This method addresses the inherent ambiguity and many-to-many nature of ECG-ECHO/text relationships better than deterministic methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduces EchoingECG, a probabilistic multimodal framework that models uncertainty in ECG signal embeddings by leveraging the PCME++ framework. This method addresses the inherent ambiguity and many-to-many nature of ECG-ECHO/text relationships better than deterministic methods.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. While the paper emphasizes probabilistic learning and knowledge distillation, it relies on standard architectures (1D-ResNet for ECG, BioBERT for text) and a frozen teacher (ECHO-CLIP). There is no novel model architecture proposed, which limits the depth of innovation from a modeling perspective.
    2. ECHO videos are treated as independent frame-level embeddings, ignoring inter-frame temporal dependencies. This simplification overlooks the dynamic nature of ECHOs, which might reduce the fidelity of ECHO supervision.
    3. The framework relies entirely on ECHO-CLIP as a frozen teacher model trained on ECHO-text pairs. However, its generalizability across clinical settings, devices, and populations is not evaluated. If the teacher model carries biases or misrepresents certain cardiac states, these biases may propagate into the student ECG model, compromising downstream performance.
    4. ECHO videos are treated as independent frame-wise embeddings without modeling temporal dependencies. Similarly, ECGs are segmented into 10-second windows, but the temporal continuity and rhythm dynamics are not fully captured.
    5. ECG-ECHO pairing is based on hadm_id and admittime, but temporal alignment between modalities is not explicitly verified. It’s unclear whether both modalities reflect the same clinical state.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (1) Strong Reject — must be rejected due to major flaws

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. While the paper emphasizes probabilistic learning and knowledge distillation, it relies on standard architectures (1D-ResNet for ECG, BioBERT for text) and a frozen teacher (ECHO-CLIP). There is no novel model architecture proposed, which limits the depth of innovation from a modeling perspective.
    2. ECHO videos are treated as independent frame-level embeddings, ignoring inter-frame temporal dependencies. This simplification overlooks the dynamic nature of ECHOs, which might reduce the fidelity of ECHO supervision.
    3. The framework relies entirely on ECHO-CLIP as a frozen teacher model trained on ECHO-text pairs. However, its generalizability across clinical settings, devices, and populations is not evaluated. If the teacher model carries biases or misrepresents certain cardiac states, these biases may propagate into the student ECG model, compromising downstream performance.
    4. ECHO videos are treated as independent frame-wise embeddings without modeling temporal dependencies. Similarly, ECGs are segmented into 10-second windows, but the temporal continuity and rhythm dynamics are not fully captured.
    5. ECG-ECHO pairing is based on hadm_id and admittime, but temporal alignment between modalities is not explicitly verified. It’s unclear whether both modalities reflect the same clinical state.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The author did not answer all my questions.



Review #2

  • Please describe the contribution of the paper

    This paper introduces EchoingECG, a probabilistic student-teacher model that leverages electrocardiogram (ECG) data to predict outcomes typically derived from echocardiograms (ECHO). The main contributions are twofold:

    1. The application of Probabilistic Cross-Modal Embeddings (PCME++) for cross-modal ECG-text embedding learning and ECHO-ECG knowledge distillation. This probabilistic modeling enables uncertainty quantification for downstream tasks.
    2. The implementation of zero-shot and few-shot learning using ECG embeddings to predict ECHO-derived information, demonstrating promising results compared with competing methods.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Practical clinical relevance: The work addresses a genuine clinical need by using more accessible ECG data to predict outcomes typically derived from resource-intensive echocardiograms.
    2. Probabilistic approach: The authors employ a probabilistic framework for modeling ECG features and their cross-modal relationships with text and ECHO data, enabling the estimation of embedding uncertainty (via σ² values), which provides valuable clinical insights and improves performance on subsets with lower uncertainty.
    3. Strong experimental validation: The model demonstrates superior performance across multiple datasets (MIMIC and MUSIC) in zero-shot, few-shot, and fine-tuning scenarios.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Model training clarity: After pretraining on two datasets, it’s unclear whether the model is frozen for few-shot learning and fine-tuning, or if only the last classification layer is fine-tuned.
    2. Hyperparameter justification: The choice of λ=0.9 in the loss function lacks proper justification or ablation studies.
    3. Text modality rationale: The necessity of using text as a bridge rather than directly linking ECG-ECHO is not fully explained.
    4. Problematic ECHO frame independence assumption: The authors treat ECHO frames as independent, but this approach is physiologically questionable. Lower LVEF videos may demonstrate smaller variance between frames due to reduced myocardial motion, while higher LVEF may show larger variance, making the direct use of frame features as distribution parameters potentially questionable.
    5. Missing baseline comparisons: The work lacks comparisons with simpler approaches that directly use ECG features to predict ECHO-related outcomes, failing to justify the need for such a complex model.
    6. Unexplained performance discrepancy: The authors don’t adequately discuss why few-shot learning demonstrates reduced results compared to zero-shot learning in Table 3.
    7. Loss function ambiguity: While the probabilistic distance metric is mentioned, the specific loss function used for training is not clearly stated.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-written and clear in presentation. However, it lacks methodological novelty as PCME++ was previously proposed, and the authors don’t introduce an appropriate way of handling probabilistic ECHO features. The authors need to address these concerns during rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have responded to most of my concerns during the rebuttal. Despite the simplified assumptions regarding ECHO frames, the proposed EchoingECG framework—which establishes a connection between ECHO and ECG—is novel and potentially valuable to the field. I recommend acceptance.



Review #3

  • Please describe the contribution of the paper

    The authors propose a probabilistic student-teacher framework EchoingECG for processing electrocardiograms (ECGs) and echocardiograms (ECHOs) to improve performance on ECG tasks. Their work demonstrates improved performance on zero- and few-shot settings as well as fine tuning settings compared to baseline models and can capture uncertainty.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Study motivation: The paper is well motivated and accounts for both the clinical task and the difficulty of multimodal learning in healthcare settings.
    • Reproducibility: Authors give implementation details for data preprocessing, as well as hardware information. However, there are some details missing (see weaknesses section).
    • Evaluation: The authors evaluate their proposed method on two datasets drawn from two different clinical sources and across several settings (e.g. zero-shot, few-shot, retrieval).
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Lack of subject population details: Table 1 does not include the prevalence of the conditions. This is particularly important given that the authors report accuracy as a metric in Table 5. Adding additional demographic information may help the paper.
    • Metrics: the authors do not give distributional measures of performance (e.g. confidence intervals) for their metrics. Additionally, relevant tests of statistical differences would benefit the strength of the paper.
    • Reproducibility: the authors do not include architectural details (e.g. the second citation for the 1D-ResNet ECG encoder mentions several different widths and depths) or the software stack (e.g. Python and PyTorch for model training). Often, the authors add a citation for a design decision; however, they should state these details explicitly in the paper alongside the citation. Clarifying these details and including a code release would contribute to the reproducibility of the paper.
    • Details on hyperparameter selection: The selection of hyperparameters (particularly λ for the loss function) seems arbitrary or, in the case of the baseline models, is missing. The authors should include the values of, and their reasoning for, the hyperparameters of both their proposed method and the baseline models.
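The request for distributional measures could be met in several ways. One common option is a percentile bootstrap over the test set, sketched below under stated assumptions: the function names, the toy labels, and the choice of accuracy as the metric are all illustrative and are not taken from the paper.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric.

    Resamples (label, prediction) pairs with replacement and returns the
    empirical alpha/2 and 1 - alpha/2 quantiles of the resampled scores.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy labels and predictions (made-up values, for illustration only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
low, high = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {accuracy(y_true, y_pred):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With a test set this small the interval is very wide, which is itself an argument for reporting it alongside the point estimate.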
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors’ proposed method is interesting and balances both the clinical and computational needs associated with multimodal learning. Their experiments demonstrated good performance; however, the lack of details regarding the subject population, implementation, and justification of design decisions limit the reproducibility of the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper introduces a probabilistic student-teacher framework named EchoingECG, which integrates probabilistic cross-modal embeddings (PCME++) and the ECHO-CLIP model to distill ECHO knowledge into ECG representations, thereby improving ECG-based cardiac function prediction. This method not only captures the uncertainty in ECG signals but also reduces the dependence on large-scale annotated datasets. It outperforms existing ECG foundation models in zero-shot, few-shot, and fine-tune settings. Moreover, variance estimation enables the identification of uncertain regions within ECGs, enhancing the understanding of model performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Methodological Innovation: The study introduces a novel probabilistic cross-modal learning framework, EchoingECG, which integrates PCME++ and the ECHO-CLIP model to capture the uncertainty in ECG signals and map them to ECHO indicators. This approach not only addresses the complexity and variability of ECG signals but also overcomes the limitations of traditional contrastive learning by enabling many-to-many mappings in a probabilistic embedding space.

    Clinical Application Potential: By distilling ECHO knowledge into ECG representations, this method can facilitate cardiac function assessment in resource-limited settings, where echocardiography equipment or expertise may be unavailable. For instance, predicting ECHO indicators from ECGs can enable early screening for heart diseases in such regions.

    Comprehensive Experimental Validation: The authors conducted extensive experiments on the MIMIC and MUSIC datasets, demonstrating that EchoingECG outperforms existing ECG foundation models in zero-shot, few-shot, and fine-tune settings. These results highlight the robustness and generalizability of the model across different datasets and tasks.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Although the paper validates the model using the MIMIC and MUSIC datasets, these datasets may have geographical or population biases. For example, the MIMIC dataset primarily comes from hospitals in the United States and may not fully represent the characteristics of heart disease patients worldwide. Additionally, the limited number of ECG-ECHO pairs in the datasets may affect the model’s generalizability.
    2. The EchoingECG model integrates multiple complex components, such as PCME++, ECHO-CLIP, and the student-teacher framework, which may lead to complicated training and inference processes with high computational costs. In practical clinical applications, more efficient model architectures may be needed to meet real-time requirements.
    3. Although the paper proposes estimating uncertainty to identify uncertain regions in ECGs, this estimation method may be affected by model assumptions and data quality. For instance, if ECG signals contain noise or artifacts, the uncertainty estimation may not be accurate enough, thereby affecting the model’s reliability.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    In Section 3.2 “Predicting ECHO-derived Labels using ECGs,” it is recommended that the authors supplement the explanation on how to handle the impact of noise in ECG signals on model performance, in order to further enhance the completeness and persuasiveness of the paper. Additionally, in Section 3.3 “Traditional Top-K Text-ECG Retrieval,” more existing text-ECG retrieval methods could be compared to highlight the advantages of the EchoingECG model.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces the EchoingECG model, which effectively predicts ECHO indicators from ECG signals by integrating probabilistic cross-modal learning and the ECHO-CLIP model. The method demonstrates strong performance across multiple datasets and holds significant clinical application value and innovation. Although there are limitations in the datasets and model complexity, these do not affect the overall quality and contribution of the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper introduces EchoingECG, a probabilistic student-teacher model that leverages ECG data to predict echocardiogram-derived outcomes. The authors have addressed most of the concerns raised in the first review, particularly regarding the clarity of the model training process, the justification of hyperparameters, and the rationale behind using text as a bridge. The paper now provides a more detailed explanation of the probabilistic framework and its application, which enhances the overall understanding and reproducibility.

    - It would be beneficial to include additional baseline comparisons, especially with simpler models that directly use ECG features to predict ECHO-related outcomes, to further justify the complexity of the proposed model.
    - The authors could elaborate more on how the model handles noise in ECG signals.

    Overall, the paper is well-written, and the authors have made substantial improvements based on the initial feedback. The contributions are significant, and the paper is now ready for acceptance.




Author Feedback

We appreciate the reviewers’ thoughtful comments and have grouped the responses below.

ECHO Independence Assumption [R#1,2]: We acknowledge the simplification in treating ECHO frames as independent. This assumption enabled tractable modeling via pretrained frame-level embeddings from ECHO-CLIP, allowing us to represent each ECHO as a multivariate distribution $\mathcal{N}(\mu, \sigma^2)$. While this approach does not explicitly model temporal dependencies, it does capture inter-frame variation across the cardiac cycle. For example, low-LVEF cases typically show reduced movement, leading to lower σ², compared to high-LVEF cases with more movement & higher σ². By relying on a foundation ECHO teacher & this assumption, our novelty is in using PCME++ to enable knowledge distillation from a limited set of ECHO-ECG pairs, which is reflected in our superior ECG to ECHO label results (Tab. 2 & 3).
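To make the frame-independence assumption concrete, the sketch below summarises a set of per-frame embeddings as a diagonal Gaussian by taking the per-dimension mean and variance across frames. It is only an illustration: the function name `echo_distribution`, the use of population variance, and the toy 2-D embeddings are assumptions, and the rebuttal does not state how the authors compute these statistics.

```python
from statistics import fmean, pvariance


def echo_distribution(frame_embeddings):
    """Summarise per-frame embeddings as a diagonal Gaussian (mu, sigma^2).

    Frames are treated as independent samples, so frame order is ignored.
    Population variance is an assumption made for this sketch.
    """
    if not frame_embeddings:
        raise ValueError("need at least one frame embedding")
    dims = list(zip(*frame_embeddings))  # one tuple of values per embedding dimension
    mu = [fmean(d) for d in dims]
    var = [pvariance(d) for d in dims]
    return mu, var


# Toy 2-D frame embeddings (made-up numbers, not real ECHO-CLIP outputs).
low_motion = [[1.0, 0.0], [1.1, 0.0], [0.9, 0.0]]   # little change between frames
high_motion = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # large change between frames

mu_low, var_low = echo_distribution(low_motion)
mu_high, var_high = echo_distribution(high_motion)
print(var_low[0] < var_high[0])
```

In this toy case the sequence with less frame-to-frame change yields the smaller variance, which mirrors the low-LVEF versus high-LVEF intuition given above.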

10-second ECG Rationale [R#2]: We apologize for the confusion about why the ECGs were 10 seconds long. Most public ECG datasets use 10-second recordings, which capture a wide range of pathologies.

Demographics [R#3,4]: While we are too space-limited to include demographic tables, we now explicitly refer to the published demographics papers for each public dataset.

Generalizability [R#3,4]: We appreciate the concerns regarding generalizability. Our zero-shot (ZS) results for MUSIC constitute strictly external testing, supporting out-of-distribution generalization. We also aim to conduct future clinical validation.

Reproducibility [R#1-4]: We will release the model weights & code (based in Python and PyTorch) upon acceptance for transparency and reproducibility.

Baselines [R#1,3]: We compared EchoingECG with publicly released ECG foundation models, using their provided weights; we did not train the baseline models. Other models using ECG to predict ECHO outcomes do exist, but their weights are not publicly available, making it infeasible to assess their models appropriately.

Architectural clarity [R#1-3]: We apologize for not making this clear & have updated the revised manuscript accordingly. We intentionally used standard architectures to isolate & show the contribution of our training strategy, namely knowledge distillation via PCME++ across modalities. For EchoingECG, we followed a conventional ResNet1D, & BioBERT & ECHO-CLIP were taken from Hugging Face. We added more details in the paper & will release the codebase. Introducing new architectures would have added confounding factors & complexity to an already multi-component system. It is, however, an important area for future work.

Model inference [R#4]: In EchoingECG, while training is a multi-stage process, inference is modular. This means that the ECG encoder can be deployed independently of the other components, supporting lightweight use in clinical settings.

Text training [R#1]: Text alignment plays a role in aligning ECG & ECHO via shared supervision, as ECHO-CLIP was trained on ECHO-text pairs. This bridge not only improves alignment but also enables ZS capabilities & future integration with LLM-based generative frameworks. ZS likely outperformed FS (frozen encoder w/ linear layer) & FT (unfrozen) due to limited training data. We expect FS & FT to perform better with more data. However, this also shows the need for text-binding: it facilitates probing of EchoingECG across various tasks.
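As a rough illustration of why text-binding enables ZS prediction, the sketch below assigns an ECG embedding to whichever text-prompt embedding lies closest in the shared space. Everything here is hypothetical: the prompt strings, the 2-D embeddings, and the plain squared Euclidean distance, which stands in for whatever similarity measure the authors actually use.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def zero_shot_predict(ecg_embedding, prompt_embeddings):
    """Return the label whose prompt embedding is nearest to the ECG embedding.

    No task-specific training is involved: the labels are defined only by
    their text prompts, embedded in the same space as the ECG.
    """
    return min(
        prompt_embeddings,
        key=lambda label: squared_distance(ecg_embedding, prompt_embeddings[label]),
    )


# Hypothetical prompt embeddings (made-up 2-D vectors, not BioBERT outputs).
prompts = {
    "reduced LVEF": [1.0, 0.0],
    "normal LVEF": [0.0, 1.0],
}
prediction = zero_shot_predict([0.9, 0.2], prompts)
print(prediction)
```

Swapping the prompt dictionary is enough to probe a different task, which is the flexibility the text bridge is meant to provide.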

Loss function choice [R#1,3]: Our training employs a probabilistic contrastive loss using the closed-form sampled distance (CSD) from PCME++. This distance (similarity measure) is used in an InfoNCE loss; we added this detail & formulation in the paper. The novelty here lies in our practical application of a frame-by-frame deterministic model (ECHO-CLIP) to an ECG model in a probabilistic space, utilizing very limited ECG-ECHO data. The choice of λ=0.9 balances the probabilistic contrastive loss with auxiliary objectives (ECG-ECHO). We noted similar ECHO performance when λ was lower; however, there was a loss in ECG-text retrieval.
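A minimal sketch of the loss described above, under stated assumptions. The CSD between two diagonal Gaussians follows the PCME++ definition (squared distance between the means plus the summed variances). The InfoNCE wrapper, the temperature, and in particular the weighting `lam * L_text + (1 - lam) * L_echo` are assumptions made for illustration; the rebuttal does not give the exact formulation.

```python
import math


def csd(mu_a, var_a, mu_b, var_b):
    """Closed-form sampled distance between two diagonal Gaussians:
    ||mu_a - mu_b||^2 + sum(var_a + var_b)."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    var_term = sum(va + vb for va, vb in zip(var_a, var_b))
    return mean_term + var_term


def info_nce(anchors, targets, temperature=1.0):
    """InfoNCE over a batch with logits = -CSD / temperature.

    anchors[i] and targets[i] are (mu, var) pairs forming the positive pair;
    all other targets in the batch act as negatives.
    """
    total = 0.0
    for i, (mu_a, var_a) in enumerate(anchors):
        logits = [-csd(mu_a, var_a, mu_t, var_t) / temperature for mu_t, var_t in targets]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def combined_loss(ecg, text, echo, lam=0.9):
    """Assumed weighting: lam on the ECG-text term, (1 - lam) on the ECG-ECHO term."""
    return lam * info_nce(ecg, text) + (1.0 - lam) * info_nce(ecg, echo)


# Toy batch of two samples with 2-D embeddings (made-up values).
ecg = [([0.0, 0.0], [0.1, 0.1]), ([5.0, 5.0], [0.1, 0.1])]
text = [([0.0, 0.0], [0.1, 0.1]), ([5.0, 5.0], [0.1, 0.1])]
echo = [([0.2, 0.0], [0.3, 0.3]), ([5.0, 4.8], [0.3, 0.3])]
print(combined_loss(ecg, text, echo))
```

Under this assumed weighting, lowering `lam` shifts emphasis toward the ECG-ECHO term, which is at least consistent with the reported trade-off against ECG-text retrieval.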




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


