Abstract

An electrocardiogram (ECG) is a widely used, cost-effective tool for detecting electrical abnormalities in the heart. However, it cannot directly measure functional parameters, such as ventricular volumes and ejection fraction, which are crucial for assessing cardiac function. Cardiac magnetic resonance (CMR) is the gold standard for these measurements, providing detailed structural and functional insights, but is expensive and less accessible. To bridge this gap, we propose PTACL (Patient and Temporal Alignment Contrastive Learning), a multimodal contrastive learning framework that enhances ECG representations by integrating spatio-temporal information from CMR. PTACL uses global patient-level contrastive loss and local temporal-level contrastive loss. The global loss aligns patient-level representations by pulling ECG and CMR embeddings from the same patient closer together, while pushing apart embeddings from different patients. Local loss enforces fine-grained temporal alignment within each patient by contrasting encoded ECG segments with corresponding encoded CMR frames. This approach enriches ECG representations with diagnostic information beyond electrical activity and transfers more insights between modalities than global alignment alone, all without introducing new learnable weights. We evaluate PTACL on paired ECG-CMR data from 27,951 subjects in the UK Biobank. Compared to baseline approaches, PTACL achieves better performance in two clinically relevant tasks: (1) retrieving patients with similar cardiac phenotypes and (2) predicting CMR-derived cardiac function parameters, such as ventricular volumes and ejection fraction. Our results highlight the potential of PTACL to enhance non-invasive cardiac diagnostics using ECG. The code is available at: https://github.com/alsalivan/ecgcmr
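The two losses described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it assumes a symmetric InfoNCE-style objective with cosine similarity, a hypothetical temperature of 0.1, and a weighting `beta` on the local term, loosely mirroring the fixed hyperparameter the authors mention in their feedback. All function names are illustrative.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine_logits(a_rows, b_rows, temperature):
    """Pairwise cosine similarities between two sets of embeddings, divided by temperature."""
    a = [l2_normalize(r) for r in a_rows]
    b = [l2_normalize(r) for r in b_rows]
    return [[sum(x * y for x, y in zip(ra, rb)) / temperature for rb in b] for ra in a]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the positive for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_info_nce(a_rows, b_rows, temperature=0.1):
    """Average of the a-to-b and b-to-a contrastive losses."""
    logits = cosine_logits(a_rows, b_rows, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))

def global_loss(ecg_patient_emb, cmr_patient_emb, temperature=0.1):
    """Patient-level alignment: row i of both inputs belongs to patient i."""
    return symmetric_info_nce(ecg_patient_emb, cmr_patient_emb, temperature)

def local_loss(ecg_segments, cmr_frames, temperature=0.1):
    """Temporal alignment within each patient.

    ecg_segments[p][t] and cmr_frames[p][t] are assumed to be embeddings of
    patient p at the same time step t; negatives are the other time steps
    of the same patient.
    """
    per_patient = [
        symmetric_info_nce(segs, frames, temperature)
        for segs, frames in zip(ecg_segments, cmr_frames)
    ]
    return sum(per_patient) / len(per_patient)

def combined_loss(ecg_patient_emb, cmr_patient_emb, ecg_segments, cmr_frames, beta=1.0):
    """Assumed combination: global term plus beta times the local term."""
    g = global_loss(ecg_patient_emb, cmr_patient_emb)
    l = local_loss(ecg_segments, cmr_frames)
    return g + beta * l
```

Neither loss in this sketch has trainable weights of its own; both operate directly on the encoder outputs, which is one way to read the "no new learnable weights" claim.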

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1440_paper.pdf

SharedIt Link: https://rdcu.be/eHwK5

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04927-8_21

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/alsalivan/ecgcmr

Link to the Dataset(s)

https://www.ukbiobank.ac.uk/

BibTex

@InProceedings{SelAle_Global_MICCAI2025,
        author = { Selivanov, Alexander AND Müller, Philip AND Turgut, Özgün AND Stolt-Ansó, Nil AND Rueckert, Daniel},
        title = { { Global and Local Contrastive Learning for Joint Representations from Cardiac MRI and ECG } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {217--227}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors introduce a dual-loss strategy: a global contrastive loss aligns patient-level ECG and CMR embeddings, while a local contrastive loss enhances temporal alignment between ECG segments and corresponding CMR frames. The global loss aligns ECG and CMR embeddings for the same patient, while the local loss enforces a finer-grained temporal alignment by contrasting encoded segments. This approach aims to enrich ECG representations with functional information derived from CMR.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Dual-loss framework: The combination of global and local contrastive losses is novel, which addresses limitations of global-only methods. The local alignment leverages temporal correspondence between ECG and CMR.
2. Parameter-free local alignment: The local loss formulation is computationally efficient, making it scalable for medical applications.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Potential to clarify temporal alignment benefits: The link between temporal alignment during training and ECG-only performance at inference (Sec. 4) could be better explained. For instance, how syncing ECG segments with CMR frames improves standalone ECG tasks isn’t immediately intuitive.
2. Dependence on paired data: The framework requires paired ECG and CMR data for the multimodal contrastive training. The limited availability of such large-scale paired datasets can be a limitation for broader applicability.
3. Evaluation Against Large ECG-Only Datasets: The proposed method’s reliance on paired ECG-CMR data contrasts with the large, publicly available ECG-only datasets (e.g., MIMIC-ECG). A key weakness is the lack of comparison against strong ECG-only models trained on these large datasets. To justify the complexity and the paired-data requirement of PTACL, it is essential to demonstrate that it offers significant advantages over models that can leverage these widely accessible, large-scale ECG resources. Does the information transferred from the limited paired CMR data truly outweigh the potential benefits of training on vastly more ECG data alone?
4. Limited CMR input: The method utilizes only a single middle short-axis slice. While efficient, it’s possible that incorporating more views or slices could provide richer information.
5. Generalizability: The evaluation is performed on UK Biobank. While substantial, validation on external datasets from different populations and clinical settings would be needed to establish generalizability.
6. Justification of Clinical Utility: The paper shows PTACL improves ECG’s ability to predict CMR measurements compared to standard ECG analysis. However, the performance is still significantly lower than using CMR data directly as shown in Table 2. The paper needs to better explain why this level of improvement in ECG performance is clinically valuable. For which specific situations or decisions would a doctor use this enhanced ECG analysis, knowing it’s less accurate than a CMR but more accessible? A clearer explanation of the practical use case and the clinical significance of the achieved improvement is needed.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper proposes PTACL, incorporating a novel, parameter-free local temporal alignment into ECG-CMR contrastive learning. It shows improvement over global multimodal methods and ECG-only baselines trained on the same dataset. However, key weaknesses require attention during rebuttal: clarifying the link between temporal alignment and inference performance (Weakness #1), dependence on paired data (Weakness #2), the lack of comparison against ECG models trained on large public datasets (Weakness #3), and the need for clearer justification of clinical utility (Weakness #6). Acceptance is conditional on the authors substantially addressing these points.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper proposes a multimodal contrastive learning framework trained on ECG and Cardiac Magnetic Resonance (CMR) data to enhance ECG representations. First, separate encoders were used to learn the uni-modal representations of ECG and CMR data. Then, the multimodal learning framework was trained using a global contrastive loss to align patient-level representations as well as a local loss to align ECG segments with corresponding CMR frames associated with the same patient at the same cardiac phase. The authors report strong performance of the learnt ECG representations in similar-patient retrieval and cardiac phenotype regression.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The code appears set to be made publicly available.
2. The local module can be integrated into the framework without introducing additional learnable parameters.
3. They applied the learnt representations to multiple downstream tasks.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The authors claimed that the local module could be integrated into the framework without introducing additional learnable parameters. However, in Eq 4, beta has been used, and it’s not clear whether it’s a learnable parameter or a hyperparameter. If the former is true, the initial claim may not be correct.
2. Although some implementation details are provided, including more details such as learning rate and batch size can help improve the reproducibility of the work.
3. Furthermore, to have a more robust evaluation, it would be better to perform cross-validation to report the mean and std of the results and use an external dataset for testing the model.
4. It’s not clear whether the split of the data into training, validation, and test is based on unique patient ids. If that’s not the case, there may be potential data leakage in the experiments.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed framework is promising and can learn useful ECG representations. However, it should be made clear whether the data split was based on unique patient ids.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Most of my comments were adequately addressed, for example by stating the batch size and learning rate and by confirming that the data split was based on unique patient ids. I only recommend that the authors clearly state in the paper that the results are based on five random seeds.

Review #3

Please describe the contribution of the paper

The paper proposes PTACL (Patient and Temporal Alignment Contrastive Learning), a novel multimodal self-supervised framework that jointly learns representations from ECG and cardiac MRI (CMR). The key innovation lies in combining global patient-level alignment with a local temporal alignment between ECG segments and CMR frames. This local contrastive loss captures fine-grained physiological correspondences across modalities, enabling the model to enrich ECG representations with structural and functional cardiac information — without introducing any additional learnable parameters. PTACL achieves state-of-the-art performance on clinically relevant tasks, such as cardiac phenotype retrieval and regression, and is validated on a large, real-world dataset from the UK Biobank.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) Novel Methodological Design: PTACL introduces a dual-level contrastive learning strategy — global for patient-level alignment and local for time-step correspondence. The local contrastive loss is entirely parameter-free, making it computationally efficient while significantly improving the model’s ability to capture temporal dynamics between ECG and CMR. This is a meaningful extension beyond prior work that only considers global representations.

2) Strong Clinical Relevance: The model is evaluated on 10 key cardiac phenotypes (e.g., LV/RV volumes, ejection fraction, cardiac output), demonstrating its potential for non-invasive estimation of structural and functional cardiac properties directly from ECG — a widely available and low-cost tool.

3) Large-Scale Evaluation: The study uses a paired ECG-CMR dataset from 33,942 subjects in the UK Biobank, making it one of the largest multimodal studies in this area. The large sample size supports strong generalization and provides confidence in the robustness of the results.

4) Demonstrated Clinical Feasibility: PTACL enables accurate patient retrieval based on physiological similarity and outperforms strong baselines in cardiac phenotype prediction, all using ECG enhanced with minimal CMR data. This approach could extend advanced cardiac assessment to settings without routine access to imaging.

5) Efficiency and Simplicity: Despite its strong performance, PTACL remains efficient by not introducing extra model parameters for local alignment. This makes it more practical to scale and integrate into existing clinical or deployment pipelines.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

No complaints from my side; the work is strong and well-executed.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The model description in the “Implementation Details” section could be more appropriately placed in the Methods section to improve clarity and structure.

Additionally, including statistical analysis — such as significance testing, Pearson correlation, or other relevant metrics — would strengthen the evaluation and support the reported results.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well-motivated, methodologically sound, and thoroughly evaluated. The contributions are clear and impactful, and the clinical relevance is well demonstrated. Overall, it’s a strong and well-executed piece of work.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The work is well-executed on a large dataset and shows strong potential both methodologically and in clinical application.

Author Feedback

We thank the reviewers for their thoughtful feedback and for recognising key strengths of our work: the novelty of combining global and local contrastive learning, the efficiency of our parameter-free local loss, and the clinical relevance demonstrated on the large UK Biobank dataset. We now address the raised concerns.

Clinical Utility (R2[#6]): We emphasise that our method is not intended to replace CMR, but to support clinical decision-making where CMR is unavailable, costly, or unnecessary. For example, our approach can enable ECG-based risk stratification where the enhanced ECG is used to identify patients who should be referred for MRI. This is particularly valuable in primary care, emergency settings, and low-resource environments, where it can improve early detection, prioritise referrals, and reduce unnecessary imaging.

Paired Data, Generalizability (R2[#2,#5], R3[#3]): We acknowledge the limitation of requiring paired data (Sec. 5). While this is a common limitation in multimodal contrastive learning, we specifically leveraged the UK Biobank, one of the largest available ECG-CMR datasets, to demonstrate the effectiveness of our approach. Importantly, paired data is only needed during training; at inference, the model operates on ECG alone, enabling broad applicability. Regarding generalizability, we aimed to demonstrate the method’s effectiveness using the large and well-characterised UK Biobank dataset, whose scale and quality support robustness. While we focused on this dataset to isolate and validate our approach, we acknowledge the limitation of using a single source and will mention it in the discussion. Future work can extend evaluation to external cohorts.

Temporal Alignment (R2[#1]): Our goal is to predict imaging-derived phenotypes, which are defined at different cardiac phases (ED, ES) and can’t be directly inferred from ECG alone. Contrastive learning enables us to transfer structural information from CMR to ECG by aligning both modalities in a shared latent space. While global alignment captures subject-level patterns averaged across time, local alignment preserves temporally specific features, which are important for phenotypes tied to distinct cardiac phases. Since CMR is ECG-gated, we leverage this synchronisation to define temporally aligned ECG-CMR pairs in a supervised contrastive setting. This enriches the latent space with phase-specific features, improving phenotype prediction.

ECG-Only Models (R2[#3]): Foundation models trained on large ECG-only datasets (MIMIC-ECG) are typically evaluated on abnormal cardiac electrophysiologies (arrhythmias, myocardial infarction). These tasks differ from ours, which focuses on predicting imaging-derived structural phenotypes. As ECG signals reflect only electrical activity, they don’t contain the structural information required for our targets, so a direct comparison is limited. Still, PTACL outperforms an ECG-only model pretrained on UK Biobank (Table 2). We also evaluated a model pretrained on MIMIC-ECG by using linear probing on UK Biobank, and it performed worse than a model pretrained directly on UK Biobank. For clarity, the MIMIC result was omitted from the table, but we are happy to include it if allowed.

Implementation Details, Data Split (R1, R2[#4], R3): Full code and configurations will be made publicly available to support reproducibility. We confirm that 𝛽 in Eq. 4 is a fixed, non-learnable hyperparameter (=1.0, Sec. 4.3). A batch size of 128 and a learning rate of 1e-4 were used during the multimodal phase. The dataset was split strictly by patient IDs to avoid any data leakage. For evaluation, we report means over five random seeds (standard deviations are small, ∼0.002, Table 2). While we also computed Pearson correlation, p-values, and confidence intervals, we reported R² for consistency with prior work.

We acknowledge that using even more CMR slices could provide richer information. We intentionally used a single middle SA slice to reduce costs and show that our method is effective even with limited input.
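
For readers unfamiliar with patient-level splitting, the following is a minimal sketch of how a split by patient ID can rule out leakage between training, validation, and test sets. It is not taken from the authors' code: the record layout (`patient_id` key), the 80/10/10 fractions, and the function name are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def split_by_patient(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole patients (not individual records) to train/val/test.

    `records` is a list of dicts with a "patient_id" key (hypothetical layout).
    The fractions are illustrative; the source does not state the ratios used.
    """
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Shuffle patient IDs deterministically so the split is reproducible.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    n = len(patient_ids)
    n_train = int(fractions[0] * n)
    n_val = int(fractions[1] * n)
    groups = {
        "train": patient_ids[:n_train],
        "val": patient_ids[n_train:n_train + n_val],
        "test": patient_ids[n_train + n_val:],
    }
    # Every record of a patient lands in exactly one split.
    return {
        name: [rec for pid in ids for rec in by_patient[pid]]
        for name, ids in groups.items()
    }
```

Because the shuffle operates on patient IDs rather than on records, repeated recordings from the same person cannot end up on both sides of a split.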

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Global and Local Contrastive Learning for Joint Representations from Cardiac MRI and ECG

Author(s):