Abstract

Cardiovascular diseases are the leading cause of death worldwide, and accurate diagnostic tools are crucial for their early detection and treatment. Electrocardiograms (ECG) offer a non-invasive and widely accessible diagnostic method. Despite their convenience, they are limited in providing in-depth cardiovascular information. On the other hand, Cardiac Magnetic Resonance Imaging (CMR) can reveal detailed structural and functional heart information; however, it is costly and not widely accessible. This study aims to bridge this gap through a contrastive learning framework that deeply integrates ECG data with insights from CMR, allowing the extraction of cardiovascular information solely from ECG. We developed an innovative contrastive learning algorithm trained on a large-scale paired ECG and CMR dataset, enabling ECG data to map onto the feature space of CMR data. Experimental results demonstrate that our method significantly improves the accuracy of cardiovascular disease diagnosis using only ECG data. Furthermore, our approach enhances the correlation coefficient for predicting cardiac traits from ECG, revealing potential connections between ECG and CMR. This study not only proves the effectiveness of contrastive learning in cross-modal medical image analysis but also offers a low-cost, efficient way to leverage existing ECG equipment for a deeper understanding of cardiovascular health conditions. Our code is available at https://github.com/Yukui-1999/ECCL.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1010_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1010_supp.pdf

Link to the Code Repository

https://github.com/Yukui-1999/ECCL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Din_CrossModality_MICCAI2024,
        author = { Ding, Zhengyao and Hu, Yujian and Li, Ziyu and Zhang, Hongkun and Wu, Fei and Xiang, Yilang and Li, Tian and Liu, Ziyi and Chu, Xuesen and Huang, Zhengxing},
        title = { { Cross-Modality Cardiac Insight Transfer: A Contrastive Learning Approach to Enrich ECG with CMR Features } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    Authors propose a contrastive feature alignment of two related sensor modalities: Cardiac MR (CMR) and electrocardiography (ECG). Both are feature embedded using custom-trained transformers (ViT for ECG, Swin-ViT for CMR). The two feature spaces are aligned using contrastive learning. The hypothesis is that the alignment with CMR features allows the ECG ViT to extract more meaningful representations, essentially making CMR obsolete. The experiments and results suggest that this does happen, but only to a certain degree. It is also not fully understood which features from CMR did “seep back” into the ECG ViT, and how this helped with classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Great dataset: 41,519 samples from UK Biobank (UKBB), with matching CMR/ECG modalities. A rare opportunity to perform such a study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness is that the method itself (ViT/Swin-ViT embedding, and contrastive alignment) is not novel, only the application to this particular modality pair (ECG/CMR) and dataset is. To me, this would still be fine, but then I’d expect some more analysis of the results, e.g. a more in-depth analysis of the latent spaces (see point 10/Soundness/Interpretability).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Perhaps the authors can comment briefly (1-2 sentences) whether and how it is possible to get access to the data from UKBB (prerequisites, license etc). Even though the method seems straightforward, I do recommend publishing the code along the paper (ideally including a reproducer for the dataset preparation and pre-processing), should the paper get accepted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    :: Novelty ::

    • Latent alignment via contrastive learning is a relatively common approach. The backbones (ViT/Swin-ViT) are also well-known. As such, this paper does not have much methodological novelty. The main novelty lies in applying it to a new dataset and modality pair, i.e. I consider the paper rather translational.

    :: Soundness ::

    • Suppl.Mat., Fig.1: correlation coefficients should always be accompanied by an indication of the p-value (e.g. *=p<0.05, **=p<0.01, ***=p<0.001).
    • s2.1: For future journal extensions, it could be interesting to pre-train the CMR Swin-Transformer not only via classification/regression of biomarkers/radiomics, but also via segmentation (which in return could also improve the prediction of biomarkers/radiomics). A suitable dataset could be eg. Sunnybrook Cardiac MRI (https://www.kaggle.com/datasets/salikhussaini49/sunnybrook-cardiac-mri).
    • Interpretability: the goal is to “replace” CMR purely with ECG, but it is unclear what the ECG encoder eventually learned. The t-SNE panels in Fig 2a) only show the latents after contrastive learning (by the way, nowadays there is no point in choosing t-SNE over UMAP - or, in other words, I would always recommend using UMAP instead of t-SNE). But they don’t illustrate which kind of CMR information got “absorbed” into the ECG modality encoder. It would be interesting to see the t-SNE/UMAP space colored by disease classes, or colored by key bioparameters (“LV end-diastolic volumes” etc), or colored by subject-correspondence (i.e. are the two spaces actually aligned?). I also would be very curious to see some analyses of whether CMR features correspond with the latent distribution of the ECG features, after the two spaces have been aligned. Did the ECG signal curves contain some subtle characteristics which were now highlighted by the alignment with CMR? Can visual features from CMR be backpropagated into ECG curves? I am not an expert in either cardiac imaging or cardiac ECG curves - but perhaps there are ECG curve features which 100% correlate with very specific visual features in CMR. Wouldn’t it be interesting to be able to show this correspondence after contrastive alignment? These are the kinds of analyses that this dataset and the approach would allow to do - especially since the method itself is not very novel, I would recommend exploring them.
    • There are 82 phenotype parameters from CMR and probably other phenotypes in ECG as well. I would recommend looking into feature disentanglement of the two latent spaces, perhaps it is possible to disentangle and/or align the 128-dim latents with some of these features.
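
The significance-marker convention requested in the Soundness comments (p<0.05, p<0.01, p<0.001) can be illustrated with a small helper. This is only a sketch: the function names `significance_stars` and `annotate`, and the "n.s." label for non-significant values, are assumptions for illustration and do not come from the paper or its supplement.

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the star annotation: * p<0.05, ** p<0.01, *** p<0.001."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "n.s."  # assumed label for "not significant"


def annotate(r: float, p_value: float) -> str:
    """Format a correlation coefficient together with its significance marker."""
    return f"r={r:.2f} ({significance_stars(p_value)})"
```

A figure label for a trait with r = 0.456 and p = 0.0005 would then be produced by `annotate(0.456, 0.0005)`.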

    :: Clarity ::

    • s2.1: It is not fully clear how the ECG ViT is designed. Fig 1 (left-top panel) suggests that the authors used a 2D-ViT (as seen in the random mask), as if the ECG was an image? This, of course, would be sub-optimal (image resolution, white background patches etc etc). Instead, the ECG should be embedded via a 1D-ViT (i.e. a temporal transformer model). The authors write that the ECG signal is of dimension CxT, i.e. a multi-channel 1D signal, which indicates the correct approach. But overall, this is confusing, mostly because of the random-mask illustration in Fig. 1.
    • If the ECG ViT is 1D indeed, what makes it a “ViT”? Are patches embedded before input into the ViT, or are they just fed in as-is?
    • Were any positional embeddings used for the ECG 1D-ViT or the CMR 2D-ViT? If not, why? Please be more elaborate when explaining the overall architecture, as well as the input branches.

    :: Minor comments ::

    • Fig.1: Left-top panel, here the random mask picture suggests that ECG data is interpreted as a 2D image, which would be confusing (see points above).
    • Fig.1: Left-bottom panel reads “Swim Transformer”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is not very novel - usually not a deal breaker for me, as long as the authors can make up for it with a highly innovative analysis of results. For a potential resubmission and/or journal extension, I would recommend explaining the architecture better. More importantly, explore latent feature spaces in more detail, perhaps using some of the suggestions pointed out in point 10./Clarity.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have partially addressed my main concern, i.e. limited interpretability of results. Such results do exist, and could be provided. Authors, please do report UMAP results classified by disease or cardiac metrics in the revised version; if the space of 10 pages is exceeded, consider adding them as supplementary material.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a contrastive learning framework to map ECG features into the latent space of CMRs, with the goal of enabling classification and regression tasks normally done with CMRs, but using ECGs alone. The ECG and CMR features are encoded by a ViT and Swin transformer, respectively, followed by contrastive learning to map the ECG features into the frozen CMR latent space, and finally supervised fine-tuning on the ECGs features. Results are shown on the diagnostic performance of several heart diseases and 82 cardiac phenotype indicators.
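
For readers unfamiliar with the alignment step summarized above, a CLIP-style symmetric contrastive loss over paired ECG and CMR embeddings might look roughly like the sketch below. It is a generic illustration in plain Python, not the authors' implementation: the function names, the temperature of 0.07, and the toy 2-dimensional embeddings are all assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def symmetric_contrastive_loss(ecg_emb, cmr_emb, temperature=0.07):
    """Average of ECG-to-CMR and CMR-to-ECG cross-entropy over a batch.

    Row i of ecg_emb and row i of cmr_emb are assumed to come from the
    same subject, so the matching pair sits on the diagonal.
    """
    ecg = [l2_normalize(v) for v in ecg_emb]
    cmr = [l2_normalize(v) for v in cmr_emb]
    n = len(ecg)
    # Cosine similarities scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(e, c)) / temperature for c in cmr]
           for e in ecg]
    ecg_to_cmr = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    cmr_to_ecg = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (ecg_to_cmr + cmr_to_ecg)


# Toy batch of two subjects with 2-dimensional embeddings.
ecg_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_contrastive_loss(ecg_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss(ecg_batch, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each ECG embedding points the same way as its paired CMR embedding, and large when the pairs are swapped, which is the behaviour the alignment relies on.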

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The goal to extract more clinical utility from ECGs has good clinical motivation and potentially large impact, given the wide accessibility and lower cost of ECGs compared to CMRs.
    • The core methods (ViT, Swin, CLIP) are well-established and the overall framework is fairly simple.
    • The paper is well-written and the methodology is largely described clearly (one exception noted below).
    • A very large data set (40k+) is used for this study.
    • The diagnostic results show promise for clinical use.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is unclear how the supervised training on ECG data for specific tasks is done. The loss function (Eqn. 4) includes only CMR features. Is the equation correct? If the CMR features are used for supervised training of the ECGs, what would be the results of supervised training using only the ECGs to perform the specific tasks?
    • The comparisons should include at least one traditional method (e.g., CNN fusion layer) for fusing the ECG and CMR features.
    • Only AUC and Pearson correlation are presented as evaluation metrics.
    • No statistical significance testing was done.
    • The regression results are averaged over 82 outcomes, which is hard to interpret.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The main point that is unclear is how the supervised training on ECG data for specific tasks is done. Otherwise, the paper is quite clear.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Please clarify how the supervised training on ECG data for specific tasks is done, and if needed add results for supervised training only on ECG data.
    • Add a comparison to a basic method for multi-modal fusion.
    • Perform statistical testing to indicate which differences are significant.
    • Break down the regression results into meaningful categories (e.g., functional vs. anatomical). The supplement contains results for each phenotype, but some intermediate level of categorization and aggregation for the main paper would be helpful.
    • In Table 1, in the MI AUC column, the triplet approach has the highest value and should be in red.
    • In Fig. 1, please correct “Vit” to “ViT” and “Swim” to “Swin”.
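
One possible way to carry out the statistical testing requested above is a paired bootstrap over subjects, giving a confidence interval for the AUC difference between two models. The sketch below uses only the Python standard library; the function names, the 95% percentile interval, and the number of resamples are assumptions chosen for illustration, and the review does not prescribe a specific test.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=1000, seed=0):
    """Percentile 95% interval for AUC(a) - AUC(b) under paired resampling."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # resample lacks one class; draw again
            continue
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper
```

If the resulting interval excludes zero, the difference between the two models would usually be reported as significant at roughly the 5% level. The pairwise AUC here is quadratic in the number of samples, so a rank-based implementation would be preferable for a dataset of this size.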
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this paper proposes a useful method for jointly modeling ECG and CMR features. It is unclear how the framework compares to more established methods of supervised training on ECGs and fusion of ECG and CMR features, but if these points are clarified, the paper could be very impactful.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal has clarified some of the confusion in the original manuscript. While it lacks a comparison with a basic feature fusion method, it should be of sufficient interest to the MICCAI audience.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a multi-modality (ECG and CMR) data fusion method to improve diagnostic accuracy of ECG that is a more affordable modality for assessment of cardiovascular health.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The authors propose a multimodality data fusion methodology that uses features from a more comprehensive/expensive cardiac exam (CMR) to boost the predictive power of machine learning models on future data gathered using only the less powerful sensing modality ECG in clinical settings.
    - The authors provide extensive experimental evidence to show that each proposed component in their framework has contributed to the performance improvement.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - The positive cases in the UK Biobank were extremely rare (typically less than 2% of the patients) and no information regarding the patient demographics was provided.
    - The low performance on the regression tasks indicates that the data fusion method cannot really capture mechanistic connections between cardiac parameters measured across the two modalities.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper mentions use of the public dataset UK Biobank, which requires registration by researchers to gain access to the patient data. There is no mention of a code release; however, there is adequate information for a sufficiently skilled researcher to reproduce the work with some nontrivial effort.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I thank the authors for their valuable work in early detection and treatment of cardiovascular diseases using more affordable available sensing technologies by transferring insights from more comprehensive but not always easily available modalities. I would like to see more work in understanding/distilling mechanistic insights into the complex machine learning / deep learning models and not solely relying on black box complex models (this is not a specific comment to these authors but to the field as a whole).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I would like to see methods providing more insights into the mechanistic understanding of the diseases and demonstrating what each modality is actually measuring and how they actually relate and complement each other, and furthermore showing more intuitive ways to distill this information across different modalities.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thanks for the valuable comments and suggestions. Below we respond to the comments.

Code: As required by MICCAI, if our paper is accepted, we will publish the code for UKB data processing, model training, and inference.

1. To reviewer3&4&5: Clarifications about ECG Phase 2 training and ViT model design: We sincerely apologize for the potential misunderstanding caused by a typo in Equation 4 for the second stage of training. To clarify, the correct Equation 4 should replace fc and xc with fe and xe. This loss function is specifically for training the ECG encoder. In the second stage, training involves two types of loss: the contrastive loss between ECG and CMR, and the downstream task prediction loss for the ECG encoder, which includes four cardiac diseases and CMR-related cardiac metrics. Thus, the ECG encoder (ViT) is updated under both losses, while the CMR encoder (Swin) is updated solely under the contrastive loss. Only the trained ECG encoder is used in the inference phase. Regarding the ECG ViT model, a batch of ECG data is defined as [b, 12, 5000], where “12” represents the 12 leads, and “5000” represents 10 seconds of data at 500 Hz. We reshape the data to [b, 1, 12, 5000], treating it as a 1-channel image with height 12 and width 5000. For the ViT model, we set the patch size to (1, 100), dividing the data into 600 patches, and then apply the standard ViT approach with standard positional embedding. Our results, reported in Figure 2b, show that smaller patches yield better performance. This method enables cross-lead and cross-length attention and allows for the automatic alignment of ECG and CMR patches using contrastive loss. If allowed, we will include an analysis of ECG and CMR patch correlations in the revised version for better interpretability. Additionally, Fig 1’s depiction of ECG patches might be misleading, as the actual data is a narrow, long strip image (12 x 5000), unsuitable for display.
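
The patch arithmetic described in point 1 (a 12 x 5000 recording cut into (1, 100) patches, giving 12 x 50 = 600 patches) can be checked with a short plain-Python sketch. The function name `patchify_ecg` and the synthetic signal are assumptions for illustration; the authors' actual code lives in their repository and may differ, and the linear projection and positional embedding that a ViT applies afterwards are omitted here.

```python
def patchify_ecg(ecg, patch_h=1, patch_w=100):
    """Split a [leads][samples] ECG into flattened non-overlapping patches.

    Patches are emitted lead by lead, left to right, so with patch_h=1
    every patch holds patch_w consecutive samples of a single lead.
    """
    n_leads, n_samples = len(ecg), len(ecg[0])
    if n_leads % patch_h or n_samples % patch_w:
        raise ValueError("ECG shape must be divisible by the patch size")
    patches = []
    for top in range(0, n_leads, patch_h):
        for left in range(0, n_samples, patch_w):
            patch = [ecg[r][c]
                     for r in range(top, top + patch_h)
                     for c in range(left, left + patch_w)]
            patches.append(patch)
    return patches


# Synthetic 12-lead, 10-second recording at 500 Hz (values encode position).
ecg = [[float(lead * 5000 + t) for t in range(5000)] for lead in range(12)]
patches = patchify_ecg(ecg)
n_patches = len(patches)       # 12 leads x 50 windows per lead
patch_length = len(patches[0])  # samples per patch
```

Each of the 600 patches would then become one token, which is why attention can span both leads and time.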
2. To reviewer3: Results using ECGs only and a method that fuses ECG and CMR: In Table 1, the results marked W/O CL in the ablation experiments are the supervised learning results using ECGs only, and W/ or W/O indicates with or without self-supervised pre-training of ECGs. While we did not use a method that fuses ECG and CMR as a control, we report the results of supervised learning of diseases and metrics using CMR as our upper bound and approximation target in the last row of Table 1. This may provide some reference value.

3. To reviewer3&4: Regarding the use of a single evaluation metric and the lack of statistical significance testing, we will include these in the next version if allowed. We are encouraged by the positive preliminary results.

4. To reviewer3: Individual results for each trait: In Figure 2c, we reported the regression coefficients for 10 indicators, and the full set of 82 regression coefficients can be found in the supplementary materials. Although intermediate type classification and aggregation were not performed, we believe that the individual regression coefficients can provide some helpful insights.

5. To reviewer4&5: Interpretability: Future extensions will consider using ECG to generate paired CMR images, enhancing interpretability. We have already used well-established pre-trained models to segment and determine the cardiac region of CMR in our data preprocessing. For interpretability, we used 10-second ECGs and 50-frame CMRs, covering both systole and diastole with temporal correlations. If permitted by MICCAI, we will report UMAP results classified by disease or cardiac metrics in the revised version.

6. To reviewer4: Research on ECG phenotypes: It is limited due to the lack of ECG phenotypic information in the UKB dataset, but we will explore this further with additional datasets in future studies.

7. To reviewer5: Positive samples in the UKB dataset are rare: As mentioned, we combine all positive cases with a random subset of negative samples to form a smaller set for iterative training.
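
The subsampling described in the last point (all positive cases plus a random subset of negatives) could be sketched as follows. The negative-to-positive ratio, the seed handling, and the function name `build_training_subset` are assumptions, since the rebuttal does not state how many negatives are drawn.

```python
import random


def build_training_subset(labels, neg_per_pos=1.0, seed=0):
    """Return shuffled indices of all positives plus a random sample of negatives.

    neg_per_pos is a hypothetical knob: the number of negatives drawn
    for each positive case.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = min(len(neg), int(round(len(pos) * neg_per_pos)))
    subset = pos + rng.sample(neg, n_neg)
    rng.shuffle(subset)
    return subset


# Toy cohort with 2% prevalence, similar in spirit to the rare positives noted by the reviewer.
toy_labels = [1] * 20 + [0] * 980
subset = build_training_subset(toy_labels, neg_per_pos=3.0, seed=1)
```

Redrawing the negatives with a different seed at each round would give the kind of iterative training set the authors mention, although whether they resample per epoch is not stated.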




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I strongly agree with R4 that the combination of ECG and CMR is quite interesting, especially on a large dataset.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    I strongly agree with R4 that the combination of ECG and CMR is quite interesting, especially on a large dataset.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Clear accept for me after Rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    Clear accept for me after Rebuttal.


