Abstract

Major Adverse Cardiovascular Events (MACE) remain the leading cause of mortality globally, as reported in the Global Burden of Disease Study 2021. Opportunistic screening leverages data collected during routine health check-ups, and multimodal data can play a key role in identifying at-risk individuals. Chest X-rays (CXR) provide insights into chronic conditions contributing to MACE, while the 12-lead electrocardiogram (ECG) directly assesses cardiac electrical activity and structural abnormalities. Integrating CXR and ECG could offer a more comprehensive risk assessment than conventional models that rely on clinical scores, computed tomography (CT) measurements, or biomarkers and may be limited by sampling bias and single-modality constraints. We propose a novel predictive modeling framework, MOSCARD: multimodal causal reasoning with co-attention to align two distinct modalities while mitigating bias and confounders in opportunistic risk estimation. The primary technical contributions are (i) multimodal alignment of CXR with ECG guidance; (ii) integration of causal reasoning; and (iii) a dual back-propagation graph for de-confounding. Evaluated on an internal dataset, shift data from the emergency department (ED), and the external MIMIC dataset, our model outperformed single-modality and state-of-the-art foundation models, with AUCs of 0.75, 0.83, and 0.71, respectively. The proposed cost-effective opportunistic screening enables early intervention, improving patient outcomes and reducing disparities.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3125_paper.pdf

SharedIt Link: https://rdcu.be/eHwYH

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04984-1_32

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/OrchidPi/MOSCARD

Link to the Dataset(s)

N/A

BibTex

@InProceedings{PiJia_MOSCARD_MICCAI2025,
        author = { Pi, Jialu AND Farina, Juan Maria AND Lahiri, Rimita AND Jeong, Jiwoong AND Gurudu, Archana AND Park, Hyung-Bok AND Chao, Chieh-Ju AND Ayoub, Chadi AND Arsanjani, Reza AND Banerjee, Imon},
        title = { { MOSCARD - Multimodal Opportunistic Screening for Cardiovascular Adverse events with Causal Reasoning and De-confounding } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {331--341}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors propose a two-step multimodal framework that integrates chest X-rays (CXR) and electrocardiograms (ECG). In the first step, single-modality de-confounding encoders are trained using a confusion loss. In the second step, multimodal learning is performed using a co-attention module and two transformers with five branches. The framework is evaluated on internal datasets, shift data from the emergency department (ED), and the external MIMIC dataset.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Combination of CXR and ECG will provide complementary information for opportunistic screening for cardiovascular adverse events.
2. The paper includes relatively sufficient experimental analyses to assess the impact of various factors on single-modality and multimodal performance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Weak Motivation and Literature Review:
  - The motivation for the study is not clearly articulated. While the paper identifies challenges such as population bias in recent vision-language models (VLMs) like MedCLIP, the targeted challenge could be more narrowly defined.
  - The literature review is insufficient, with limited discussion of related work. Only one reference on adversarial debiasing is mentioned, which does not adequately highlight current developments in the field.
2. Lack of Technical Clarity in the Methods Section:
  - The ground truth for both the classification label and the confounder label is not clearly defined.
  - The paper does not provide a detailed explanation of the confusion loss.
  - It is unclear how the authors distinguish between early and deeper layers.
  - The dimensions of variables are not explicitly stated, making the methodology harder to follow.
  - The meaning of the arrows in the paper is unclear, as they sometimes represent causal relationships and sometimes have other meanings.
3. Ambiguity in Prediction Outputs:
  - In Figure 1, the network produces three prediction outputs, but it is unclear which output should be used or prioritized.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper lacks a clear motivation and sufficient literature review, provides an inconsistent and unclear methodology with missing technical details (e.g., confusion loss, variable dimensions, and ambiguous use of arrows), and fails to clarify key aspects such as prediction outputs, making it difficult to follow and evaluate.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

The authors have strengthened their motivations in the rebuttal and provided more reasoning for their architectures. They have addressed my primary concerns. However, the clarity and quality of the writing could still be improved.

Review #2

Please describe the contribution of the paper

The main contribution of the paper is the development of the MOSCARD framework, which leverages multimodal causal reasoning and co-attention mechanisms to predict major adverse cardiovascular events (MACE) more accurately and fairly. By integrating chest X-ray (CXR) and 12-lead electrocardiogram (ECG) data, the framework addresses the challenges of bias and confounding factors through novel techniques like de-confounding and causal intervention.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper’s major strengths lie in its innovative MOSCARD framework, which integrates multimodal causal reasoning with co-attention mechanisms to improve the prediction of major adverse cardiovascular events (MACE). This novel methodology effectively addresses bias and confounding factors by incorporating causal intervention and de-confounding techniques, ensuring that the model’s predictions are more accurate and generalizable. The clinical feasibility of the framework is demonstrated through robust evaluations on diverse datasets, including emergency department (ED) and MIMIC-IV data, showcasing its ability to work with routine health data like CXR and ECG for opportunistic screening. Additionally, the model’s superior performance in comparison to existing methods, along with its ability to address population bias, highlights its significant potential to improve cardiovascular risk assessment and reduce health disparities in real-world settings.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Although MOSCARD incorporates causal reasoning and de-confounding to improve fairness and generalization, its architecture is highly complex—requiring two-stage training, multiple loss functions, and co-attention modules. This complexity may hinder clinical interpretability, as clinicians may not easily understand how the model reaches decisions or which causal pathways contribute to a given prediction.
2. The model relies on high-quality, well-aligned pairs of CXR and ECG data. In real-world settings, these modalities are not always acquired concurrently, and issues such as temporal misalignment, noise, or missing data are common. The paper does not adequately address how the model would handle incomplete or low-quality inputs.
3. While the co-attention mechanism uses ECG as the query to guide attention over CXR features, the rationale for this directionality is not fully explained. Moreover, since ECG and CXR do not share a spatial correspondence, the attention learned may reflect statistical co-occurrence rather than medically meaningful associations.
4. Although the model is evaluated on external datasets (e.g., MIMIC and ED cohorts), these are relatively homogeneous and limited in size. There is insufficient validation across geographically diverse, low-resource, or demographically varied populations, leaving questions about the model’s global generalizability.
5. The paper does not explore how changes in hyperparameters (e.g., loss weights), the number of confounders, or random seeds affect performance. Given the model’s complexity and multi-branch structure, its robustness to training instability or parameter variation remains uncertain, posing a risk for deployment in clinical settings.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The MOSCARD framework introduces a novel approach to multimodal causal reasoning and co-attention mechanisms, effectively addressing key challenges like bias, confounding, and modality alignment in cardiovascular risk prediction. The paper also demonstrates a strong evaluation across multiple datasets, including both internal and external data, and shows promising clinical feasibility for opportunistic screening in resource-limited settings. However, the lack of immediate access to source code and data diminishes the reproducibility of the results, and the discussion on the potential limitations of the model in real-world clinical applications could have been more thorough. Additionally, the paper’s reliance on the quality of CXR and ECG data could impact its applicability in settings with noisy or incomplete data.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

The paper proposed MOSCARD, a novel multimodal framework for opportunistic screening of Major Adverse Cardiovascular Events (MACE). The main contributions include: integrating features from ECG data to CXR images; using confusion loss to mitigate the confounder effect.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper proposes a directional co-attention mechanism in which ECG features guide CXR feature refinement.
2. The paper combines Structural Causal Models (SCMs) and de-confounding in the model.
3. The paper presents many experiments that evaluate the proposed model across diverse cohorts.
4. The paper applies confusion loss to multimodal features.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The paper’s writing quality should be improved. For example, Fig. 1 could be more informative and needs more description. In addition, Table 1 and Table 2 could be better organized.
2. The Results section is not solid enough. More ablation studies need to be included. For example, the paper does not systematically evaluate the individual contributions of its core components.
3. The clarity and organization of this paper should be improved. There is no Conclusion section.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My overall score is based on the major strengths and major weaknesses.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Based on the authors’ rebuttal, my concerns have been addressed, so I accept this paper.

Author Feedback

We thank the reviewers for their helpful comments. We have grouped the comments and address them below.

We address the weak motivation raised by the reviewers: Our study contributes both clinically and technically. Clinically, we introduce a novel use case for secondary use of ECG and chest X-ray data for opportunistic cardiovascular screening during routine check-ups—enabling early risk identification without relying on dedicated cardiac visits. This supports timely intervention, especially in settings lacking advanced imaging (e.g., echo, CT, MRI). Technically, we present a generalizable multimodal prognostic framework tested on diverse, unseen populations. Unlike prior work, we explicitly model confounders and causal structure, using comorbidity-aware fusion and confusion loss to debias demographics—improving robustness across heterogeneous cohorts. We will clarify the motivation in the revised paper.
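
The confusion loss mentioned above is not spelled out in this discussion. One common formulation, assumed here purely for illustration, is the cross-entropy between a uniform target and the distribution predicted by a confounder classifier (for example, over age or sex groups), so the loss is smallest when the learned features carry no usable information about the confounder. The function name and the NumPy implementation are hypothetical and are not taken from the MOSCARD code.

```python
import numpy as np


def confusion_loss(confounder_logits: np.ndarray) -> float:
    """Cross-entropy between a uniform target and the predicted
    confounder distribution, averaged over the batch.

    confounder_logits: array of shape (batch, n_groups).
    """
    # Numerically stable log-softmax over the confounder groups.
    shifted = confounder_logits - confounder_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # A uniform target puts weight 1/n_groups on every group, so the
    # cross-entropy reduces to the negative mean of the log-probabilities.
    return float(-log_probs.mean(axis=1).mean())


# A classifier that cannot tell the groups apart reaches the minimum, log(n_groups).
uninformative = confusion_loss(np.zeros((2, 4)))
# A classifier that confidently identifies the group is penalised more heavily.
confident = confusion_loss(np.array([[10.0, 0.0, 0.0, 0.0]]))
```

In an adversarial setup of this kind, the encoder is typically updated to lower this loss while a separate confounder classifier is trained with ordinary cross-entropy; whether MOSCARD alternates the two updates in this way is not stated here.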

Comments on highly complex architecture which hinders interpretability: A complex architecture is essential for addressing the challenges of multimodal ECG and X-ray fusion while also ensuring generalizability. While we acknowledge that such complexity may hinder interpretability, we propose utilizing custom image-based saliency maps to gain insights into the model’s decision-making process throughout its end-to-end reasoning.

Comments on temporal misalignment, noise and missing data: Our model consists of three output branches: one for ECG, one for CXR, and one for the concatenated features of both modalities after co-attention. This architecture ensures the model remains functional even if one modality is misaligned or of low quality, allowing it to rely on the available modality for prediction. Notably, the multimodal framework, guided by ECG, not only boosts overall performance but also enhances ECG performance following co-attention. Additionally, our training process is divided into two steps. If one modality is unavailable, the model can still be evaluated using only the first step, which includes confounder debiasing.
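
As a minimal illustration of the fallback behaviour described above, the sketch below picks one of the three output branches according to which modalities are usable. The function name, the branch keys, and the rule of preferring the fused branch when both inputs are available are assumptions made for illustration; they are not taken from the MOSCARD code, and the sketch ignores the option of falling back to the first-stage single-modality encoders.

```python
from typing import Mapping


def select_mace_score(branch_scores: Mapping[str, float],
                      ecg_usable: bool,
                      cxr_usable: bool) -> float:
    """Return the risk score from the branch that matches the usable inputs.

    branch_scores holds hypothetical outputs keyed "ecg", "cxr" and "fused".
    """
    if ecg_usable and cxr_usable:
        return branch_scores["fused"]  # assumed priority: fused branch first
    if ecg_usable:
        return branch_scores["ecg"]
    if cxr_usable:
        return branch_scores["cxr"]
    raise ValueError("at least one modality must be usable")


# Made-up scores, for illustration only.
example_scores = {"ecg": 0.61, "cxr": 0.58, "fused": 0.70}
both = select_mace_score(example_scores, ecg_usable=True, cxr_usable=True)
ecg_only = select_mace_score(example_scores, ecg_usable=True, cxr_usable=False)
```

Making the priority explicit in one place also gives a concrete answer to the question of which of the three outputs a deployment should report.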

Issues with co-attention using ECG: The co-attention mechanism in our model is not designed to capture pixel-level correspondence but rather to emphasize the relevance of cross-modal features. It identifies latent features that are correlated and tend to co-occur in specific disease conditions. To explore spatial co-attention, we used the MedCLIP and ALBEF approaches as baselines, projecting ECG and CXR into a shared embedding space. However, we found that the resulting feature distributions (even with distillation) exhibited limited variance, which proved insufficient for downstream tasks.
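
The directionality discussed here, with ECG acting as the query over CXR features, can be sketched as single-head scaled dot-product cross-attention. This is a generic illustration under stated assumptions rather than the MOSCARD implementation: the token counts, the feature dimensions, the random projection matrices, and the function names are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ecg_guided_attention(ecg_tokens, cxr_tokens, w_q, w_k, w_v):
    """Cross-attention in which ECG tokens query CXR tokens.

    Returns the CXR information gathered for each ECG token and the
    attention weights (one distribution over CXR tokens per ECG token).
    """
    queries = ecg_tokens @ w_q   # (n_ecg, d_head)
    keys = cxr_tokens @ w_k      # (n_cxr, d_head)
    values = cxr_tokens @ w_v    # (n_cxr, d_head)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
d_model, d_head = 16, 8
ecg_tokens = rng.normal(size=(12, d_model))  # assumed: one token per ECG lead
cxr_tokens = rng.normal(size=(49, d_model))  # assumed: 7x7 grid of CXR patches
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

attended, weights = ecg_guided_attention(ecg_tokens, cxr_tokens, w_q, w_k, w_v)
```

Because the weights are computed in a latent feature space, each row expresses how relevant a CXR feature is to an ECG feature, not a spatial match between the two signals, which is the distinction drawn in the response above.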

Global generalizability challenge: We selected the MIMIC and ED cohorts as external validation datasets due to their significant population differences. The ED cohort primarily includes healthier, lower-risk patients, while the MIMIC cohort comprises ICU patients with more complex conditions. Although detailed demographic comparisons were omitted due to page limitations, these datasets also differ in age and gender distributions. Our internal dataset includes a higher proportion of older patients (21%) and a greater number of males, whereas the ED cohort has fewer older patients (8%), and the MIMIC cohort exhibits a more balanced distribution across both age and gender. By evaluating our model across these three distinct datasets—with varying MACE risk levels, comorbidity and demographic profiles—we aim to demonstrate its robustness and generalizability. Due to the limited availability of open-source global datasets with MACE labels, we were unable to evaluate the model’s generalizability beyond the U.S. healthcare setting. We will acknowledge this as a limitation.

Writing clarity – improve organization: Due to space constraints, detailed ablation studies will be provided in our anonymous GitHub repository. We also plan to enhance clarity in the methods and add a conclusion section in the revised manuscript.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

There is a difference of opinion on this paper, but all reviewers provided thoughtful insights, attention to which would markedly improve the paper.
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper proposes MOSCARD, a multimodal framework for cardiovascular risk prediction that integrates causal reasoning, confounder debiasing, and co-attention between ECG and chest X-ray. Reviewers 2 and 3 recommend acceptance, and Reviewer 1 leans positive, citing the method’s novelty and strong evaluation across diverse internal and external cohorts. The authors address key challenges such as modality alignment, missing data, and demographic bias, and clarify the model’s flexibility and robustness in the rebuttal. While there is room for improvement in writing clarity and external generalisability, the technical and clinical contributions are well-founded.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

MOSCARD - Multimodal Opportunistic Screening for Cardiovascular Adverse events with Causal Reasoning and De-confounding

Author(s):