Abstract

Recent advancements in non-invasive detection of Cardiac Hemodynamic Instability (CHDI) primarily focus on applying machine learning techniques to a single data modality, e.g. cardiac magnetic resonance imaging (MRI). Despite their potential, these approaches often fall short, especially when the size of labeled patient data is limited, a common challenge in the medical domain. Furthermore, only a few studies have explored multimodal methods to study CHDI, and these mostly rely on costly modalities such as cardiac MRI and echocardiogram. In response to these limitations, we propose a novel multimodal variational autoencoder (CardioVAE_{X,G}) to integrate low-cost chest X-ray (CXR) and electrocardiogram (ECG) modalities with pre-training on a large unlabeled dataset. Specifically, CardioVAE_{X,G} introduces a novel tri-stream pre-training strategy to learn both shared and modality-specific features, thus enabling fine-tuning with both unimodal and multimodal datasets. We pre-train CardioVAE_{X,G} on a large, unlabeled dataset of 50,982 subjects from a subset of the MIMIC database and then fine-tune the pre-trained model on a labeled dataset of 795 subjects from the ASPIRE registry. Comprehensive evaluations against existing methods show that CardioVAE_{X,G} offers promising performance (AUROC = 0.79 and Accuracy = 0.77), representing a significant step forward in non-invasive prediction of CHDI. Our model also excels in producing fine interpretations of predictions directly associated with clinical features, thereby supporting clinical decision-making.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1503_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/Shef-AIRE/AI4Cardiothoracic-CardioVAE

Link to the Dataset(s)

https://physionet.org/content/mimic-iv-ecg/1.0/
https://physionet.org/content/mimic-cxr/2.0.0/

BibTex

@InProceedings{Suv_Multimodal_MICCAI2024,
        author = { Suvon, Mohammod N. I. and Tripathi, Prasun C. and Fan, Wenrui and Zhou, Shuo and Liu, Xianyuan and Alabed, Samer and Osmani, Venet and Swift, Andrew J. and Chen, Chen and Lu, Haiping},
        title = { { Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15001},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a multi-modality framework for learning shared and modality-specific representations from chest X-ray (CXR) and electrocardiogram (ECG). The learned encoder and representations, both unimodal and multi-modal, from the unlabeled data (pre-training) can then be used to fine-tune the unimodal and multi-modal classifiers of Pulmonary Artery Wedge Pressure (PAWP), respectively. Experimental results demonstrate that the proposed methods improve the performance compared with baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) For non-invasive detection of cardiac hemodynamic instability, the proposed framework investigates the use of low-cost cardiac measurements such as CXR and ECG. These measurements are more accessible and affordable, potentially benefiting real-world clinical practices. (2) A wide range of baseline methods is considered in the experiments, making the improved performance of the proposed methods convincing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The description of the latent variable z is not clear. The authors are encouraged to add subscripts to z to clearly indicate whether it is the modality-specific z or the combined z. Additionally, regarding Equation (3), it is unclear whether the decoder uses the combined z. If so, should the expectations of the two log-likelihoods be computed over the distribution of the combined z? (2) The evaluation of the benefits of the Tri-stream model is limited. The authors are encouraged to confirm whether the baselines [18] and [30] also used pre-training strategies. If not, it would be difficult for readers to determine whether the improvement of the proposed methods is due to pre-training, the Tri-stream model, or both.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) It appears that the motivation for the pre-training strategy in this paper is similar to that of semi-supervised learning, which involves using unlabeled data to assist with tasks on labeled data. The authors are encouraged to investigate and further develop the proposed methods in a semi-supervised learning manner. (2) An ablation study of the proposed methods can further strengthen this work in the future by separately discovering the effects of pre-training and the Tri-stream model.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall contribution of this paper is borderline; clearer and more detailed descriptions of the methods and experimental settings can strengthen the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This work proposes a cardiac multimodal variational autoencoder that predicts Pulmonary Artery Wedge Pressure (PAWP) from paired data of two low-cost modalities, chest X-ray (CXR) and ECG. The main novelty is a tri-stream pre-training strategy that learns both shared and modality-specific features, thereby enabling fine-tuning with both unimodal and multimodal datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. A multimodal model is usually favorable because it can integrate the different features from each modality towards better overall performance.
    2. The unsupervised pre-training is good because of the limited availability of labeled data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The novelty and the improvement in results are limited compared to the cited references in the paper.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Major:

    1. I understand the main novelty is the tri-stream pre-training, where the model can learn from each modality's data and then from paired data, and that this is the reason behind the claimed performance improvement in Table 2 (multimodal rows). However, what is the claimed advantage of the proposed model when dealing with unimodal data? Why does it give better performance when using CXR only and ECG only, when in that setting the model should be a regular one similar to those in the literature, without the claimed advantage of the tri-stream pre-training?

    Minor:

    1. Reference citations are not ordered properly. For example, in the first paragraph the citations are [3, 26, 27], [15, 16, 8] and [22, 23], while ref #1 appears for the first time on Page 5. Also, some references are missing their complete information, e.g. ref 5.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is neat and worthy of exploring (despite the limited novelty). The methods are well laid down and results are well presented and discussed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a low-cost, non-invasive Pulmonary Artery Wedge Pressure (PAWP) prediction method that uses chest X-ray (CXR) and electrocardiogram (ECG) information instead of the more expensive cardiac magnetic resonance imaging (MRI). The method uses a multimodal VAE to extract and integrate latent representations from both modalities. The model is pre-trained through unsupervised reconstruction tasks and then fine-tuned for cardiac hemodynamic instability (CHDI) detection on unimodal and multimodal datasets, which improves performance. The detection results are also visualized to improve model interpretability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The study addresses a clinically important problem. Building on previous work, it finds that CXR and ECG have the potential to predict PAWP. The approach combines these two low-cost modalities to achieve PAWP prediction performance that is competitive with high-cost MRI.

    A tri-stream unsupervised pre-training method enables the model to support multiple modalities and to respond flexibly to real clinical scenarios.

    By comparing predictions based on a single modality with those based on both modalities, the approach is shown to improve AUROC and accuracy. These comparisons fully demonstrate the effectiveness of the multimodal integration method and unsupervised pre-training.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Given the distribution of Low PAWP to High PAWP in the dataset (560:235), a model that predicts every sample as the Low PAWP class already achieves an accuracy of 0.704. Therefore, although the multimodal integration method (0.772) is a significant improvement over the single-modality accuracy of 0.727, there is still a large gap before the binary classification task is ready for clinical application.
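
    The arithmetic behind this baseline can be checked with a short sketch. The class counts and accuracies are the ones quoted in the comment above; the variable names are illustrative only.

```python
# Class counts quoted in the review (Low PAWP : High PAWP = 560 : 235).
low_pawp, high_pawp = 560, 235
total = low_pawp + high_pawp

# Accuracy of a trivial classifier that always predicts the majority class.
majority_accuracy = max(low_pawp, high_pawp) / total

# Accuracies quoted in the review for the unimodal and multimodal models.
reported = {"unimodal": 0.727, "multimodal": 0.772}

# Margin of each reported accuracy over the trivial baseline.
margins = {name: acc - majority_accuracy for name, acc in reported.items()}

print(f"majority-class accuracy: {majority_accuracy:.3f}")
for name, margin in margins.items():
    print(f"{name}: +{margin:.3f} over the baseline")
```

    Because accuracy is inflated by the class imbalance, the margin over this trivial baseline is a more informative reading of the reported numbers than the raw accuracy.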

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors did not mention or include any link to a repository in the manuscript, but relevant code will be publicly available, as stated in the reproducibility check. Uploading the trained models may improve the reproducibility and transparency of the manuscript.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In the multi-modal integration part, two Gaussian distributions are directly used for simple integration. Considering the heterogeneity between different modal data, have you tried a better integration method?
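
    The comment above does not spell out the integration rule. One common choice in multimodal VAEs is a product of Gaussian experts, and the sketch below assumes that rule purely for illustration; it is not confirmed here as the authors' method, and the function name and input values are hypothetical.

```python
def product_of_gaussians(mu_a, var_a, mu_b, var_b):
    """Fuse two univariate Gaussians by multiplying their densities.

    The product is Gaussian (up to normalisation) with precision equal to
    the sum of the input precisions and a precision-weighted mean.
    """
    precision_a, precision_b = 1.0 / var_a, 1.0 / var_b
    var = 1.0 / (precision_a + precision_b)
    mu = var * (mu_a * precision_a + mu_b * precision_b)
    return mu, var


# Hypothetical per-modality posteriors for a single latent dimension.
mu_cxr, var_cxr = 0.0, 1.0  # CXR encoder output (made-up values)
mu_ecg, var_ecg = 2.0, 1.0  # ECG encoder output (made-up values)

mu_shared, var_shared = product_of_gaussians(mu_cxr, var_cxr, mu_ecg, var_ecg)
print(mu_shared, var_shared)
```

    Under this rule the more confident modality (smaller variance) dominates the fused mean, which is one reason a reviewer might ask whether a fusion scheme that models cross-modal heterogeneity more explicitly would help.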

    When fine-tuning downstream tasks of the model, it is recommended that parameters of the backbone network participate in fine-tuning. Generally, fine-tuning all parameters will result in higher performance.

    While the binary cross-entropy loss is one of the most widely used approaches, it is recommended to use focal loss to push the model to handle complex cases.
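
    For reference, a minimal per-sample sketch of binary focal loss is given below. It is a generic illustration of the loss named in the comment above, not code from the paper; the function name and the default values of gamma and alpha are assumptions chosen for the example.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one sample; p is the predicted probability of class 1.

    gamma down-weights well-classified samples; alpha balances the classes.
    With gamma=0 and alpha=None it reduces to binary cross-entropy.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.30, 1)  # true class receives low probability
bce = binary_focal_loss(0.30, 1, gamma=0.0, alpha=None)  # plain cross-entropy
print(easy, hard, bce)
```

    The easy sample contributes far less than the hard one, which is the behaviour the recommendation relies on.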

    The authors only use two modalities, CXR and ECG, and the description of multi-modality in the article is inaccurate.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivation of this study is clear. It proposes a method that combines low-cost CXR and ECG modality information to predict PAWP. Compared with high-cost MRI modalities, it has competitive performance and demonstrates potential for application in low-income countries, which aligns with this year’s MICCAI theme. At the same time, a tri-stream pipeline is used to design simple and effective self-supervised pre-training tasks, enabling the same model to support multiple modalities, which is more in line with clinical application scenarios. Although the results show numerical superiority, their significance requires further analysis, and there are still some gaps before real clinical application. The paper has potential, but addressing these issues is necessary for a stronger recommendation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

First, we sincerely appreciate the insights from all reviewers, who have acknowledged several key strengths of our multimodal learning for low-cost cardiac hemodynamics instability detection, including technical soundness (R1), data efficiency (R1), clinical usefulness (R3, R4) and comprehensive comparison with a wide range of baselines (R3, R4). More importantly, both R3 and R4 agree on the novelty and benefits of using non-invasive, low-cost modalities (chest X-rays and ECGs) instead of high-cost MRI or invasive catheterization for pulmonary artery wedge pressure (PAWP). Such an approach has not been explored before, and it has greater practical value for deployment in low-income countries and areas with limited medical resources. Our research is timely and also aligns well with MICCAI’s new theme on health equity.

We are also fortunate to receive constructive comments provided by all three reviewers to further improve the clarity and quality of the paper. We have grouped them with our response below.

#Confusion and typos

R1 asks about the advantage of the proposed model with unimodal data. We would like to clarify that a key contribution of our paper is its flexibility to support both unimodal and multimodal fine-tuning. Table 2 (Page 7) demonstrates that our method achieves AUROC scores of 0.681 and 0.744 for the unimodal CXR and ECG downstream tasks, respectively, and outperforms other unimodal non-pretrained and pre-trained baselines. In Table 2, “unimodal” and “multimodal” refer to the downstream dataset, not the pre-trained model. We will revise Table 2 in our camera-ready paper for better understanding.

R3 asks for the description of the latent variable z in the decoder. We apologize for the typo. It should be “latent space Z” rather than “latent variable z” in the ‘Decoder Design’ section on page 4. To make it clearer, we will add subscripts to the latent variable z: z_CXR for the CXR modality-specific variable, z_ECG for the ECG modality-specific variable, and z_(CXR, ECG) for the variable shared between CXR and ECG. These changes will be made in both the text and the equations in the revised camera-ready paper. For the unimodal streams, the decoder uses their respective modality-specific latent variables, but for the multimodal stream, the decoder uses both modality-specific and shared latent variables for reconstruction.

#Recommendations

We will fix the order of the references and complete their information (R1) in our final version. We also acknowledge the recommendations: 1) investigation of semi-supervised learning for better performance (R3); 2) exploration of more advanced multimodal integration methods (R4); and 3) exploration of focal loss for handling complex cases (R4). We will consider these recommendations in our future work.

#Code Availability and Reproducibility

We will publish our code with a link to the source code and pre-trained models in our final camera-ready version, following the guidelines of the reproducibility checklist.




Meta-Review

Meta-review not available, early accepted paper.


