Abstract

Ejection fraction (EF), estimated from echocardiography, is a key indicator for assessing cardiac function and determining optimal treatment for patients prone to cardiac dysfunction, such as heart failure. Recently, machine learning has shown promising predictive performance as a diagnostic tool for estimating EF from echocardiograms. However, most state-of-the-art models have overlooked the diversity of phenotypes in echocardiography that arises from patient demographics (e.g., sex and age). In this study, we propose a novel integrative bipartite graph neural network (IBi-GNN) that integrates patients' demographic variables with echocardiograms to improve EF predictive performance and model interpretability in precision medicine. In our experiments, IBi-GNN significantly reduced estimation errors compared to the benchmark models, and the improvement was confirmed by statistical testing. We also show that IBi-GNN is interpretable, identifying interactions between the modalities. This interpretation provides a comprehensive understanding of the relationships between demographic factors and cardiac structures. The open-source code is publicly available at https://github.com/datax-lab/IBi-GNN.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2250_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/datax-lab/IBi-GNN

Link to the Dataset(s)

https://echonet.github.io/pediatric/index.html#dataset

BibTex

@InProceedings{LeeSeu_Explainable_MICCAI2025,
        author = { Lee, Seungeun and Kim, Jaeyoung and Kang, Kyungtae and Kang, Mingon},
        title = { { Explainable Integrative Bipartite Graph Convolutional Neural Network for Predicting Ejection Fraction in Echocardiography } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {332 -- 341}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The integrative bipartite graph neural network (IBi-GNN) integrates demographic variables of patients with echocardiograms to improve the EF predictive performance and model interpretability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel deep learning model, IBi-GNN, was developed to identify demography-specific visual features using bipartite graph layers that demonstrated statistically significant improvements in EF estimation from echocardiograms.

    The identified patterns appear to offer potential insights for personalized treatment in precision medicine.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    In comparison to the traditional model [16], the improvement seems marginal or not statistically significant.

    The method has only been applied to a single dataset, making it difficult to assess its generalizability.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In comparison to the traditional model [16], the improvement seems marginal or not statistically significant.

    The method has only been applied to a single dataset, making it difficult to assess its generalizability.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed the concerns satisfactorily, I am happy to accept the paper.



Review #2

  • Please describe the contribution of the paper

    A novel, interpretable method for ejection fraction prediction from time-resolved 2D echocardiography that integrates three demographic variables (sex, age, BMI) by a bipartite GNN is presented. Validation is performed on a large publicly available dataset (>4400 videos from the EchoNet-Pediatric EF dataset [16]). The method is comprehensively compared with three other ‘integrative’ approaches and two imaging-only models, showing statistically significant improvements across all three evaluation metrics (MSE, MAE, and R²). The inherent model interpretability is qualitatively shown and convincingly discussed.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novelty: a novel approach for integrating video data (echocardiograms) and scalar factors (three demographic variables) for a regression task (EF prediction) is proposed
    • Model interpretability: The model is inherently interpretable. This is qualitatively shown and convincingly discussed.
    • Strong validation: evaluation on large publicly available dataset, comprehensive comparison to three other ‘integrative’ approaches and two imaging-only models, statistically significant improvements shown across all three evaluation metrics.
    • “The open-source codes are publicly available.” (abstract)
    • The main contributions of the paper are clearly stated and largely convincing (page 2, claim (3) perhaps goes a little too far).
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • claim of contribution (3) that “identified demography-specific patterns can be leveraged, as new clinical knowledge, to make personalized treatments in precision medicine“ (page 2) goes a little too far and is not substantiated by the experiments.
    • There are several minor issues that could be improved, see the additional comments.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    • Page 5: “Specifically, the videos were sampled to have 36 frames at a frequency of 4.” I find this sentence unclear. Does it mean: from the original video, one frame was selected every 4 frames, until a total of 36 frames were collected?
    • Page 5: “For the implementation of Contrastive, we replaced the video backbone model with a 2D image model […]” Why a “2D image model”? The echo data are also a kind of “video” data (2D+t).
    • Page 6, Table 1: What are the units of the values shown? Furthermore, it would be helpful to mark those methods that are statistically significantly worse than the best method.
    • Page 6: “The output layers for ejection fraction estimation are with 2-fully connected layers with hidden dimension of 512 and 1 respectively.” I find this sentence confusing, because “output layer” usually refers to the last layer of a network, but the sentence states that the output layer has a “hidden dimension”. I can guess what is meant, but please consider rephrasing for clarity.
    • Page 6: Regarding the significance tests, I suggest specifying the type of test as precisely as possible – were paired or unpaired tests used?
    • Page 7, Fig. 2 (a): Are the intensity scalings for the different features shown identical or different? Please mention.
    • Page 7, Fig. 2 (b): The left column of the table, “Attribution difference in subgroup”, seems to contradict the text that mentions “attribution differences between each subgroup” (page 8). Furthermore: “difference” of the group relative to what other value, the mean of all groups or the mean of all other groups? Please clarify.
    • Page 7: Line 5 “whereas u2 are mainly associated to the top-middle and centers” and line 13 “The visual feature u2, which are associated with BMI, are highlighted in the center and top-middle corner” are essentially duplicates.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel and inherently interpretable approach for integrating video and scalar data for EF prediction, supported by strong validation and clear communication of its contributions. Despite several minor issues, the overall quality and significance of the work justify a weak accept.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    I concur with the concerns raised by the other two reviewers. As Reviewer 1 noted, in “comparison to the traditional model [16], the improvement seems marginal”. While the improvements in the manuscript are reported to be statistically significant (Wilcoxon test, p < 0.05), statistical significance alone does not imply clinical or practical relevance. For instance, the reported mean absolute error (MAE) for ejection fraction (EF) estimation is 3.95 ± 0.18, compared to 4.06 ± 0.14 for the best video-only method. Such a small difference is unlikely to have any meaningful impact on patient care, particularly given that the typical EF in the dataset is 61 ± 10% [16].

    The manuscript also states that the EchoNet-Pediatric EF dataset [16] “comprises parasternal short-axis (PSAX) view video clips,” which is technically accurate but potentially misleading, as the dataset also includes apical 4-chamber (A4C) views. More critically, the authors report using 4,492 samples after filtering out missing values, whereas [16] only reports 4,424 PSAX samples (see Table 1, p. 485 and Table 3, p. 487). This discrepancy should be clarified.

    In light of the other reviewers’ comments, I revisited [16], which describes a model named “EchoNet-Peds” with reported performance metrics for PSAX-only input: “If presented with only a PSAX video clip, the MAE is 3.80% (3.69%-3.91%), RMSE is 5.14% (4.95%-5.33%), and R2 is 0.74 (0.71-0.77)” [16, p. 485]. It remains unclear whether this model corresponds to the “R(2+1)D [16]” referenced in the current manuscript, as the term “R(2+1)D” does not appear in [16]. This raises the possibility that [16] already describes a superior video-only method, which may have been retrained in the present study. If so, the observed degradation in performance (from an MAE of 3.80% to 4.06%), which is greater than the reported improvement from the proposed method, requires a clear explanation.

    While I acknowledge that I should have identified these issues earlier, this reassessment leads me to recommend rejection of the manuscript.



Review #3

  • Please describe the contribution of the paper

    This paper proposes an explainable, integrative bipartite graph neural network (IBi-GNN) for estimating ejection fraction (EF) in echocardiography. IBi-GNN uses a bipartite graph approach to fuse demographic data with video features extracted by a deep neural network. In the experiments, IBi-GNN showed a statistically significant improvement in EF estimation from echocardiograms. Additionally, the identified demography-specific patterns can be leveraged as new clinical knowledge, with the potential to support personalized treatment in precision medicine.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper identifies a gap: demographic variables have been overlooked in automated methods for EF estimation from echocardiography. To date, various deep learning methods have been devised for this problem relying on the video itself, but none have taken advantage of demographic information to enhance performance. Essentially, by conditioning the neural network on demographic data, the network can take advantage of differences that may occur due to demographics in a multimodal setting. Therefore, the paper is addressing a novel problem – how to integrate demographics and the video echocardiographic data together in a common machine learning pipeline.

    The paper achieves this multimodal fusion through a bipartite graph formulation. Separate encoders are developed to embed the echocardiogram and the demographic variables into latent representations. Then, bipartite GNN layers are used to fuse features extracted from the two data types. Specifically, the edges in the bipartite GNN layers, as illustrated in Fig. 2, reflect the interactions between demographic data and echocardiographic information. The connections are one-way, with the demographic features connected to the video features – essentially modulating the video features based on the demographic data. The specific method of data fusion appears novel to this reviewer, although related methods that use bipartite graphs for multimodal data fusion appear in the literature, such as Gao et al., “Predicting the Survival of Cancer Patients With Multimodal Graph Neural Network,” TCBB 2022. Experimental results show the method outperforms related techniques based on the echocardiogram alone as well as other multimodal approaches that use the echocardiogram and the demographic data. Using GNN Explainer, the method also can be interpreted, providing visualization of how the demographic variables relate to the image data.
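    To make the described fusion concrete, the following is a minimal, hypothetical PyTorch sketch of a one-way bipartite message-passing layer in which demographic nodes modulate video feature nodes; the layer names, dimensionalities, dense learnable adjacency, and normalization are illustrative assumptions on the editor's part, not the authors' released implementation.

        import torch
        import torch.nn as nn

        class BipartiteFusionLayer(nn.Module):
            """One-way bipartite message passing: demographic nodes send messages
            to video feature nodes; video nodes never message back. The bipartite
            adjacency is a learnable dense weight matrix (an assumption here)."""

            def __init__(self, dim: int, n_demo: int, n_video: int):
                super().__init__()
                self.adj = nn.Parameter(0.01 * torch.randn(n_demo, n_video))
                self.msg = nn.Linear(dim, dim)        # transform demographic messages
                self.upd_demo = nn.Linear(dim, dim)   # self-transformation of demographic nodes
                self.upd_video = nn.Linear(dim, dim)  # self-transformation of video nodes
                self.act = nn.ReLU()

            def forward(self, h_demo, h_video):
                # h_demo: (B, n_demo, dim), h_video: (B, n_video, dim)
                weights = torch.softmax(self.adj, dim=0)                     # normalize over demographic nodes
                messages = torch.einsum("dv,bdk->bvk", weights, self.msg(h_demo))
                h_video_new = self.act(self.upd_video(h_video) + messages)   # video nodes modulated by demographics
                h_demo_new = self.act(self.upd_demo(h_demo))                 # demographic nodes updated only by themselves
                return h_demo_new, h_video_new

        # Toy usage: 3 demographic nodes (sex, age, BMI) and 9 video feature nodes.
        layer = BipartiteFusionLayer(dim=128, n_demo=3, n_video=9)
        h_d, h_u = layer(torch.randn(2, 3, 128), torch.randn(2, 9, 128))
        print(h_d.shape, h_u.shape)  # torch.Size([2, 3, 128]) torch.Size([2, 9, 128])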

    The paper suggests source code will be released, which would help the community experiment with the method and extend it in new ways.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While the approach of using a bipartite graph is interesting and demonstrated to work, there are issues related to its design and validation. First, the graph has one-way directed edges from the demographic features to the echocardiography features. It would be helpful if the paper justified this choice in light of alternatives, for example, two sets of one-way directed edges that also update the demographic features. Possibly updating the demographic features has little value, as they stem from a small set of input values compared to the much larger, high-dimensional video input. It would be helpful if the paper could justify the choices made in the design. Relatedly, since there are only three demographic variables (sex, age, BMI), it seems strange to this reviewer to have an encoder mapping them to 128 dimensions. Does the demographics encoder/decoder simply memorize the three inputs? How can one ensure it does not overfit?

    While Figure 1 is helpful for explaining the method, something unclear to this reviewer is H_V^{(l)}. Since the bipartite formulation has one-way edges from the demographic features to the video features, it would appear that the demographic features are not updated in the pipeline, that is, H_V^{(0)} = H_V^{(1)} = … = H_V^{(l)}. This appears to be reflected in Equation 1. However, the notation in Figure 1 suggests that the demographic features are changing in each layer of the bipartite GNN layers. Perhaps this could be clarified. If the features are not changing, it would be better to label them as H_V^{(0)} throughout the figure.

    Another weakness is that the paper doesn’t have any ablation studies. As IBi-GNN is the key contribution of the paper, it would be helpful to demonstrate the value of its components through an ablation study showing:

    1. The unimodal performance of each modality – that is, predicting the EF using the exact same architecture as IBi-GNN but only using the echocardiography data in one experiment, and only the demographic data in a second experiment. The comparisons to [13] and [16] are appreciated, but they have a different architecture.
    2. A direct comparison with cross-attention, again using the exact same encoded features, with the demographic features as query and the video features as key and value (see the sketch after this list). The comparison to [25] is appreciated, but it has a different architecture. Attention layers can be visualized to provide interpretable insights as well.
    3. The impact of the auxiliary loss on the results. This can be achieved by removing the auxiliary loss and measuring the performance.
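    The cross-attention baseline suggested in point 2 could, for instance, look like the following minimal PyTorch sketch; the token counts and embedding dimension are purely illustrative assumptions, not values from the paper.

        import torch
        import torch.nn as nn

        # Hypothetical cross-attention fusion: demographic embeddings attend over
        # video feature tokens (query = demographics, key/value = video features).
        attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)

        h_demo = torch.randn(2, 3, 128)    # (batch, 3 demographic tokens, dim)
        h_video = torch.randn(2, 49, 128)  # (batch, 49 video feature tokens, dim)

        fused, attn_weights = attn(query=h_demo, key=h_video, value=h_video)
        print(fused.shape)         # torch.Size([2, 3, 128])
        print(attn_weights.shape)  # torch.Size([2, 3, 49]); these maps can be visualized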

    In terms of comparisons, it could be helpful to compare to EFNet, which outperforms EchoCoTr on video-only EF estimation from echocardiography.

    • Ali et al., “EFNet: estimation of left ventricular ejection fraction from cardiac ultrasound videos using deep learning,” PeerJ, Jan 2025. The paper provides code for the method.

    The explainability / interpretability of the results is a nice feature of the proposed approach. However, this could be more clearly explained. The text mentions epicardial adipose tissue linked with BMI, and this tissue appears in the upper right corner. However, in Figure 2 the strong orange arrow is pointing to u_2, not u_3. The latter appears to highlight the upper right part of the image. It would be helpful if the paper had a cardiologist review the results and confirm the findings.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Smaller issues:

    • Please have a consistent approach to capitalization in the title, i.e. please change “Explainable Integrative Bipartite graph convolutional neural network for predicting Ejection fraction in Echocardiography” to “Explainable Integrative Bipartite Graph Convolutional Neural Network for Predicting Ejection Fraction in Echocardiography”.
    • Page 3, please change “$T$ numbers of frames” to “$T$ frames”.
    • Reference 15, please correct inconsistencies with capitalization.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper has identified an interesting problem and addressed it with suitable methods. However, it is hampered by a lack of validation of the effectiveness of the bipartite GNN layers, and the model interpretation is somewhat unclear.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    This reviewer initially rated this paper as Weak Accept, recognizing its novel use of a bipartite GNN to fuse demographic data with echocardiographic video for ejection fraction prediction. The integration strategy is clinically motivated and methodologically interesting, addressing a clear gap in multimodal modeling.

    My main concerns were the lack of ablation studies, limited justification for certain architectural choices (e.g., one-way edges, high-dimensional demographic embeddings), and some ambiguity in the model’s interpretability. The rebuttal responded thoughtfully, clarifying the design choices and confirming clinical input into the interpretation analysis.

    Other reviewers echoed support for the paper’s novelty and interpretability. R1 raised concerns about generalizability and modest performance gains, but the authors responded by emphasizing the unique contribution of demographic insight and dataset limitations.

    Overall, the paper is well-motivated, relevant, and makes a solid methodological contribution. This reviewer recommends acceptance.




Author Feedback

We thank all reviewers for assessing our manuscript positively with insightful and constructive comments. Reviewers highlighted the novelty of the methodology (R1, R2, R3), the model interpretability (R1, R2), and the clinical impact (R1, R3). We address all the concerns in order below.

R1Q1 [Marginal improvement] Our model outperforms other integrative methods, reducing MSE by 6.95%–28.16%, with statistically significant improvements (Wilcoxon test, p < 0.05). The modest gain over the video-only model may be mainly caused by its backbone performance. Importantly, our approach provides novel insights into the relationship between demographic factors and echocardiographic structure, which unimodal models cannot capture. A small sample size may limit the gains of integrative methods, so larger datasets may further improve performance.
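For reference, a paired Wilcoxon signed-rank test on per-fold errors can be run as in the sketch below; whether the paper's test was paired and how the error samples were grouped are assumptions here, and the values are synthetic placeholders, not results from the paper.

    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(0)
    # Synthetic per-fold MAEs for two models (placeholders only).
    err_proposed = rng.normal(4.0, 0.2, size=10)
    err_baseline = err_proposed + rng.normal(0.1, 0.05, size=10)

    # One-sided paired test: is the proposed model's error smaller?
    stat, p = wilcoxon(err_proposed, err_baseline, alternative="less")
    print(f"W = {stat:.1f}, p = {p:.4f}")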

R1Q2 [Generalizability] We agree that validation using additional datasets would strengthen the study. To our knowledge, the EchoNet-Pediatric dataset is the only publicly available echocardiogram dataset with demographic features (age, sex, BMI). If larger datasets become available, we will further improve our model’s robustness.

R2Q1 [Precision medicine] This study is the first to explore the relationships between echocardiographic features and demography, which is an indispensable step toward precision medicine. However, we acknowledge that further steps are needed to eventually achieve it. We will clarify this in the final version.

[R2 Minor comments] We apologize for the confusion. For each video, we extracted 36 frames at intervals of 4 frames. We will clarify the points in the final version.
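The sampling scheme described above can be sketched as follows; the handling of clips shorter than 36 × 4 frames and the function name are assumptions on our part, not the authors' released code.

    import numpy as np

    def sample_frames(video: np.ndarray, n_frames: int = 36, stride: int = 4) -> np.ndarray:
        """video: (T, H, W) array; returns (n_frames, H, W) by taking every `stride`-th frame."""
        idx = np.arange(n_frames) * stride        # 0, 4, 8, ..., 140
        idx = np.clip(idx, 0, len(video) - 1)     # repeat the last frame if the clip is short
        return video[idx]

    clip = np.zeros((200, 112, 112))              # dummy 200-frame clip
    print(sample_frames(clip).shape)              # (36, 112, 112)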

R3Q1 [Justification for model structure]

  • Bipartite graph: Our bipartite graph model is based on the hypothesis that demographic variables are key drivers of differences in echocardiograms, which reflects clinical reasoning: clinicians interpret echocardiographic features in the context of patient demographics such as age, sex, and BMI. The one-way edges allow us to trace this influence.

  • Demographic encoder: A high-dimensional encoder is necessary for graph-based multimodal fusion. Similar demographic embeddings have been used in previous studies [PMID: 37308585 (Nat. Biomed. Eng. 2023), 35540957].

  • Clarity of H_V: Demographic node features H_V are updated at each layer via self-transformation (Eq. 1), even without input from the video features. We will clarify this in the final version.

R3Q2 [Ablation study]

  • Cross-attention model: We already compared our model with an attention-based model (e.g., a transformer) in the cross-validation. While cross-attention captures similarity between modalities, our bipartite graph represents the directional influence of demographics on echocardiogram features.

  • EFNet: Although it may potentially improve performance, it is a backbone model and not the focus of our integrative approach. Moreover, it was published on January 21, 2025, and so was not available for inclusion.

  • Auxiliary loss: We did not conduct an ablation study for the auxiliary loss, as our main contribution is the integration. We appreciate the suggestion and will consider it in future work.

  • Unimodal IBi-GNN: Our bipartite GNN requires both modalities, as demographic nodes modulate echocardiogram features. Using only one modality would require major architectural changes.

R3Q3 [Explainability] Regions that a graph node is associated with are not necessarily mutually exclusive. Our team, which includes a pediatrician as a co-author, interpreted the model based on pediatric references.

Chair [Clinical applicability]

  • We used the exact same training/validation/test splits to validate the SOTA models and ours for a fair comparison.
  • We mainly focused on the pediatric cohort (0 < age < 18) in this study due to data availability. We will conduct further studies applying our model to adult datasets once an adult echocardiogram dataset with demographic information becomes available.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This paper proposes a demography-dependent approach for EF estimation from echocardiography.

    It is a very interesting approach which appears to lead to better results than SOTA methods. The reviewers have raised several questions about the methods that the paper should address.

    I am concerned about the study’s clinical applicability. The paper suggests these tools are useful for heart failure stratification, but then tests the methods exclusively in pediatric data. Were the SOTA methods compared against trained on adult or pediatric data? If the former, could that explain their poorer performance relative to the proposed method?

    Healthy BMI ranges are heavily age-dependent for the age group considered, so the labels 18<BMI<25 in Fig 2 appear misleading. More importantly, it is likely that the demographic variables considered are not so relevant in the adults with diagnosed or suspected heart failure, who appear to be the true target population for these methods.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The majority of the reviewers have liked the paper and the application and I concur with them.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


