Abstract

Disease prediction based on multimodal data is a critical yet challenging task in healthcare, especially in intensive care units (ICUs), where patients present complex clinical trajectories with multiple admissions and comorbidities. Current multimodal learning approaches lack effective modeling of cross-modal complementary information, which leads to suboptimal feature interactions. In addition, traditional methods that incorporate external knowledge graphs (KGs) often introduce noise and computational complexity because they use all one-hop neighbors within the KGs. To address these challenges, we propose Knowledge-Enhanced Complementary Information Fusion with temporal heterogeneous graph learning (KCIF) for patient modeling. KCIF introduces a temporal heterogeneous admission graph (THAG) that integrates KGs to capture semantic and temporal dependencies across admissions. It further employs a complementary information fusion mechanism to leverage mutual enhancement between lab tests and medical events. Extensive experiments on the MIMIC-III/IV benchmarks demonstrate that KCIF consistently outperforms baselines, achieving improvements of 2.5%–6.0% in w-F1 score and 1.7%–4.5% in R@20 across multiple ICU disease prediction tasks. The code is available at https://github.com/Boaz-SCUT/KCIF.
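The abstract reports gains in w-F1 and R@20 but does not define them. Below is a minimal sketch that assumes the definitions common in diagnosis-prediction work:

- **w-F1:** per-label F1, averaged with weights proportional to each label's support (the same convention as scikit-learn's `average="weighted"`).
- **R@k:** the fraction of a patient's true labels that appear among the k highest-scoring predictions, averaged over patients.

Both are assumptions. The paper's exact definitions may differ; some works divide R@k by min(|y|, k) instead of |y|.

```python
def weighted_f1(true_sets, pred_sets, num_labels):
    """Support-weighted F1 over labels for multi-label predictions.

    true_sets / pred_sets: one set of label indices per patient.
    """
    tp = [0] * num_labels
    fp = [0] * num_labels
    fn = [0] * num_labels
    for truth, pred in zip(true_sets, pred_sets):
        for c in range(num_labels):
            if c in truth and c in pred:
                tp[c] += 1
            elif c in pred:
                fp[c] += 1
            elif c in truth:
                fn[c] += 1
    weighted_sum, total_support = 0.0, 0
    for c in range(num_labels):
        support = tp[c] + fn[c]  # number of patients that truly have label c
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1 = 2 * tp[c] / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


def recall_at_k(true_sets, score_lists, k=20):
    """Mean fraction of each patient's true labels found in the top-k scores."""
    recalls = []
    for truth, scores in zip(true_sets, score_lists):
        if not truth:
            continue  # patients without positive labels are skipped
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        recalls.append(len(truth.intersection(top_k)) / len(truth))
    return sum(recalls) / len(recalls) if recalls else 0.0


truth = [{0, 1}, {1}]
print(weighted_f1(truth, [{0}, {1, 2}], num_labels=3))  # 7/9, about 0.778
print(recall_at_k(truth, [[0.9, 0.1, 0.5], [0.2, 0.3, 0.8]], k=2))  # 0.75
```

Labels with zero support (label 2 above) contribute nothing to the weighted average, so rare false-positive-only labels do not dilute the score.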

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3827_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/Boaz-SCUT/KCIF

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YanZon_KnowledgeEnhanced_MICCAI2025,
        author = { Yang, Zongbao and He, Shihuan and Chen, Zhichen and Zhang, Hao and Wang, Ruxin},
        title = { { Knowledge-Enhanced Complementary Information Fusion with Temporal Heterogeneous Graph Learning for Disease Prediction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {427--437}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    (1) Construction of a Temporal Heterogeneous Admission Graph (THAG) that incorporates external knowledge graphs to capture both semantic and temporal dependencies between patient admission histories.

    (2) Development of a complementary information fusion mechanism that enables mutual enhancement between lab tests and medical events for more comprehensive patient representation.

    (3) Experimental validation on MIMIC-III/IV datasets demonstrating performance improvements over state-of-the-art methods across multiple disease prediction tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) Novel modeling approach: The paper introduces an innovative temporal heterogeneous graph structure (THAG) that effectively integrates both semantic knowledge from external knowledge graphs and temporal dependencies between admissions. This addresses the limitation of previous methods that focus only on static semantic connections.

    (2) Complementary information extraction: The proposed complementary information fusion mechanism with orthogonality constraints is a thoughtful approach to extracting information that cannot be derived from any single modality. This enables the model to leverage the mutual enhancement between lab tests and medical events.

    (3) Comprehensive evaluation: The authors conduct extensive experiments on multiple tasks (multi-disease prediction, cardiovascular disease prediction, and binary classification tasks) across two datasets (MIMIC-III/IV), demonstrating consistent performance improvements over strong baselines.

    (4) Clinical relevance: The approach addresses a significant clinical challenge in ICU settings, where patients often have multiple comorbidities and complex clinical trajectories that require multimodal data analysis for accurate prediction.
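Reviewer #1 highlights the orthogonality constraints used to isolate complementary information. A common soft form penalizes the squared Frobenius norm of AᵀB, where A and B are batch feature matrices from two branches (for example, a shared and a complementary representation). Driving this penalty toward zero decorrelates the two feature spaces without forcing exact orthogonality.

The sketch below assumes that form and a hypothetical weight `lam`. The paper's exact loss terms are not given on this page.

```python
def orthogonality_penalty(a, b):
    """Soft orthogonality loss ||A^T B||_F^2 for n x d_a and n x d_b matrices.

    a, b: feature matrices as lists of rows (one row per sample in the batch).
    """
    n = len(a)
    assert n == len(b), "both branches must cover the same samples"
    penalty = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            # (A^T B)[i][j] is the dot product of column i of A and column j of B.
            entry = sum(a[r][i] * b[r][j] for r in range(n))
            penalty += entry * entry
    return penalty


def total_loss(task_loss, a, b, lam=0.01):
    # Hypothetical combination: task loss plus a weighted soft constraint.
    return task_loss + lam * orthogonality_penalty(a, b)


print(orthogonality_penalty([[1.0], [0.0]], [[0.0], [1.0]]))  # orthogonal columns: 0.0
print(orthogonality_penalty([[1.0], [1.0]], [[2.0], [0.0]]))  # overlapping columns: 4.0
```

Because the penalty is added to the loss rather than enforced as a hard projection, the model can trade a little residual correlation for better task performance, with `lam` controlling that balance.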

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) Limited explanation of knowledge graph construction: While the paper mentions incorporating external knowledge graphs, it lacks detailed explanation about how these knowledge graphs are constructed or which specific medical knowledge graphs are used. This information is crucial for reproducibility.

    (2) Limited discussion on computational complexity: The paper does not thoroughly discuss the computational complexity of the proposed model, especially considering that graph-based models with attention mechanisms can be computationally expensive. This is important for practical implementation in clinical settings.

    (3) Limited comparison with recent multimodal approaches: While the authors compare with several baselines, comparison with more recent state-of-the-art multimodal approaches for healthcare (e.g., some papers from 2023-2024) would strengthen the evaluation.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    (1) Model complexity: The paper introduces multiple components (THAG, complementary information fusion, etc.). It would be helpful to include a discussion on the trade-off between model complexity and performance gain.

    (2) Hyperparameter sensitivity: The paper could benefit from a sensitivity analysis of important hyperparameters (e.g., λ1, λ2 in the loss function) to demonstrate the robustness of the approach.

    (3) Real-world deployment considerations: A brief discussion on the challenges and considerations for deploying such a system in real clinical settings would enhance the practical impact of the work.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed KCIF framework addresses important limitations in existing approaches by effectively modeling both semantic and temporal dependencies in patient data through the novel THAG structure. The complementary information fusion mechanism provides a principled way to leverage multimodal data, which is crucial in complex clinical settings like ICUs.

    However, some limitations remain, particularly regarding reproducibility details and statistical validation; these could be addressed in the camera-ready version.

    If the authors can address my concerns during the rebuttal period, especially regarding reproducibility, I will consider revising my score.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes KCIF, a framework that constructs a Temporal Heterogeneous Admission Graph (THAG) and introduces a cross-modal complementary information fusion mechanism for disease prediction tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. KCIF employs multi-head attention and soft orthogonality constraints to capture deep interactions between lab tests and medical events, effectively overcoming limitations of earlier fusion approaches that suffered from redundancy or weak alignment.

    2. By incorporating external KG relations and admission time intervals, the proposed model generates a better representation of patient history, outperforming static-graph-based approaches like KGxDP.

    3. The experimental results support the effectiveness of the proposed method.
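Reviewer #2 describes multi-head attention that lets lab tests and medical events attend to each other. The following is a minimal, dependency-free sketch of bidirectional multi-head cross-attention. It assumes scaled dot-product attention and omits the learned query/key/value and output projections a real layer would have; each head simply uses one slice of the embedding. It illustrates the mechanism only and is not the authors' implementation.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multi_head_cross_attention(queries, keys, values, num_heads):
    """Split the embedding into num_heads slices and attend within each slice.

    Learned Q/K/V and output projections are omitted for brevity.
    """
    dim = len(queries[0])
    assert dim % num_heads == 0, "embedding size must be divisible by num_heads"
    head_dim = dim // num_heads
    outputs = [[] for _ in queries]
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        head_out = cross_attention([q[sl] for q in queries],
                                   [k[sl] for k in keys],
                                   [v[sl] for v in values])
        for row, part in zip(outputs, head_out):
            row.extend(part)  # concatenate heads back into one vector
    return outputs


def mutual_enhancement(events, labs, num_heads=2):
    """Medical events attend to lab tests, and lab tests attend to events."""
    events_from_labs = multi_head_cross_attention(events, labs, labs, num_heads)
    labs_from_events = multi_head_cross_attention(labs, events, events, num_heads)
    return events_from_labs, labs_from_events
```

With a single lab-test vector, every event query places all of its attention on that vector, so each enhanced event equals the lab embedding. This is a quick sanity check on the wiring.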

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. While the system design is well integrated, most key components (Transformer, HGAT, contrastive learning) were proposed in prior work, so the novelty of the method is limited.

    2. This paper lacks an analysis of the model's robustness across different hospital settings and of its transferability, both of which are crucial for real-world clinical adoption.

    3. Despite the design of orthogonality constraints and attention-based interaction, the paper does not provide clear interpretability or visualization of the extracted complementary features, which weakens the medical relevance of the fusion logic.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In this paper, the use of a temporal heterogeneous admission graph (THAG) and cross-modal constraints demonstrates some insightful design choices, and the method achieves consistent performance improvements across two public benchmarks (MIMIC-III/IV). The writing is clear, and the experimental analysis is thorough, including ablation studies that support each component’s contribution. However, the paper’s novelty is somewhat limited, as many core components (e.g., Transformer, HGAT, contrastive learning) are adapted from prior work, and the interpretability of the proposed fusion mechanism remains insufficiently explored. Additionally, there is a lack of discussion on the model’s generalizability and clinical deployment potential. Despite these weaknesses, the empirical results are strong, and the overall framework is well-structured and valuable to the community. Therefore, I suggest a weak accept.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    This work introduces Knowledge-Enhanced Complementary Information Fusion (KCIF), integrating heterogeneous temporal graph learning for disease prediction. Key innovations postulated include:

    1. a time-enhanced Transformer and temporal heterogeneous admission graph (THAG) modeling semantic-temporal dependencies;
    2. hierarchical Transformer for multi-scale lab test analysis;
    3. cross-modal complementary learning between events and lab data, validated on MIMIC-III/IV.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I will be commenting on the methodology: This methodology presents a major strength through its novel integration of temporal dynamics and semantic relationships in medical event modeling. The key innovation lies in the time-aware attention (T-Attention) mechanism, which explicitly encodes temporal intervals between admissions (Δ𝑡) while learning admission-level representations. This goes beyond standard Transformers by jointly weighting medical event embeddings (Fₘₑ) and time-dependent features (Fₜ), capturing both clinical and progression-based patterns.
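Reviewer #3 summarizes T-Attention as jointly weighting medical event embeddings (Fₘₑ) and time-dependent features (Fₜ) derived from intervals Δt. Below is a minimal sketch under stated assumptions, not the paper's formulation:

- The time feature is a hypothetical log-gap plus exponential-decay encoding.
- Each attention score is the sum of two linear terms.
- `w_event` and `w_time` are fixed, hypothetical weight vectors standing in for learned parameters.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def time_features(delta_days):
    """Hypothetical F_t: log-scaled gap and an exponential recency decay (30-day scale)."""
    return [math.log1p(delta_days), math.exp(-delta_days / 30.0)]


def t_attention(event_embs, deltas, w_event, w_time):
    """Pool item embeddings with weights driven by both content and elapsed time.

    event_embs: one embedding (F_me) per admission or event.
    deltas: elapsed days (delta t) associated with each item.
    w_event, w_time: stand-ins for learned scoring parameters.
    """
    scores = [dot(e, w_event) + dot(time_features(dt), w_time)
              for e, dt in zip(event_embs, deltas)]
    alphas = softmax(scores)
    dim = len(event_embs[0])
    pooled = [sum(a * e[j] for a, e in zip(alphas, event_embs)) for j in range(dim)]
    return pooled, alphas


pooled, alphas = t_attention([[1.0, 0.0], [0.0, 1.0]], deltas=[0, 300],
                             w_event=[0.0, 0.0], w_time=[0.0, 1.0])
print(alphas)  # the recent item (gap 0) receives the larger weight
```

Setting `w_time` to zeros reduces this to ordinary content-only attention pooling, which makes the contribution of the time term easy to isolate in an ablation.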

    The time-enhanced Transformer (T-Transformer) further refines this by incorporating long-term admission dependencies via temporal positional encoding (Fₜ′), ensuring sequential continuity. The inclusion of static demographic data (Hₛ) via multi-hot encoding and concatenation with dynamic event embeddings (Hₘₑ) enables a holistic patient representation (Hₚ).
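The static branch described above can be sketched as follows: multi-hot demographics Hₛ are concatenated with the event embedding Hₘₑ to form Hₚ. The attribute vocabulary is hypothetical. A real model would likely also pass the multi-hot vector through a learned projection before or after concatenation.

```python
STATIC_VOCAB = ["sex=F", "sex=M", "age=18-44", "age=45-64", "age=65+",
                "insurance=Medicare", "insurance=Private"]  # hypothetical


def multi_hot(attributes, vocabulary):
    """Encode a set of categorical attributes as a 0/1 vector over vocabulary."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for name in attributes:
        vec[index[name]] = 1.0  # unknown attributes raise KeyError
    return vec


def patient_representation(h_me, static_attributes, vocabulary=STATIC_VOCAB):
    """H_p = [H_me ; H_s]: dynamic event embedding followed by static multi-hot."""
    return list(h_me) + multi_hot(static_attributes, vocabulary)


print(patient_representation([0.2, -0.1], ["sex=F", "age=65+"]))
# [0.2, -0.1, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```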

    This approach is novel because it unifies temporal granularity (inter- and intra-admission timing) with semantic medical event modeling, addressing a critical gap in longitudinal EHR analysis. The explicit time-conditioned attention is particularly interesting for progressive disease prediction, where temporal irregularities (e.g., missed visits) carry clinical meaning. This framework could significantly improve early risk stratification in chronic conditions.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    From the introduction standpoint, a major weakness is the lack of concrete discussion about data heterogeneity and missing data challenges in ICU settings. While the text emphasizes multimodal fusion, it does not address how KCIF handles real-world ICU data issues such as irregular sampling, variable recording frequencies, or missing modalities across patients. This omission is problematic because, in my experience, ICU data are notoriously sparse and inconsistent, and failing to account for these factors explicitly could lead to biased or unreliable model performance in clinical practice. Without specific mechanisms for data imputation or robustness to missingness, the claimed superiority of KCIF may not hold in practical deployments.
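To make the reviewer's concern concrete, the sketch below shows one common way irregularly sampled ICU measurements are handled in the wider literature:

- Carry the last observation forward.
- Keep a binary mask of which values were actually measured.
- Record the time since the last measurement, so a model can discount stale values.

This illustrates the general technique only; the review does not say KCIF does this. The `default` fill value is a hypothetical population mean.

```python
def forward_fill_with_mask(times, values, default):
    """Carry the last observation forward and expose missingness explicitly.

    times: measurement timestamps in hours, sorted ascending.
    values: observed values, with None where the lab was not measured.
    default: fill value before the first observation (e.g. a population mean).
    Returns (filled values, observed mask, hours since last observation).
    """
    filled, mask, since_last = [], [], []
    last_value, last_time = None, times[0]
    for t, v in zip(times, values):
        if v is not None:
            last_value, last_time = v, t
        mask.append(0 if v is None else 1)
        filled.append(default if last_value is None else last_value)
        since_last.append(t - last_time)  # measured from stay start if never seen
    return filled, mask, since_last


print(forward_fill_with_mask([0, 1, 2, 4], [None, 7.1, None, 7.4], default=7.0))
# ([7.0, 7.1, 7.1, 7.4], [0, 1, 0, 1], [0, 0, 1, 0])
```

Feeding the mask and elapsed-time channels alongside the filled values lets a downstream model distinguish a genuinely stable lab value from one that simply was not re-measured.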

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It provides good science that surely would improve the experience of patients in the ICU when implemented.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their valuable feedback. Our point-by-point responses are as follows:

Reviewer #1 (7-1): Novelty limited: Most components are proposed by prior works. Response: While building upon established components, KCIF introduces an integrated framework tailored to the ICU setting: 1) We propose, for the first time, the Temporal Heterogeneous Admission Graph (THAG), which integrates a medical knowledge graph with temporal diagnosis data to capture both semantic and temporal patterns in a patient’s admission history. 2) The T-Transformer explicitly encodes temporal gaps into the positional embedding to address irregular time intervals between admissions. 3) NET-HGAT processes multiple entity types and their relations within the THAG. On top of these components, an orthogonality constraint is applied to extract complementary information across modalities.
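The rebuttal states that the T-Transformer encodes temporal gaps into the positional embedding. One plausible sketch evaluates sinusoidal encodings at cumulative days since the first admission, rather than at the integer admission index. Two details are assumptions: the base of 10000 (borrowed from the original Transformer) and the day-level time unit. This is not the authors' code.

```python
import math


def admission_times(gaps_days):
    """Cumulative days since the first admission, from inter-admission gaps."""
    times = [0.0]
    for gap in gaps_days:
        times.append(times[-1] + gap)
    return times


def temporal_positional_encoding(t, dim, base=10000.0):
    """Sinusoidal encoding evaluated at continuous time t instead of an index."""
    pe = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        pe.append(math.sin(t * freq))
        if i + 1 < dim:
            pe.append(math.cos(t * freq))
    return pe


def add_time_encoding(admission_embs, gaps_days):
    """Add the time-based encoding to each admission embedding."""
    dim = len(admission_embs[0])
    return [[x + p for x, p in zip(emb, temporal_positional_encoding(t, dim))]
            for emb, t in zip(admission_embs, admission_times(gaps_days))]
```

With index-based positions, a readmission after 5 days and one after 5 years receive identical encodings. Evaluating the sinusoids at elapsed time keeps that difference visible to attention.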

Reviewer #1 (7-2) & Reviewer #3 (7): Model robustness: Lacks analysis on robustness on different hospital settings or data heterogeneity. Response: We acknowledge these critical concerns. Following prior works [3,11,12,24], we selected MIMIC-III/IV as they are widely used benchmarks ensuring fair comparisons. We aim to collaborate with medical institutions to obtain multi-center data, thereby enhancing KCIF’s applicability to domain adaptation and missing data scenarios in the future.

Reviewer #1 (7-3) Interpretability: “Didn’t provide clear interpretability or visualization”. Response: Due to space limitations, we focus mainly on demonstrating model performance across diverse datasets and tasks. We verify the effectiveness of each component through ablation studies (Table 3). We appreciate this suggestion and will incorporate case studies with visualizations to improve interpretability.

Reviewer #2 (7-1) Limited explanation of the used Knowledge Graph (KG). Response: We utilize SNOMED CT, an authoritative, widely-adopted open-access medical KG containing over 300,000 concepts and 1.37 million relationships stored as semantic triplets. The construction of the proposed THAG is detailed in Sec. 2.1: (a) Mapping ICD-9 codes to SNOMED CT concepts; (b) Constructing each patient’s THAG with three relation types: E_time, E_has, and E_rel (semantic relationships from SNOMED CT, e.g., “is-a”, “treat”, “cause”). The code will be open-sourced upon acceptance.
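The THAG construction steps in the rebuttal can be sketched as a small edge list:

- (a) Map ICD-9 codes to SNOMED CT concepts.
- (b) Link nodes with E_time, E_has, and E_rel edges.

Several details here are hypothetical: the mapping table, the concept IDs, and the day-gap attribute on E_time. One further assumption: the sketch keeps only KG triples whose head and tail both occur among the patient's own concepts. This is one possible way to avoid the noise that the abstract attributes to using all one-hop neighbors.

```python
from dataclasses import dataclass, field


@dataclass
class THAG:
    nodes: set = field(default_factory=set)
    # Each edge is (source, relation, target, attribute); attribute is the
    # day gap for E_time edges and None otherwise.
    edges: list = field(default_factory=list)


def build_thag(admissions, icd_to_concept, kg_triples):
    """Build one patient's temporal heterogeneous admission graph (sketch).

    admissions: chronological (admission_id, day, [icd9 codes]) tuples.
    icd_to_concept: hypothetical ICD-9 -> SNOMED CT concept mapping.
    kg_triples: (head, relation, tail) triples from the knowledge graph.
    """
    graph = THAG()
    patient_concepts = set()
    previous = None
    for adm_id, day, codes in admissions:
        graph.nodes.add(adm_id)
        if previous is not None:
            prev_id, prev_day = previous
            graph.edges.append((prev_id, "E_time", adm_id, day - prev_day))
        for code in codes:
            concept = icd_to_concept.get(code)
            if concept is None:
                continue  # unmapped codes are dropped in this sketch
            graph.nodes.add(concept)
            patient_concepts.add(concept)
            graph.edges.append((adm_id, "E_has", concept, None))
        previous = (adm_id, day)
    # Keep only KG relations among this patient's own concepts, rather than
    # pulling in every one-hop neighbour from the KG.
    for head, relation, tail in kg_triples:
        if head in patient_concepts and tail in patient_concepts:
            graph.edges.append((head, f"E_rel:{relation}", tail, None))
    return graph
```

In a toy call with two admissions 40 days apart and the triple ("A", "treat", "Z"), the triple is discarded, because Z never appears in the patient's diagnoses.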

Reviewer #2 (7-2, 10-1, 10-3) & Reviewer #3 (7): Model complexity and real-world adoption. Response: KCIF remains lightweight, with an acceptable parameter count (~10M) comparable to baselines such as HITA (~3.3M) and KGxDP (~4.1M), while achieving substantial performance gains (1.06%–13.24%) over KGxDP across multiple tasks. These improvements can enhance the quality of clinical decision-making in ICU settings.

Reviewer #2 (7-3) & Reviewer #2 (10-2): Experimental validation: “Limited comparison with recent multimodal methods” and “Hyperparameter sensitivity”. Response: (1) In existing disease prediction tasks on MIMIC datasets, most traditional methods [3,11,12,24] rely on diagnostic records only, because they are structured and directly related to the prediction targets. Although some multimodal methods exist for specific tasks, a direct comparison is inappropriate: (a) they treat admissions independently rather than modeling the full sequence of a patient’s admissions, which differs fundamentally from our focus on disease evolution over time; (b) they use task-specific modalities, whereas our framework provides a more general solution using consistent multimodal data across all tasks (multi-class, specialty, or specific disease). We appreciate this suggestion and plan to adapt recent multimodal methods to our sequential admission setting in future work. (2) In our earlier work, we conducted a grid search over the loss weights λ1 and λ2 from [0.1, 0.01, 0.005, 0.001], finding optimal performance at λ1=0.01, λ2=0.05. Space limitations prevented us from including these details in the original paper.
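The λ1/λ2 search described in the rebuttal amounts to an exhaustive grid over the listed values. Below is a minimal sketch in which `evaluate` is a hypothetical callback: it would train or validate the model for one pair of loss weights and return a validation score, where higher is better. The toy surrogate exists only so the sketch runs; its optimum has no relation to the paper's reported setting.

```python
from itertools import product

LAMBDA_GRID = [0.1, 0.01, 0.005, 0.001]  # values listed in the rebuttal


def grid_search(evaluate, grid=LAMBDA_GRID):
    """Try every (lambda1, lambda2) pair and keep the best validation score.

    evaluate: hypothetical callback that trains/validates the model with the
    given loss weights and returns a score where higher is better.
    """
    best = None
    for lam1, lam2 in product(grid, grid):
        score = evaluate(lam1, lam2)
        if best is None or score > best[0]:
            best = (score, lam1, lam2)
    return best


def toy_evaluate(lam1, lam2):
    # Synthetic stand-in so the sketch runs; NOT the paper's validation metric.
    return -((lam1 - 0.1) ** 2 + (lam2 - 0.001) ** 2)


print(grid_search(toy_evaluate))
```

A 4 x 4 grid means 16 training runs. Reporting the score for every cell, not just the argmax, is what turns this search into the sensitivity analysis Reviewer #1 asked for.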




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


