Abstract

Multimodal pathology-genomic analysis is critical for cancer survival prediction. However, existing approaches predominantly integrate formalin-fixed paraffin-embedded (FFPE) slides with genomic data and neglect other available slide preparations, such as Fresh Frozen (FF) slides. Moreover, because the high-resolution, spatially rich pathology data tend to dominate the cross-modality fusion process, they hinder effective multimodal fusion and cause a modality imbalance between pathology and genomics. These methods also typically require complete data modalities, which limits their clinical applicability when a modality, such as pathology or genomic data, is missing. In this paper, we propose a multimodal survival prediction framework that leverages hypergraph learning to integrate multi-WSI information and cross-modality interactions between pathology slides and genomic data while addressing modality imbalance. In addition, we introduce a memory mechanism that stores previously learned paired pathology-genomic features and dynamically compensates for incomplete modalities. Experiments on five TCGA datasets demonstrate that our model outperforms state-of-the-art methods by over 2.3% in C-Index. Under incomplete modality scenarios, our approach surpasses pathology-only models by 3.3% and gene-only models by 7.9%. Code: https://github.com/MCPathology/M2Surv
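For reference, the C-Index reported above is usually Harrell's concordance index. The sketch below assumes that standard definition: a pair is comparable when the subject with the shorter time had an observed event, it is concordant when that subject receives the higher predicted risk, and tied risks count one half. The paper's evaluation code may treat ties or censoring differently; `harrell_c_index` is a hypothetical helper, not code from the M2Surv repository.

```python
import numpy as np

def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data (sketch)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risks = np.asarray(risks, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier event got the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    return concordant / comparable if comparable else float("nan")
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking.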

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2663_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/MCPathology/M2Surv

Link to the Dataset(s)

N/A

BibTex

@InProceedings{QuMin_MemoryAugmented_MICCAI2025,
        author = { Qu, Mingcheng and Yang, Guang and Di, Donglin and Gao, Yue and Su, Tonghua and Song, Yang and Fan, Lei},
        title = { { Memory-Augmented Incomplete Multimodal Survival Prediction via Cross-Slide and Gene-Attentive Hypergraph Learning } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {317--326}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a multimodal survival prediction framework with three key contributions. First, it uses a memory bank that stores paired pathology and genomic features to handle missing data during inference. Second, it builds a multi-slide hypergraph to capture spatial and structural information from both inter- and intra-slide data. Third, it proposes a gene-attentive hypergraph that links gene groups to pathology features using attention, helping balance the influence of high-resolution image data.
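As background for the hypergraph terminology used in these reviews, a widely used hypergraph convolution is the HGNN layer of Feng et al. (2019): X' = sigma(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta), where H is the node-by-hyperedge incidence matrix, W holds hyperedge weights, and Dv, De are node and hyperedge degree matrices. The NumPy sketch below implements that layer with a ReLU, assuming every node belongs to at least one hyperedge. Whether M2Surv uses exactly this propagation rule is not stated in the reviews, and `hypergraph_conv` is a hypothetical name.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One HGNN-style hypergraph convolution layer (sketch).

    X: (n_nodes, d_in) node features, e.g. patch embeddings.
    H: (n_nodes, n_edges) incidence matrix; H[v, e] = 1 if node v is in hyperedge e.
    Theta: (d_in, d_out) learnable projection.
    """
    n_edges = H.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = H @ w          # weighted node degrees (assumed > 0)
    de = H.sum(axis=0)  # hyperedge degrees: number of nodes in each hyperedge
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    de_inv = np.diag(1.0 / de)
    # Normalised node -> hyperedge -> node propagation matrix
    P = dv_inv_sqrt @ H @ np.diag(w) @ de_inv @ H.T @ dv_inv_sqrt
    return np.maximum(P @ X @ Theta, 0.0)  # ReLU nonlinearity
```

With a single hyperedge covering all nodes, every node receives the mean of all node features; with one hyperedge per node, the layer reduces to a per-node projection.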

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed momentum-updated memory bank stores multimodal representations and retrieves features when an input modality is missing, improving the model's usability in real-world scenarios.
    2. The gene-attentive hypergraph explicitly focuses on gene expression, creating dense connections from gene groups to pathology features, and helps address modality imbalance.
    3. The model achieves superior performance on most datasets, in both complete and incomplete modality settings.
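The memory-bank idea summarized in these strengths (store paired features with a momentum update, then retrieve the missing modality) can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the class `PairedMemoryBank`, the nearest-slot write rule, the EMA momentum, and the softmax-over-cosine-similarity read are all hypothetical choices.

```python
import numpy as np

def _l2norm(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

class PairedMemoryBank:
    """Hypothetical momentum-updated memory of paired pathology/genomic features."""

    def __init__(self, n_slots, dim_path, dim_gene, momentum=0.9, temperature=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.path = rng.standard_normal((n_slots, dim_path))  # pathology keys
        self.gene = rng.standard_normal((n_slots, dim_gene))  # paired genomic values
        self.m = momentum
        self.tau = temperature

    def update(self, p, g):
        """Write a complete (paired) sample into its most similar slot via EMA."""
        k = int(np.argmax(_l2norm(self.path) @ _l2norm(p)))
        self.path[k] = self.m * self.path[k] + (1 - self.m) * p
        self.gene[k] = self.m * self.gene[k] + (1 - self.m) * g
        return k

    def _read(self, query, keys, values):
        # Softmax attention over cosine similarities between query and stored keys
        sims = _l2norm(keys) @ _l2norm(query) / self.tau
        attn = np.exp(sims - sims.max())
        attn /= attn.sum()
        return attn @ values

    def recall_gene(self, p):
        """Estimate missing genomic features from pathology features."""
        return self._read(p, self.path, self.gene)

    def recall_path(self, g):
        """Estimate missing pathology features from genomic features."""
        return self._read(g, self.gene, self.path)
```

Because the read is a convex combination of stored slots, a recalled feature always stays within the range of features seen during training, which is one way to frame Reviewer #1's question about generalization to data that differ from the training set.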
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The novelty is somewhat limited, as the hypergraphs are adapted from previous designs.
    2. There is a lack of evidence on the generalizability of the memory bank across datasets (i.e., what happens when the test data differ from the training set).
    3. Information on how to interpret the visualization results is missing, such as the biological interpretation of NR0B1.
    4. The introduction to the study background is rather brief and weak. For example, it is unclear why FFPE and FF slides are discussed, given that FFPE is more common. The overview of existing methods is lacking. The connection between the literature review and the SOTA methods chosen for comparison is unclear, and the compared methods are not introduced at all. The writing could be much improved. Existing graph-based methods should also be discussed.
    5. Is there an explanation for the suboptimal performance of the proposed M2Surv on the CO-READ and HNSC datasets?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the model demonstrated good performance in relatively extensive experiments. The memory bank design enhances the model's applicability by handling missing modalities during inference, which is a common issue in clinical settings. The proposed gene-attentive hypergraph enables rich cross-modal interactions by linking gene groups to pathology features, leading to improved multimodal fusion. I hope the questions raised above can be adequately addressed so that the rating can be changed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper addresses several limitations of existing multimodal survival prediction approaches, such as incomplete modalities and modality imbalance. To overcome these limitations, the authors propose a hypergraph-based multimodal survival prediction framework that models the complex relationships across multiple whole-slide images and captures cross-modal interactions between pathology and genomics. Additionally, the authors introduce a memory mechanism that stores previously learned paired features and dynamically compensates for missing modalities, enhancing the model's robustness and clinical utility in scenarios with incomplete data.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The use of hypergraph learning to model both intra-WSI and inter-WSI correlations. Notably, the method leverages complementary information from fresh frozen slides, which are often overlooked in prior studies, enriching the representation space and improving the robustness of multimodal integration.
    2. The authors propose a memory mechanism that stores learned paired pathology-genomic features during training and retrieves semantically similar representations at inference time, mitigating missing modality challenges.
    3. The design of a gene-attentive hypergraph, which employs cross-attention mechanisms to define hyperedges that emphasize cross-modal interactions between pathology features and gene expression profiles.
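One way to read "cross-attention defines the hyperedges" in the strength above is sketched below: each gene-group token acts as a query over patch features, and its hyperedge joins the gene node with its top-k most-attended patches. The function `gene_attentive_hyperedges`, the projections `Wq`/`Wk`, and the top-k rule are illustrative assumptions; the paper may instead weight hyperedges by attention or use a different selection rule.

```python
import numpy as np

def gene_attentive_hyperedges(gene_tokens, patch_feats, Wq, Wk, top_k=3):
    """Hypothetical construction with one hyperedge per gene group.

    Returns the (n_genes, n_patches) attention matrix and an incidence matrix H
    whose rows are [gene nodes..., patch nodes...] and whose columns are hyperedges.
    """
    q = gene_tokens @ Wq                     # (n_genes, d) queries from gene groups
    k = patch_feats @ Wk                     # (n_patches, d) keys from pathology patches
    logits = q @ k.T / np.sqrt(q.shape[1])   # scaled dot-product scores
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over patches for each gene group
    n_genes, n_patches = attn.shape
    H = np.zeros((n_genes + n_patches, n_genes))
    for e in range(n_genes):
        H[e, e] = 1.0                        # the gene group anchors its own hyperedge
        top = np.argsort(attn[e])[::-1][:top_k]
        H[n_genes + top, e] = 1.0            # plus its most-attended patches
    return attn, H
```

Every hyperedge contains exactly one gene node, so each gene group gets a dedicated channel into the pathology features; this is one plausible mechanism for the modality-balancing effect the reviewers describe.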
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. While the proposed method introduces a rich hypergraph-based framework, it involves multiple hypergraph constructions and computations, which may result in significant computational overhead. Although the authors report increased training and inference time, this analysis would benefit from additional quantitative metrics, such as GPU memory consumption, FLOPs, and the number of model parameters, compared across baseline methods. Providing these details would offer a more comprehensive understanding of the scalability and feasibility of the method, especially for deployment in resource-constrained clinical settings.
    2. Please justify the inconsistent performance across different datasets in Table 1.
    3. A key claim of the paper is the ability to handle incomplete modalities, yet the proportion of missing data (e.g., missing FF or genomic data) in the current dataset is not clearly reported. Furthermore, it remains unclear how the model performs under varying levels of modality incompleteness. A controlled experiment, e.g., simulating different missing rates and evaluating model robustness, would provide stronger evidence for the effectiveness of the proposed memory mechanism.
    4. The framework relies on constructing inter-slide hypergraphs to capture relationships across multiple whole-slide images. Please clarify how the model functions in scenarios where only a single slide is available, which is common in clinical practice.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite concerns regarding computational efficiency, performance consistency across datasets, and missing data analysis, I believe the core idea is promising, and the contributions are meaningful. Therefore, I recommend a weak accept at this stage, with the expectation that the authors will address the identified weaknesses in future revisions or follow-up work.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Some concerns are addressed, and I am inclined to rate accept of this manuscript, given the promising results and meaningful contributions. The authors are encouraged to update more details as they promised in the rebuttal.



Review #3

  • Please describe the contribution of the paper

    This paper describes an innovative approach to work with multimodal pathology and genomic data for survival prediction. Their approach contains multiple innovations such as memory bank and multi-slide graph constructions. They have also compared their approach with multiple papers in this area.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The hypergraphs across multiple images and genes are innovative in that they leverage information across the different data sources. The memory bank can also impute missing data, which is a common problem in this field.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The writing of this paper is a bit rushed, and the following items should be addressed:

    1. Equation 3: the definition of f is missing. Please also re-check all definitions and formulas.
    2. FF images were the main motivation of this paper, but the number of FFPE and FF images available in each study is not reported. This should appear in the results section.
    3. The authors do not describe how they set lambda, which controls the number of neighbours.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The final score of this paper could be 6 if the three problems above are addressed, which should be relatively easy and should not require new experiments.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methods they proposed are innovative and useful to the field, but the writing and organization should certainly be improved.

    There are also a few directions for future work; it would be ideal if the authors could address them now, but I understand that they may be a stretch:

    1. The image encoder is a ResNet; it would be interesting to see whether the results still hold if it is replaced by a foundation model such as UNI, GigaPath, etc.
    2. I am a bit confused about why the same image encoder can be used for both FFPE and FF slides. The description of that part also needs work.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have sufficiently addressed my comments and suggestions




Author Feedback

Common Questions

R#1-Q5 & R#2-Q2: Suboptimal Performance. Our gene-attentive hypergraph is designed to address modality imbalance by centering the integration on genes, as evidenced by its high average C-Index. Additionally, the method's small standard deviation highlights its stability and indicates a favorable performance interval. However, some datasets may depend more on one modality than on another; for example, on the CO-READ and HNSC datasets pathology is often more informative than the multimodal combination, so unimodal pathology methods outperform multimodal ones in these cases.

R#1-Q1: Hypergraph novelty. A1: Previous hypergraph studies focus on modelling spatial relationships within a single WSI. We instead use hypergraphs to create cross-modal links between genomic and pathology data, and we further use them to construct graph representations across different WSIs (such as FFPE and FF). Both the cross-modal and the cross-slide settings are newly explored.

R#1-Q2: Memory bank generalizability. A2: Our ablation study demonstrates the adaptability of our memory bank across different frameworks and datasets. Regarding dataset generalizability in particular, the CO-READ dataset comprises two cancer types, COAD and READ. In some splits, the two cancer types are separated between the training and validation sets, and our model still achieves good performance.

R#1-Q3: Visualization interpretation. We use gradients from the prediction layer to highlight each gene's influence (Fig. 3). For instance, NR0B1 shows a positive gradient for BRCA, indicating a crucial role in breast cancer formation given its link with endocrine disorders and sexual development, which supports our model's interpretative capability.
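The gradient-based attribution described in this answer can be illustrated with a toy risk head. The sketch assumes a hypothetical one-hidden-layer head, risk = w . tanh(W x + b), where x is a gene-group input vector; it computes d(risk)/dx analytically by the chain rule. Under this toy model, a positive entry means that increasing that gene input raises the predicted risk. The real M2Surv prediction layer and the exact gradient it takes are not specified here.

```python
import numpy as np

def risk_and_gene_gradient(x, W, b, w):
    """Toy risk head risk = w . tanh(W x + b) and its gradient w.r.t. x.

    x: (n_genes,) gene inputs; W: (n_hidden, n_genes); b, w: (n_hidden,).
    Returns (risk, grad) where grad[i] = d risk / d x[i] is the attribution score.
    """
    h = np.tanh(W @ x + b)
    risk = float(w @ h)
    grad = W.T @ (w * (1.0 - h ** 2))  # chain rule through tanh: d tanh(u)/du = 1 - tanh(u)^2
    return risk, grad
```

In a deep-learning framework the same quantity would normally come from automatic differentiation rather than a hand-derived formula; the explicit version simply makes the sign convention visible.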

R#1-Q4: Insufficient background. Pathology graph-based methods typically use patches as the nodes of adjacency graphs. Recent studies employ cellular graphs to explore spatial relationships among biomarkers or cell types within WSIs. In multimodal survival analysis, techniques such as cross-attention or optimal transport are employed for modality fusion. These methods rely solely on FFPE slides, whereas FF slides are rarely used because of their artifacts. However, from the perspectives of clinical timeliness and multimodality, FF slides offer significant value that these methods overlook.

R#2-Q1: Resource consumption. A1: Our model uses 3.33 GB of memory on an RTX 4090 GPU, and the model size is 3.01 MB. This will be added to Sec. 3.

R#2-Q3: Incomplete modality handling. A3: In Table 1, we simulated scenarios in which a modality was entirely missing, and our model showed the best mean results. We will provide more results across different modality missing ratios in Sec. 3.
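The controlled experiment requested by Reviewer #2 and promised here could be set up by masking modalities at a chosen rate. The sketch below is a hypothetical protocol, not the authors' setup: for a fraction `missing_rate` of patients it drops exactly one modality (pathology or genomics, chosen uniformly at random) and never drops both, so every patient keeps at least one input. `simulate_missing` is an illustrative name.

```python
import numpy as np

def simulate_missing(n_patients, missing_rate, seed=0):
    """Return boolean masks (has_path, has_gene) for a simulated missing-modality split."""
    rng = np.random.default_rng(seed)
    n_missing = int(round(missing_rate * n_patients))
    has_path = np.ones(n_patients, dtype=bool)
    has_gene = np.ones(n_patients, dtype=bool)
    idx = rng.choice(n_patients, size=n_missing, replace=False)  # patients losing a modality
    drop_path = rng.random(n_missing) < 0.5                      # which modality each one loses
    has_path[idx[drop_path]] = False
    has_gene[idx[~drop_path]] = False
    return has_path, has_gene
```

Sweeping `missing_rate` over, say, 0.0 to 1.0 and reporting the C-Index at each level would give the robustness curve the reviewer asks for; fixing `seed` keeps the masks identical across compared models.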

R#2-Q4: Single Slide Scenario. Our inter-slide hyperedges are built on patch features pooled from all slides, so they can be adapted to the single-slide case: when only one slide is available, we directly identify highly similar patches within that slide.
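The answer above implies that the same construction covers one or many slides. A minimal sketch, assuming similarity-based kNN hyperedges (each patch anchors a hyperedge with its k most cosine-similar patches in the pooled set); `similarity_hyperedges` and the kNN rule are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def similarity_hyperedges(slides, k=4):
    """Hypothetical kNN hyperedges over patch features pooled from all available slides.

    slides: list of (n_i, d) arrays, one per slide (FFPE and/or FF); needs k < total patches.
    Returns the incidence matrix H (n_total, n_total), one hyperedge per anchor patch,
    and the slide index of every patch.
    """
    feats = np.concatenate(slides, axis=0)
    slide_id = np.concatenate([np.full(len(s), i) for i, s in enumerate(slides)])
    z = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = z @ z.T                        # cosine similarity between all patch pairs
    np.fill_diagonal(sim, -np.inf)       # exclude each patch from its own neighbour search
    n = len(feats)
    H = np.zeros((n, n))
    for e in range(n):
        nbrs = np.argsort(sim[e])[::-1][:k]
        H[e, e] = 1.0                    # the anchor patch
        H[nbrs, e] = 1.0                 # its k most similar patches, from any slide
    return H, slide_id
```

With several slides, a hyperedge can connect an FFPE patch to similar FF patches; with a single slide, the identical code falls back to within-slide similarity, matching the behaviour described in the answer.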

R#3-Q1: Formula definition. A1: In Eq. 3, 'f' refers to the feature of a patch. We have now carefully rechecked all definitions and formulas.

R#3-Q2: Slide availability. A2: In each dataset, every patient has at least one FF slide and one FFPE slide, with a total of 3128 FFPE and 2359 FF slides (see the TCGA data portal). We will provide data directories listing all slide information.

R#3-Q3: Neighbour control. A3: Setting lambda = 9 connects each patch to its immediate neighbours within a single layer. In our ablation study, we also explored lambda = 5 (four direct neighbours) and lambda = 25 (two layers of surrounding neighbours).
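The three lambda settings are consistent with counting the patch itself: 5 = 1 + 4 direct neighbours, 9 = a 3x3 window (one ring), and 25 = a 5x5 window (two rings). The sketch below encodes that reading, which is an assumption drawn from the numbers rather than a statement from the authors. `grid_neighbourhood` is a hypothetical helper, and positions outside the patch grid are simply dropped.

```python
def grid_neighbourhood(row, col, lam, n_rows, n_cols):
    """Patch-grid neighbourhood of size lam, including the patch itself (assumed reading).

    lam=5: the patch plus its 4 direct neighbours; lam=9: 3x3 window; lam=25: 5x5 window.
    """
    if lam == 5:
        offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    elif lam in (9, 25):
        r = 1 if lam == 9 else 2  # number of surrounding rings
        offsets = [(dr, dc) for dr in range(-r, r + 1) for dc in range(-r, r + 1)]
    else:
        raise ValueError("lam must be 5, 9 or 25 in this sketch")
    # Keep only positions that fall inside the patch grid
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols]
```

Interior patches get exactly lam members, while border and corner patches get fewer, which is one practical difference between the lambda settings.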

R#3-Q4: Future work. A4: We have already tested encoders such as UNI, CTransPath, and CONCH, as well as separate image encoders for FF and FFPE slides. While CTransPath and CONCH yielded improvements, the overall impact was minimal. We will include these results in our code repository.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


