Abstract

Cancer survival prediction requires integrating pathological Whole Slide Images (WSIs) and genomic profiles, a challenging task due to the inherent heterogeneity and the complexity of modeling both inter- and intra-modal interactions. Current methods often employ straightforward fusion strategies for multimodal feature integration, failing to comprehensively capture modality-specific and modality-shared interactions, which results in a limited understanding of multimodal correlations and suboptimal predictive performance. To mitigate these limitations, this paper presents a Multimodal Representation Decoupling Network (MurreNet) to advance cancer survival analysis. Specifically, we first propose a Multimodal Representation Decomposition (MRD) module to explicitly decompose paired input data into modality-specific and modality-common representations, thereby reducing redundancy between modalities. The disentangled representations are then refined and updated through a novel training regularization strategy that imposes constraints on the distributional similarity, difference, and representativeness of modality features. Finally, the augmented multimodal features are integrated into a joint representation via the proposed Deep Holistic Orthogonal Fusion (DHOF) strategy. Extensive experiments conducted on six TCGA cancer cohorts demonstrate that MurreNet achieves state-of-the-art (SOTA) performance in survival prediction.
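The paper's implementation is not public, so purely as an illustrative sketch of the decomposition-plus-orthogonality idea the abstract describes — all function names, dimensions, and the (here random, in practice learned) projection weights are assumptions, not the authors' code — the core operations might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Assumed illustrative projections: one "specific" and one "common"
# head per modality (weights random here; learned in practice).
W_spec = rng.normal(size=(dim, dim))
W_comm = rng.normal(size=(dim, dim))

def decompose(x):
    """Split a modality embedding into a modality-specific and a
    modality-common part (hypothetical stand-in for the MRD idea)."""
    return x @ W_spec, x @ W_comm

def orthogonality_penalty(spec, comm):
    """Mean squared cosine similarity between the two parts; driving
    this toward zero is one common way to encourage disentanglement."""
    spec = spec / np.linalg.norm(spec, axis=-1, keepdims=True)
    comm = comm / np.linalg.norm(comm, axis=-1, keepdims=True)
    return float(((spec * comm).sum(-1) ** 2).mean())

path_feat = rng.normal(size=(4, dim))   # stand-in for WSI features
gene_feat = rng.normal(size=(4, dim))   # stand-in for genomic features
p_spec, p_comm = decompose(path_feat)
g_spec, g_comm = decompose(gene_feat)
penalty = orthogonality_penalty(p_spec, p_comm) + orthogonality_penalty(g_spec, g_comm)
```

In a trained model such a penalty would be one term in the overall objective alongside the similarity, reconstruction, and survival losses the paper describes.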

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0057_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

TCGA pathological dataset: https://portal.gdc.cancer.gov/
Paired genomic dataset: https://www.cbioportal.org/

BibTex

@InProceedings{LiuMin_MurreNet_MICCAI2025,
        author = { Liu, Mingxin and Cai, Chengfei and Li, Jun and Xu, Pengbo and Li, Jinze and Ma, Jiquan and Xu, Jun},
        title = { { MurreNet: Modeling Holistic Multimodal Interactions Between Histopathology and Genomic Profiles for Survival Prediction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {400 -- 410}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. Propose a multi-modal representation decoupling network for survival prediction of human cancers.
    2. Evaluate the proposed method on six cancer cohorts derived from TCGA.
  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written and easy to read.
    2. The method considers both the shared and the specific information in the multi-modal data.
    3. The method is evaluated on six datasets from TCGA.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The motivation of this paper is not clear. Multi-modal or multi-view learning has been widely investigated in the machine learning community, and many studies consider the shared and specific information among multi-modal data. The authors should provide more evidence for why these existing studies cannot effectively decompose and reintegrate the multimodal representations (as stated in the second paragraph of page 2).
    2. As stated in comment 1, the authors only compare the proposed method with existing multi-modal survival prediction models. More comparisons with SOTA multi-modal learning algorithms from the machine learning community should be provided.
    3. It is good to see that the authors have used the CHIEF encoder to extract the patch-level representations. However, it is not clear whether this feature extraction approach is superior to the image encoders of pathology-language models, e.g., CONCH and PLIP.
    4. In Figure 4, the authors should compare the stratification performance with SOTA algorithms.
    5. In Table 1, the authors should provide confidence intervals for the comparison of different methods. Otherwise, the claim that the proposed method is better is not well supported.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see the major strength and weakness parts.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The main contribution of the paper is the development of the MRD module, which separates modalities into both common and specific information, and then the DHOF module to fuse this information. To further strengthen learning constraints, the paper integrates various loss combinations addressing modality similarity, difference, reconstruction, and the survival prediction task. Experiments conducted on six TCGA cancer cohorts demonstrate the effectiveness of the proposed method.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Introduction of 4 different losses to fuse histology and genomics. Ablations show that the four losses add meaningful information to increase performance
    • Results on 6 TCGA cohorts supported by various baselines
    • Writing and figures convey the messages of the paper clearly
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • While the method is novel, the study design is largely the same as any other survival prediction study in the field: method, 5-7 TCGA cohorts, c-index as the metric, and Kaplan-Meier curves. I think this study can go beyond this mould of survival prediction studies, especially on the interpretability front. The authors claim that their method captures common and orthogonal information from the modalities, but do not comment on what this information is or looks like. Can the authors expand on interpretability, specifically discussing the different types of information/insights gained by their method?

    • Why did the authors use CHIEF as the patch encoder? The field largely knows that CHIEF uses the CTransPath image encoder, which has been outperformed by many superior models such as Virchow and UNI, as shown by numerous benchmarks such as HEST-1K. Can the authors show that their method is agnostic to the patch encoder?
    • One of the key limitations of this study is that it trains and tests the method on TCGA cohorts. The authors do not mention whether they use site-stratified splits. Hence, it is unclear to me how generalizable the method is. I urge the authors to perform independent validation using unseen cohorts such as CPTAC.
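(For reference, the c-index mentioned above is the field's standard survival metric. A minimal pairwise implementation — a generic sketch of the metric, not the paper's evaluation code — counts, among comparable pairs, how often the patient with the shorter observed survival receives the higher predicted risk:)

```python
def concordance_index(times, events, risks):
    """Pairwise c-index: a pair (i, j) is comparable when i's time is
    shorter and i's event was observed (not censored). Concordant pairs
    give the shorter survival the higher risk; risk ties count as 0.5."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy example: risks perfectly ordered against survival times.
times  = [5, 10, 15, 20]
events = [1, 1, 0, 1]          # 0 = censored
risks  = [0.9, 0.7, 0.4, 0.2]
print(concordance_index(times, events, risks))  # -> 1.0
```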
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Can the authors use their method to quantitatively show how much the different modalities contribute to the survival prediction, and what information they contribute?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall a novel method that pushes the field ahead, but I have raised some important questions about the generalizability and study design (specifically interpretability). I am happy to raise my score if these limitations are overcome.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors proposed a multimodal representation decoupling-based framework (MurreNet) for survival analysis on diverse cancer types by integrating whole slide pathology images with genomic features. The proposed model includes a Multimodal Representation Decomposition (MRD) module to decompose the multimodal representations into modality-specific and modality-common knowledge and a novel multimodal fusion method for aligning and fusing these two crucial representations.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper is well-written and easy to follow, and the figures are nice and informative. A comprehensive literature review and sufficient motivation for the proposed method are provided. (2) The authors perform extensive benchmark experiments on multiple cancer datasets; the results validate its superiority for survival prediction in comparison with an excellent choice of state-of-the-art methods. Detailed ablation studies are well-conducted to demonstrate the effect of each module and strategy design in the proposed model. (3) The design of multimodal representation decomposition is interesting, and such a design is transferable to multiple tasks in multimodal learning.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) The related work and references can be further improved. For example, the latest multimodal survival analysis methods using information/representation decomposition have not been included and analyzed in the literature review [1,2,3]. I perfectly understand that 8 pages are very short for a fully comprehensive related-work description, so I hope these similar works can be considered in the final version. (2) The formulation and explanation in Section 2.2 (Multimodal Representation Decoupling) are not sufficient; the authors should provide a more detailed description of the MRD module in their final revision. (3) The description of how the high- and low-risk groups are defined is not clear; the authors should provide details about how they determined the high-risk and low-risk groups.

    [1] Zhou H, Zhou F, Chen H. Cohort-individual cooperative learning for multimodal cancer survival analysis. IEEE Transactions on Medical Imaging, 2024.
    [2] Zhang Y, Xu Y, Chen J, et al. Prototypical information bottlenecking and disentangling for multimodal cancer survival prediction. arXiv preprint arXiv:2401.01646, 2024.
    [3] Long L, Cui J, Zeng P, et al. MuGI: Multi-Granularity Interactions of Heterogeneous Biomedical Data for Survival Prediction. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024: 490-500.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The quality and clarity of the work are excellent, each module in the proposed method is very reasonable and has been validated helpful for improving the multimodal data fusion and survival predictions via extensive experiments on public datasets from different human cancers.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

My co-authors and I would like to thank all reviewers for their valuable comments and suggestions. We are pleased that the reviewers have recognized our paper’s clarity (R#1,2,3), novel and interesting ideas (R#1,3) and sufficient experiments (R#1,2,3). Below, we provide our responses to the comments (R: Reviewer, C: Comment).

  1. Motivation: (R1-C1) We agree that shared/specific information decomposition in multimodal learning has been widely studied. However, existing similar methods (as stated in Para. 2 on Page 2) typically suffer from limitations such as simply decomposing the features into shared and specific parts and then directly fusing them, lacking further optimization or dedicated learning strategies (Ref. 31). We will revise this part to clarify these distinctions and provide a clearer justification of our model.
  2. Related work: (R3-C1) We appreciate your suggestion regarding the literature review; we will include the references you mentioned in the Introduction in the final version.
  3. Patch encoder: (R1-C3) Recent studies have demonstrated that vision-language models (CONCH, PLIP) achieve better performance than vision-only models (CHIEF, UNI), as R2 mentioned. (R2-C2) However, the more complex architecture and large-scale multimodal training data of CONCH lead to increased computation cost and longer feature extraction time; we therefore chose CHIEF as the patch extractor for its strong balance between performance and computational efficiency. Importantly, our model is patch-encoder-agnostic, as it operates only on extracted patch-level features. To ensure fair comparison, all experiments use the same patch encoder (CHIEF) and identical settings, thereby eliminating any performance bias introduced by differing patch encoders.
  4. Baselines and experiments: We appreciate the valuable experimental suggestions. (R1-C2, R2-C3) In this work, we follow the experimental protocols of SOTA survival prediction works, performing experiments on six TCGA datasets against 15 cutting-edge algorithms (seven multimodal SOTAs) and reporting the mean and standard deviation for each model to demonstrate its superior performance and generalization. (R1-C5) We appreciate R1's suggestion. However, in current survival analysis studies, reporting the 5-fold cross-validated mean and standard deviation is a widely adopted and comparable evaluation setting (e.g., Refs. 5, 9, 15, 18, 21, 24, 25, 26, 29, 30, 31). To ensure consistency with prior works and enable fair comparisons, we followed this standard practice, and we believe this reporting format sufficiently demonstrates the effectiveness and robustness of our model. We agree that additional multimodal baselines (R1-C2), comparative experiments (R1-C5, R2-C3), and interpretability analyses (R2-C1) are important for enhancing our work. (R2-C3) To demonstrate the generalization ability of our model through external and independent validation, we are collaborating with hospitals to collect paired pathology-genomic data to serve as an external cohort, and we will address this in future work. Unfortunately, additional experimental results and substantial changes to experiments, data, and analysis are strictly not allowed in the rebuttal and final paper by the official MICCAI guidelines, so we regret that we cannot currently provide additional experiments and further analyses. We appreciate your kind understanding; your suggestions have greatly improved our work!
  5. Methodological and detail clarity: We thank R3 for pointing out the need for clearer descriptions of the methodology and training settings. (R3-C2) For the MRD module, we will update the description in Section 2.2 to provide a clearer explanation and sufficient detail. (R3-C3) For the definition of the high- and low-risk groups, patients were stratified into high- and low-risk groups by the median risk score output by the predictive models. We will clarify this in the final paper. Again, we sincerely thank you for your constructive suggestions!
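(The median-split stratification described in item 5 above is a standard procedure; as a minimal sketch — the function name and the choice to send exact-median scores to the low-risk group are assumptions, not details from the paper — it amounts to:)

```python
from statistics import median

def stratify_by_median(risk_scores):
    """Assign each patient to the high- or low-risk group using the
    cohort-median predicted risk as the cutoff. Scores equal to the
    median go to the low-risk group here (an assumed tie rule)."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]

# Toy cohort of six predicted risk scores; the median cutoff is 0.535.
scores = [0.12, 0.85, 0.40, 0.67, 0.33, 0.91]
groups = stratify_by_median(scores)
print(groups)  # -> ['low', 'high', 'low', 'high', 'low', 'high']
```

The two resulting groups are what a Kaplan-Meier analysis (as in the paper's Figure 4) would then compare.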




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


