Abstract

Survival prediction plays a crucial role in clinical decision-making, enabling personalized treatments by integrating multi-modal medical data, such as histopathology images, pathology reports, and genomic profiles. However, the heterogeneity across these modalities and the high dimensionality of Whole Slide Images (WSI) make it challenging to capture survival-relevant features and model their interactions. Existing methods, typically focused on single-modal WSI, fail to leverage multimodal information, such as expert-driven pathology reports, and struggle with the computational complexity of WSI. To address these issues, we propose a novel Tri-Modal Survival Estimation framework (TMSE), which includes three components: (1) Pathology report processing pipeline, curated with expert knowledge, with both the pipeline and the processed structured report being publicly available; (2) Context-aware Tissue Prototype (CTP) module, which uses Mamba and Gaussian mixture models to extract compact, survival-relevant features from WSI, reducing redundancy while preserving histological details; (3) Attention-Entropy Interaction (AEI) module, an attention mechanism enhanced with entropy-based optimization to align and fuse three modalities: WSI, pathology reports, and genomic data. Extensive evaluation on three TCGA datasets (BLCA, BRCA, LUAD) shows that our approach achieves superior performance in survival prediction. Data and code are available: https://github.com/RuofanZhang8/TMSE
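The abstract describes AEI only as an attention mechanism with entropy-based optimization and does not give its formulation on this page. The sketch below is therefore not the authors' module. It only illustrates, under that assumption, the two generic ingredients the name suggests: scaled dot-product attention weights and the Shannon entropy of the resulting distribution. The function names `softmax`, `attention_weights`, and `entropy` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: Sequence[float],
                      keys: Sequence[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights of one query over several keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical toy inputs: one query (e.g. a report token) attending
# over three keys (e.g. tissue prototypes).
uniform = attention_weights([1.0, 0.0], [[1.0, 0.0]] * 3)
peaked = attention_weights([10.0, 0.0], [[10.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

Identical keys give uniform weights and the maximum entropy for three keys, while a single dominant key drives the entropy toward zero. An entropy term in a loss could, in principle, reward or penalize either regime; which one TMSE uses is not stated here.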

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2424_paper.pdf

SharedIt Link: https://rdcu.be/eHwLX

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04927-8_61

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/RuofanZhang8/TMSE

Link to the Dataset(s)

https://github.com/RuofanZhang8/TMSE

BibTex

@InProceedings{ZhaRuo_TMSE_MICCAI2025,
        author = { Zhang, Ruofan AND Fang, Mengjie AND Liu, Shengyuan AND Wang, Zipei AND Tian, Jie AND Dong, Di},
        title = { { TMSE: Tri-Modal Survival Estimation with Context-aware Tissue Prototype and Attention-Entropy Interaction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {640 -- 650}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper introduces a tri-modal survival estimation framework that integrates three distinct modules: (1) a report processing module to extract textual features, (2) a context-aware prototype generation module to effectively represent features for each whole-slide image (WSI), and (3) an entropy-based attention mechanism to align features across the three modalities.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This approach allows for the integration of textual reports, whole-slide images (WSIs), and genetic information into a unified survival estimation framework, facilitating a more comprehensive analysis of patient data and potentially enhancing predictive performance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

The paper lacks a comparative analysis of visualization and interpretability, which are crucial for understanding the model’s decisions; for instance, t-SNE plots could be compared with other state-of-the-art methods. The method should be further explained; for example, the L_I loss function requires more detailed information regarding its formulation and role within the model.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Could the authors enhance the clarity of the AEI module’s description, particularly by improving the alignment between the textual explanation and the corresponding figure? Based on the figure, does the Entropy-based Information Retention module compute mutual information between pre- and post-AEI features to derive a cross-modal representation, with image features then excluded from the final AEI output?
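For reference, the quantity the reviewer asks about can be written down for the simplest case. The sketch below computes mutual information between two paired discrete sequences from their empirical joint distribution. It is an illustration of the definition only, assuming discretized values; real feature vectors are continuous and would need a different estimator, and nothing here reflects how the paper's module is actually implemented. The name `mutual_information` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y) in nats for paired discrete samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    px = Counter(xs)               # marginal counts of x
    py = Counter(ys)               # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical toy data: "pre" and "post" are discretized feature codes.
pre = [0, 0, 1, 1]
post_same = [0, 0, 1, 1]      # fully determined by pre
post_indep = [0, 1, 0, 1]     # empirically independent of pre
```

When the second sequence is a copy of the first, the result equals the entropy of the first (log 2 here); when the two are empirically independent, it is zero.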
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is interesting and the results are convincing. However, the method description is hard to understand.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

This paper proposes TMSE, a tri-modal survival estimation framework that integrates whole slide images (WSI), pathology reports, and genomic profiles for cancer prognosis prediction. The model is evaluated on three TCGA datasets (BRCA, BLCA, LUAD) and achieves state-of-the-art performance in survival prediction (C-index), outperforming several unimodal and multimodal baselines.
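The C-index mentioned above is the standard ranking metric for survival models. As a minimal sketch, assuming Harrell's definition with right-censored data and ignoring tied event times, it can be computed as follows. The function name `concordance_index` and the toy data are hypothetical and unrelated to the paper's evaluation code.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) and a strictly shorter time than patient j.
    The pair is concordant when patient i also has the higher predicted risk;
    tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: survival times, event flags (1 = death observed,
# 0 = censored), and predicted risk scores.
times = [5.0, 10.0, 12.0, 20.0]
events = [1, 0, 1, 1]
perfect = concordance_index(times, events, [0.9, 0.4, 0.6, 0.1])
reversed_ = concordance_index(times, events, [0.1, 0.6, 0.4, 0.9])
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and 0.0 to a fully inverted ranking.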
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The proposed tri-modal survival prediction framework is of importance in clinical practice.
2. The authors also propose a prototype-based learning method to extract the most survival-relevant features from WSI, as well as an entropy-guided attention mechanism to align and fuse modalities.
3. The proposed method consistently outperforms baselines on three datasets.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Prototype-based methods [1, 2] have been widely explored in WSI analysis; e.g., cross-attention has been used to model the interaction between prototypes and instance features [1]. However, this family of methods has been neither discussed nor compared, which makes the positioning of the proposed prototype-based learning method unclear, not to mention that the GMM-based method is adapted from PANTHER [21].
2. An ablation over the number of prototypes is missing. Instead, the authors fixed it at 16, which is somewhat questionable, because different datasets may exhibit different modes (see e.g. [1]). The ablations on the weight balance parameters \alpha_1 and \alpha_2 also appear to be missing.
[1] DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification. ECCV
[2] Rymarczyk et al., ProtoMIL: Multiple Instance Learning with Prototypical Parts for Whole-Slide Image Classification. MICCAI
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents an interesting and clinically meaningful tri-modal survival prediction framework, which demonstrates meaningful empirical results compared to uni-modal approaches. However, there are some issues regarding the prototype-based learning and ablations. Therefore, my initial evaluation is “weak accept”.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper enhances multi-modal survival analysis with an expert-knowledge-based prompt-chain pipeline that organizes pathology reports appropriately, and with the CTP and AEI modules, which address WSI redundancy and multi-modality fusion, respectively. Extensive experiments on three TCGA datasets validate the effectiveness of the proposed methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The integration of clinical report handling is valuable, and the introduction of prototype-based information redundancy processing, along with a new modality fusion method, adds further significance.
- The motivation for the proposed method is well-reasoned, and the authors have explained their method well.
- The authors conducted both qualitative and quantitative experiments on TCGA datasets and achieved SOTA performance.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Three public datasets may not be sufficient to fully validate the model’s effectiveness; it might be beneficial to consider adding more datasets.
- The weight factors in the loss function need to be experimented with.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper’s motivation is clear, and the methodology is innovative, with improvements in the use of modalities, WSI feature aggregation, and modality fusion.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We would like to sincerely thank the area chair and all reviewers for their thoughtful evaluation of our work. We greatly appreciate the insightful comments and constructive suggestions provided, which have been invaluable in helping us improve the quality of our paper. We will carefully incorporate the feedback in our camera-ready version to ensure it addresses the points raised and reflects the reviewers’ helpful comments.

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A

TMSE: Tri-Modal Survival Estimation with Context-aware Tissue Prototype and Attention-Entropy Interaction

Author(s):