

Abstract

Survival prediction aims to evaluate the risk level of cancer patients. Existing methods primarily rely on pathology and genomics data, either individually or in combination. From the perspective of cancer pathogenesis, epigenetic changes, such as those captured by methylation data, could also be crucial for this task. Furthermore, no previous work has used textual descriptions to guide the prediction. To this end, we are the first to explore four modalities, namely three clinical modalities and language, for survival prediction. Specifically, motivated by Chain-of-Thought (CoT) prompting, we propose the Chain-of-Cancer (CoC) framework, which focuses on intra-learning and inter-learning. For intra-learning, we encode the clinical data as raw features, which retain domain-specific knowledge. For inter-learning, we use language to prompt the raw features and introduce an Autoregressive Mutual Traction module for synergistic representation. This tailored framework facilitates joint learning among multiple modalities. Our approach is evaluated on five public cancer datasets, and extensive experiments validate the effectiveness of our method and its proposed designs, yielding state-of-the-art results. Code will be released.
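Survival-prediction work in this line (for example MCAT, cited in Review #2) commonly reports Harrell's concordance index (C-index) across cancer cohorts. The abstract does not name its metric, so the block below is only a minimal sketch, assuming right-censored data and that a higher predicted risk means earlier expected death; tied follow-up times are skipped for simplicity.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified sketch).

    times:  observed follow-up times
    events: 1 if death was observed, 0 if censored
    risks:  predicted risk scores (higher = earlier expected death)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:  # tied times are skipped in this sketch
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: patient 0 dies first and has the highest predicted risk.
times = [2.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.4, 0.1]
print(concordance_index(times, events, risks))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of deaths by predicted risk.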

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0663_paper.pdf

SharedIt Link: https://rdcu.be/eG4C2

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-05182-0_9

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/haipengzhou856/CoC

Link to the Dataset(s)

TCGA dataset: https://portal.gdc.cancer.gov/

BibTeX

@InProceedings{ZhoHai_CoC_MICCAI2025,
        author = { Zhou, Haipeng AND Yang, Sicheng AND Yang, Sihan AND Qin, Jing AND Chen, Lei AND Zhu, Lei},
        title = { { CoC: Chain-of-Cancer based on Cross-Modal Autoregressive Traction for Survival Prediction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {85 -- 94}
}

Reviews

Review #1

Please describe the contribution of the paper

The authors present the Chain-of-Cancer (CoC) framework, a novel approach to survival prediction for cancer patients that integrates multiple modalities, including clinical data and language descriptions. The study emphasizes the importance of epigenetic changes and introduces a cross-modal autoregressive model that leverages descriptive text to enhance feature representation. By employing intra-learning and inter-learning strategies, the CoC framework aims to improve the predictive capability of survival analysis. The methodology is validated through extensive experiments on five public cancer datasets, demonstrating superior performance compared to existing methods.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The Chain-of-Cancer framework is the first to utilize four modalities, including clinical data and language, for cancer survival prediction.
- The proposed method effectively integrates epigenetic changes and textual descriptions to enhance prediction accuracy.
- The framework demonstrates superior performance compared to existing state-of-the-art methods across multiple cancer datasets.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The paper mentions the CoC-Adapter but lacks sufficient detail on how it operates within the framework. A comprehensive explanation would clarify its role and significance.
- While the results are presented, the implications for clinical practice and future research are not thoroughly discussed. Expanding on these implications would provide a more comprehensive understanding of the work’s significance. Furthermore, enhancing the analysis of existing methods would highlight gaps in the literature that the CoC framework addresses, strengthening the paper’s positioning within the field.
- The current paper mentions the use of textual descriptions but lacks specific examples and justification for their necessity. Providing this would enhance the argument for the multimodal approach. Additionally, thoroughly addressing potential limitations or counterarguments regarding the contributions of different modalities would demonstrate a balanced perspective and strengthen the overall argument.
- There is a typo in Figure 2: “casual” (presumably “causal”). The overall writing could be improved to make the paper easier for readers to follow and understand.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is suitable for publication with revisions, as it presents a novel approach that could influence cancer prognosis research and multimodal learning. Addressing the suggested revisions will strengthen its contribution to the field.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #2

Please describe the contribution of the paper

The authors propose a novel framework, Chain-of-Cancer, for survival prediction based on RNA expression, DNA methylation, and whole slide histopathology images (WSI). The method begins by extracting embeddings from each modality and proceeds in two phases: intra-learning and inter-learning. In the intra-learning phase, modality-specific embeddings are projected and passed through a multi-layer perceptron (MLP). The inter-learning phase introduces cross-modal interactions by leveraging text adapters to embed features across modalities, followed by an autoregressive mutual traction mechanism designed to predict the next token. This phase also incorporates a mutual information minimization objective to encourage disentangled representations across modalities. Finally, embeddings from both the intra- and inter-learning phases are concatenated and passed through a linear layer to predict patient survival outcomes.
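The final fusion step summarized above (concatenating the intra- and inter-learning embeddings and applying a linear layer) can be sketched as follows. This is a hypothetical illustration rather than the authors' code: the embedding sizes, the random weights, and the discrete-time hazard head with four time bins (a common choice in MCAT-style survival models) are all assumptions.

```python
import math
import random

random.seed(0)

D_INTRA, D_INTER, N_BINS = 8, 8, 4  # hypothetical sizes


def linear(x, weight, bias):
    """y = W x + b for a single vector x (weight is out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_survival(intra_emb, inter_emb, weight, bias):
    fused = intra_emb + inter_emb            # list concatenation of the two branches
    logits = linear(fused, weight, bias)     # one logit per time bin
    hazards = [sigmoid(z) for z in logits]   # h_k = P(event in bin k | alive at bin k)
    survival, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h                         # S_k = prod_{j<=k} (1 - h_j)
        survival.append(s)
    risk = -sum(survival)                    # lower cumulative survival -> higher risk
    return hazards, survival, risk


intra = [random.gauss(0, 1) for _ in range(D_INTRA)]
inter = [random.gauss(0, 1) for _ in range(D_INTER)]
W = [[random.gauss(0, 0.1) for _ in range(D_INTRA + D_INTER)] for _ in range(N_BINS)]
b = [0.0] * N_BINS
hazards, survival, risk = predict_survival(intra, inter, W, b)
print(len(hazards), [round(s, 3) for s in survival])
```

The scalar `risk` is what a ranking metric such as the C-index would consume; the per-bin hazards are what a discrete-time survival loss would train.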
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The integration of transcriptomic, pathological (WSI), and epigenetic (DNA methylation) data provides a more comprehensive representation of the underlying biological processes. This multimodal design has the potential to enhance predictive performance and opens opportunities for future methodological improvements in survival analysis.
- The method introduces a novel concept through the Chain-of-Cancer (CoC) adapters, designed to emulate a pathologist’s reasoning by incorporating contextual cross-modal information into the learned embeddings.
- The architectural decomposition into intra-learning and inter-learning phases is novel.
- Extensive experiments demonstrate that the proposed method consistently outperforms models trained on individual modalities (RNA, WSI) and even strong multimodal baselines (e.g., WSI+RNA), underscoring the robustness and added value of incorporating DNA methylation and the CoC framework.
- The ablation studies are particularly informative, clearly illustrating the individual contributions of each architectural component.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- In my view, the main limitation of the paper is the lack of clarity about the core contribution the authors aim to emphasize. Many techniques are introduced, and although the ablation study attempts to disentangle their individual effects, many components still lack clear justification. For instance, it is not obvious why a global $f_{path}$ module was introduced, or why an auxiliary reconstruction task is applied at the transformer’s output. The same applies to the motivation for mutual information minimization. While the authors provide some textual justification, a deeper investigation, supported by targeted experiments, would be needed to show why these mechanisms help (or which specific issues they prevent), beyond simply showing improved survival prediction. Additionally, it is unclear whether hyperparameters such as the weighting factor $\lambda$ were optimized within the same cross-validation loop. The introduction of multiple losses, training objectives, and hyperparameters could affect the reliability of the cross-validation results, especially in the absence of an independent test set.
- The engineering and implementation details remain ambiguous. For example, RNA features are grouped using $N_g=6$, but it is unclear whether these groups correspond to known biological pathways as in MCAT [1] (I assume so, given the number 6). The same question applies to DNA methylation: how are those features grouped?
- The method outperforms baselines that do not use DNA methylation, which is somewhat expected given its strong prognostic value. Furthermore, although the literature is indeed less extensive in this area, prior works [2,3,4] have already explored multimodal survival prediction incorporating DNA methylation.
[1] Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images. ICCV 2021.
[2] Long-term cancer survival prediction using multimodal deep learning. Scientific Reports 2021.
[3] Deep learning with multimodal representation for pan-cancer prognosis prediction. Bioinformatics 2019.
[4] DRIM: Learning Disentangled Representations from Incomplete Multimodal Healthcare Data. MICCAI 2024.
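To make the grouping question concrete, here is one way a gene-expression vector could be turned into N_g = 6 group tokens, in the spirit of MCAT's pathway-level genomic embeddings. Everything here is hypothetical: the contiguous index blocks stand in for real gene sets, and a plain per-group linear projection stands in for whatever per-group encoder the paper actually uses. Whether the paper's groups follow biological pathways is exactly what the reviewer asks.

```python
import random

random.seed(0)

N_GENES, N_G, D = 30, 6, 4  # hypothetical sizes


def make_groups(n_genes, n_groups):
    """Hypothetical grouping into contiguous blocks; real pathways would come from a gene-set database."""
    size = n_genes // n_groups
    return [list(range(g * size, (g + 1) * size)) for g in range(n_groups)]


def group_tokens(expression, groups, projections):
    """Project each gene group to a D-dim token; returns an N_g x D token matrix."""
    tokens = []
    for genes, W in zip(groups, projections):
        x = [expression[i] for i in genes]
        tokens.append([sum(w * xi for w, xi in zip(row, x)) for row in W])
    return tokens


groups = make_groups(N_GENES, N_G)
# One projection per group, because real gene sets differ in size.
projections = [[[random.gauss(0, 0.1) for _ in g] for _ in range(D)] for g in groups]
expression = [random.random() for _ in range(N_GENES)]
tokens = group_tokens(expression, groups, projections)
print(len(tokens), len(tokens[0]))  # 6 4
```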
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Building on previous points, a more detailed explanation of the specific challenges addressed and how the proposed solutions effectively tackle them would greatly help in grasping the paper’s contributions. Some minor points:
- In the AMT module, it appears that each CoC output (e.g., a 4×d tensor) forms part of a 13-token sequence (4 × 3 + 1). If correct, this point would benefit from a clearer explanation in the method section and figure.
- There seems to be a typo on page 2, line 2: “transomics.”
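The reviewer's reading of the AMT input can be checked with simple shape arithmetic: one start token [S] followed by three CoC outputs of four tokens each gives 1 + 3 × 4 = 13 tokens. The sketch below assembles such a sequence with placeholder values; the token width d, the zero-valued start token, and which outputs enter the sequence are assumptions, not details confirmed by the paper.

```python
D = 8           # hypothetical token width d
N_TOKENS = 4    # tokens per CoC output (a 4 x d tensor, per the review)
N_OUTPUTS = 3   # CoC outputs entering the sequence


def build_sequence(coc_outputs, start_token):
    """Prepend a start token [S] to the concatenated CoC outputs."""
    seq = [start_token]
    for out in coc_outputs:
        if len(out) != N_TOKENS or any(len(t) != D for t in out):
            raise ValueError("each CoC output must be N_TOKENS x D")
        seq.extend(out)
    return seq


# Output k is filled with the value k so positions are easy to trace.
coc_outputs = [[[float(k)] * D for _ in range(N_TOKENS)] for k in range(N_OUTPUTS)]
seq = build_sequence(coc_outputs, start_token=[0.0] * D)
print(len(seq))  # 13
```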
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper presents an original multimodal framework for survival prediction using RNA expression, DNA methylation, and whole-slide histopathology images. The introduction of the Chain-of-Cancer (CoC) architecture, with its intra- and inter-learning phases and the use of autoregressive modeling with mutual information minimization, is novel and demonstrates promising results. Overall, the paper introduces creative and potentially impactful ideas, but would benefit from more clarity and in-depth analysis to fully validate its diverse contributions.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Review #3

Please describe the contribution of the paper

This paper introduces a new methodology for multimodal survival prediction with language guidance, called Chain-of-Cancer (CoC) (inspired by the Chain-of-Thought framework). Specifically, the authors introduce the CoC-Adapter, which merges modality features with textual features, and the Autoregressive Mutual Traction module, which uses next-token prediction to predict the features of the next modality. The authors also claim to be the first to use methylation and language modalities for cancer survival prediction. The proposed framework, together with the new modalities, improves the state of the art on 5 cancer datasets.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This is a nicely written paper, and the authors do a fine job of presenting their contributions and tabulating the results. Multimodal approaches with pathology and genomic data are quite common, but in this work the authors introduce methylation and language data for cancer survival prediction, which is novel, as they claim. Secondly, the proposed architecture, with language guidance for each modality and modality-wise feature translation, allows the method to achieve SOTA on 5 cancer datasets, which is also one of the strengths of this work. Ablation studies and Kaplan-Meier analysis also supplement the main results of the paper.
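For readers unfamiliar with the Kaplan-Meier analysis mentioned above: the usual recipe splits patients into high- and low-risk groups by predicted risk and plots a product-limit survival curve per group. The sketch below implements the estimator and a median split; the median threshold and the toy data are assumptions for illustration, not the paper's protocol.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk  # S(t) = prod over event times of (1 - d_i / n_i)
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


def median_split(risks):
    """Hypothetical stratification: True (high risk) if above the median predicted risk."""
    ordered = sorted(risks)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return [r > median for r in risks]


times = [1, 2, 2, 3, 5, 8]
events = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
print(median_split([0.2, 0.9, 0.5, 0.1]))  # [False, True, True, False]
```

In practice one would compute a curve for each risk group and compare the groups with a log-rank test.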
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

In the main results table of the paper (Table 1), the authors compare the performance of different methods with different data inputs (which is neat), but competing multimodal frameworks such as MCAT, CMTA, and MOTCAT are given only bimodal inputs, whereas the proposed method has the advantage of 4 input modalities (including language), which handicaps the competing multimodal methods. This challenges the contribution of the proposed framework and raises the question: are the SOTA results coming from the additional modalities or from the proposed methodology?

Secondly, in the ablation study (Table 2), it is unclear how the different modules of the methodology contribute to the final model. For example, it is not clear whether M1, M2, etc. refer to modalities and, if so, which ones. It is also unclear why the authors use only CoC1 and CoC3 in their ablations (why not CoC2 and CoC4). Although the authors make the best use of the available space to describe the setup as accurately as possible, the description does not convey their reasoning behind the ablation choices.

It would also be beneficial to see the impact of the different loss functions used in the framework. For example, it is unclear why the authors used the mutual information regulation loss along with the other losses.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

The authors also use a ResNet50 backbone for feature extraction, which could limit the model’s performance as it is not domain-specific (although the authors use CONCH for the language encoding). I encourage the authors to use pathology-specific foundation models such as UNI, GigaPath, etc.

Further, I believe the language prompts need to be studied beyond simple prompts. I encourage the authors to collaborate with clinicians on generating more specific prompts and to test whether these improve survival prediction performance.

The overview figure is informative, but for future work I encourage the authors to use consistent fonts and align boxes as much as possible. The background gradient is not essential, and in my opinion Figure 1 could have been simplified.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(5) Accept — should be accepted, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a good paper: it adds new modalities for cancer survival prediction and achieves promising results. The methodological contributions are questionable and need to be explored further, but all in all the paper is a good read.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

Common Response

We appreciate all of the positive and insightful comments. We will carefully check the paper, cite the missing literature, and correct the typos and illustrations.

To R#1

Question on CoC-Adapter?

We illustrate the details of the CoC-Adapter in Figure 2. All the adapters share the same structure but have different parameters and prompts. We use the text embedding as guidance, concatenate it with the features from another modality, and generate an enhanced feature through an MLP projection.
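Based only on the description above (text embedding as guidance, concatenated with modality features, then an MLP projection), a CoC-Adapter could be sketched as follows. The class name `CoCAdapterSketch`, the per-token broadcasting of the text embedding, the hidden width, the ReLU activation, and the random weights are assumptions; Figure 2 of the paper remains the authoritative description.

```python
import random

random.seed(0)

D_FEAT, D_TEXT, D_HID, D_OUT = 6, 4, 8, 6  # hypothetical sizes


def linear(x, W, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def init(out_dim, in_dim):
    """Random weight matrix (out_dim x in_dim) and zero bias."""
    return [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)], [0.0] * out_dim


class CoCAdapterSketch:
    """Hypothetical adapter: [token ; text] -> Linear -> ReLU -> Linear, applied per token."""

    def __init__(self):
        self.W1, self.b1 = init(D_HID, D_FEAT + D_TEXT)
        self.W2, self.b2 = init(D_OUT, D_HID)

    def __call__(self, tokens, text_emb):
        out = []
        for tok in tokens:
            h = relu(linear(tok + text_emb, self.W1, self.b1))  # concatenate, then project
            out.append(linear(h, self.W2, self.b2))
        return out


# Per the rebuttal, each adapter has its own parameters and its own prompt.
adapter = CoCAdapterSketch()
tokens = [[random.random() for _ in range(D_FEAT)] for _ in range(4)]
prompt_emb = [random.random() for _ in range(D_TEXT)]  # stand-in for an encoded text prompt
enhanced = adapter(tokens, prompt_emb)
print(len(enhanced), len(enhanced[0]))  # 4 6
```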

Question on the usage of text?

We are the first to explore the application of MLLMs in survival prediction tasks. As shown in Table 2, the performance gain achieved with CoC-Adapter verifies the effectiveness of incorporating text guidance. We hope our work can inspire future studies by interfacing with larger MLLMs.

Typo “casual”

“Casual” should read “causal”: it refers to the decoder-only structure with masked attention.
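The masked attention referred to here is the standard causal mask of decoder-only transformers: position i may attend only to positions j ≤ i, so no token's output depends on later tokens. The single-head sketch below assumes, for brevity, that queries, keys, and values all equal the input (no learned projections); masked positions are simply left out of the softmax, which is equivalent to setting their scores to negative infinity.

```python
import math


def causal_self_attention(x):
    """Single-head scaled dot-product attention with a causal mask (Q = K = V = x)."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Only positions 0..i are visible to position i; j > i is masked out.
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in range(i + 1)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x)
print(y[0])  # [1.0, 0.0]: the first token can only attend to itself
```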

To R#3

Question on our motivation?

We are motivated by the prevailing MLLMs and CoT, and we are the first to integrate language into the survival prediction task.

Question on AMT?

We deploy a decoder-only structure for causal (autoregressive) decoding. [S] denotes the start token, and the last (red) token denotes the end token. We will clarify this in the final version. The idea of AMT is to build dependencies between different modalities via a reconstruction constraint, and the mutual information regularization is used to prevent over-reconstruction.
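The reconstruction constraint described here follows the usual teacher-forced next-token objective: the decoder output at position t is compared with the input token at position t + 1. Below is a minimal sketch, assuming mean-squared error as the reconstruction distance; `total_loss`, its weights `lam_rec` and `lam_mi`, and the `mi_estimate` argument are hypothetical placeholders, since the paper's distance, mutual-information estimator, and weighting are not specified in this discussion.

```python
def next_token_reconstruction_loss(inputs, outputs):
    """MSE between the decoder output at position t and the input token at t + 1.

    inputs:  sequence [S], x_1, ..., x_T (list of d-dim vectors)
    outputs: decoder outputs for the same positions; the last one has no target.
    """
    preds, targets = outputs[:-1], inputs[1:]  # shift by one position
    d = len(inputs[0])
    total = 0.0
    for p, t in zip(preds, targets):
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / d
    return total / len(targets)


def total_loss(surv_loss, rec_loss, mi_estimate, lam_rec=1.0, lam_mi=0.1):
    """Hypothetical weighting: survival loss + reconstruction + MI regularizer."""
    return surv_loss + lam_rec * rec_loss + lam_mi * mi_estimate


seq = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]         # [S], x_1, x_2
perfect = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]     # each output predicts the next input
print(next_token_reconstruction_loss(seq, perfect))  # 0.0
```

Because the loss only rewards predicting the next modality's tokens, a causal mask is what prevents the decoder from trivially copying the target.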

Question on Global Pathology?

Previous works only use local WSI embeddings, which ignore the global context (e.g., the tumor microenvironment). To address this, we use an additional downsampled WSI to compensate for the missing global information.
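One way to read this global-pathology branch: a heavily downsampled copy of the slide supplies a global context vector alongside the pooled local patch embeddings. The sketch below is hypothetical; block-average downsampling, mean pooling of patch features, and plain concatenation are assumptions, and in practice both inputs would first pass through a pretrained image encoder (Review #3 notes the paper uses a ResNet50 backbone).

```python
def downsample(image, factor):
    """Block-average a 2-D grayscale image (list of rows) by an integer factor."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + i][c * factor + j] for i in range(factor) for j in range(factor))
            / (factor * factor)
            for c in range(w)
        ]
        for r in range(h)
    ]


def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def global_local_feature(patch_embeddings, thumbnail_embedding):
    """Concatenate pooled local patch features with a global thumbnail feature."""
    return mean_pool(patch_embeddings) + thumbnail_embedding


slide = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # toy 4x4 "slide"
thumb = downsample(slide, 2)
print(thumb)  # [[2.5, 4.5], [10.5, 12.5]]

thumb_vec = [v for row in thumb for v in row]  # stand-in for an encoder output
patches = [[1.0, 2.0], [3.0, 4.0]]             # stand-in local patch embeddings
print(global_local_feature(patches, thumb_vec))  # [2.0, 3.0, 2.5, 4.5, 10.5, 12.5]
```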

Question on implementation?

Due to page limitations, we are unable to include more extensive ablation studies on these hyperparameters. Therefore, these configurations were set empirically.
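The reviewer's concern about tuning λ inside the same cross-validation loop can be made concrete with nested cross-validation: λ is selected only on inner folds of each outer training split, so the outer test fold never influences the choice. The sketch below is generic; the fold counts, the candidate λ values, and the `score_fn` / `toy_score` stand-ins are hypothetical and not taken from the paper.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split range(n) into k disjoint folds after shuffling."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(n, lambdas, score_fn, outer_k=5, inner_k=3):
    """Return, per outer fold, the lambda chosen on inner folds and its outer test score.

    score_fn(train_idx, eval_idx, lam) -> validation score (higher is better).
    """
    results = []
    outer_folds = k_fold_indices(n, outer_k, seed=0)
    for o, test_idx in enumerate(outer_folds):
        train_idx = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(len(train_idx), inner_k, seed=o + 1)

        def inner_score(lam):
            total = 0.0
            for v, val_pos in enumerate(inner_folds):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p] for f, fold in enumerate(inner_folds) if f != v for p in fold]
                total += score_fn(fit, val, lam)
            return total / inner_k

        best_lam = max(lambdas, key=inner_score)  # selection never sees test_idx
        results.append((best_lam, score_fn(train_idx, test_idx, best_lam)))
    return results


def toy_score(fit, ev, lam):
    """Toy score that peaks at lambda = 0.1, purely for demonstration."""
    return -abs(lam - 0.1)


print(nested_cv(50, [0.01, 0.1, 1.0], toy_score))
```

The spread of the selected λ across outer folds is itself informative: if it varies a lot, results reported with a single fixed λ are harder to trust.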

To R#4

Question on ablation study?

The detailed configurations of the ablation study are presented on page 7. We have also analyzed the contribution of the introduced methylation data in our ablation studies. As stated in the paper, without methylation data (M1 and M2), our method still outperforms its counterparts.

Question on loss function?

As mentioned above, these configurations were set empirically. Please see our response to R#3 (Question on AMT?) for the use of the additional losses.

Question for the future work?

Thank you for your suggestion. We note that more and more foundation models have emerged for WSI analysis, and our future work will continue to explore them together with cutting-edge MLLMs.

Meta-Review

Meta-review #1

Your recommendation

Provisional Accept
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A


CoC: Chain-of-Cancer based on Cross-Modal Autoregressive Traction for Survival Prediction

Author(s):