Abstract

Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive number of pixels contained in whole slide images (WSIs) complicates the processing of pathological images, making it difficult to generate an effective representation of the tumor microenvironment (TME). (2) Existing multimodal methods often rely on alignment strategies to integrate complementary information, which may lead to information loss due to the inherent heterogeneity between pathology and genes. In this paper, we propose a Multimodal Cross-Task Interaction (MCTI) framework to explore the intrinsic correlations between subtype classification and survival analysis tasks. Specifically, to capture TME-related features in WSIs, we leverage the subtype classification task to mine tumor regions. Simultaneously, multi-head attention mechanisms are applied in genomic feature extraction, adaptively performing gene grouping to obtain a task-related genomic embedding. With the joint representation of pathological images and genomic data, we further introduce a Transport-Guided Attention (TGA) module that uses optimal transport theory to model the correlation between subtype classification and survival analysis tasks, effectively transferring potential information. Extensive experiments demonstrate the superiority of our approach, with MCTI outperforming state-of-the-art frameworks on three public benchmarks.
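
To make the optimal-transport idea in the abstract concrete, the following is a minimal sketch of entropic-regularized optimal transport solved with Sinkhorn iterations. It is not the authors' TGA module: the abstract does not specify the solver, the cost function, or the marginals, so the uniform marginals, the `reg` value, and the function name `sinkhorn_plan` are all illustrative assumptions. The authors' own implementation is in the code repository linked on this page.

```python
import math


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized OT plan between two uniform marginals.

    cost: list of rows, cost[i][j] = dissimilarity between item i and item j.
    reg and n_iters are illustrative defaults (assumptions), not paper values.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # assumed uniform mass on the row side
    b = [1.0 / m] * m  # assumed uniform mass on the column side
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: diag(u) @ K @ diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    # Toy cost: item 0 is close to item 0, item 1 is close to item 1.
    plan = sinkhorn_plan([[0.0, 1.0], [1.0, 0.0]])
    for row in plan:
        print([round(p, 4) for p in row])
```

One possible reading is that rows stand for tokens of one task and columns for tokens of the other, with `cost[i][j]` a dissimilarity between them, so that the resulting plan could weight an attention map. That mapping is an interpretation for illustration, not a description of the paper.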

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1280_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1280_supp.pdf

Link to the Code Repository

https://github.com/jsh0792/MCTI

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Jia_Multimodal_MICCAI2024,
        author = { Jiang, Songhan and Gan, Zhengyu and Cai, Linghan and Wang, Yifeng and Zhang, Yongbing},
        title = { { Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors proposed a multimodal deep learning model for survival analysis by integrating multi-omics data (RNA-Seq, CNV, and SNV) with Whole Slide Imaging (WSI) images. Image features are extracted through subtype classification (cross-task interaction), while gene features are extracted using adaptive grouping. Subsequently, these extracted image and gene features are combined into a bag of image and gene to develop a multi-instance learning technique. They introduced a novel Transport-Guided Attention (TGA) mechanism, which efficiently combines gene and image representations. The authors reported that their model outperforms benchmark methods by 7%. Overall, the paper presents interesting insights into the field of integrating multiple data modalities, and the techniques proposed hold promise for broader applications.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • To integrate the modalities, they proposed an innovative architecture module named the “TGA-Based Encoder-Decoder module” that combines image tokens (a bag where each token represents patch features) and gene tokens using a self-attention structure. Especially, TGA innovatively combines classification tokens and survival tokens by incorporating an “inter-task guidance matrix” and an “intra-task guidance matrix” to facilitate inter-data interactions without compromising intra-data interactions.
    • An objective function for subtype classification is incorporated into the risk score prediction to enhance cross-task interaction. Additionally, a combined loss function is proposed to integrate subtype classification, extraction of bag of image features, and survival loss tasks, which prevents the integration model from overfitting and biasing towards one task.
    • The proposed Encoder-Decoder network can reconstruct features by considering shared multi-modal information, which enables the regeneration of tumor based on genomic profiles.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The subtypes mentioned in the paper are not clearly defined. Do the authors refer to medical subtypes? If so, is there any labeling indicating that one slide belongs to multiple subtypes?
    • Including a biological interpretation, particularly of how the gene features identified by the model affect image morphology, may be impactful.
    • Evaluating on multiple datasets is highly encouraged to show that the model's performance generalizes.
    • The majority of compared methods use either image-only data or clinical and image data. Out of 8 compared methods, only 2 actually integrate multi-omics and pathology. It is highly encouraged to compare with the latest multimodal integration works.
    • Interactive graphical explanations, such as probability maps highlighting cancer regions, can have a more impactful role in understanding.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    NA

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The idea of cross-task correlation is innovative for extracting image features, as relying only on survival models to extract these features may not be ideal.
    • It would be impactful if the authors discussed how the model's findings correlate with the biological literature.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The proposed strategy, TGA, is innovative, considering cross-task interactions.
    • More details of the experiments are required.
    • Model interpretation would be strongly encouraged to explore new biological/medical insights.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper focuses on survival analysis using histopathological images and genomic data, proposing a multimodal cross-task framework for information fusion. For histopathology, the model is trained using subtype classification to capture information about the tumor microenvironment (TME).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Using histopathology and genes is good for survival prediction. Using the subtype classification task as a pretext task to train the model to learn a representation of the TME is good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There should be further elaboration and references to underscore the importance of focusing on the TME, such as studies showing its impact on cancer progression and treatment responses.

    The method of integrating features from different omics is crucial. The paper should discuss the rationale for choosing concatenation over other alignment methods, considering the impact on model performance and the ability to maintain meaningful biological correlations.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    may release code

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Figure 1 may be improved to highlight the flow from input to output, showing the connections to and differences from previous multimodal methods.

    Given the model’s reliance on three proposed modules: TGA, TGA-based encoder, and decoder, the authors should consider releasing the code to aid readers in understanding the model’s architecture.

    Unlike other methods, this study concatenates histopathological and genomic data at the frontend, feeding it directly into the subsequent reconstruction phase. More justification is needed for why concatenation is more effective than previously used methods like co-attention mechanisms, potentially including comparisons of performance metrics or model interpretability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Using histopathology and genes is good for survival prediction, but more experiments on feature alignment are needed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a multimodal method to combine genomics with pathology for the highly clinically relevant task of survival prediction. They leverage a subtype classification task for aggregation of the patch-level features, aiming to capture information about the tumor microenvironment. This is then concatenated with a genomics feature vector obtained through adaptive grouping, after which they employ their proposed Transport-Guided Attention module to predict the survival hazard score. They internally train, test and validate their method in 4 cohorts of TCGA, and benchmark it against several methods. Overall, the paper looks pretty. It has nice figures, the storyline makes sense from a technical point of view, the technical adaptations (and ablations) are properly reasoned for and their novel method is benchmarked against the status quo.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Beautiful paper with informative figures, detailed methodology and in a technical sense, a super strong and complete workflow. The authors perform benchmarks with several relevant methods and perform ablations of their novel method for survival prediction. Overall, the work seems reproducible (besides some slight comments mentioned below), and is performed on all open-source data, which is a big plus.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One of the main selling points of the paper is using information from the tumor microenvironment to improve the patch selection, and therefore substantially improve the C-Index. However, the authors should be cautious with specifically the task of subtype classification for this purpose. It is common knowledge that the subtype of a cancer directly influences a patient’s survival. For example, in breast cancer, the triple negative subtype is very aggressive and has the worst prognosis. Same for NSCLC, LUSC has a worse prognosis than LUAD. Based on the current experimental setup, I have substantial concerns that the authors have created a model which simply predicts the subtype which has a significant correlation with survival, and is solely outperforming other methods due to injection of highly correlating data to the prediction task (which the other methods do not have in the benchmarking). This concern is also fueled by observing the ablation study done without the subtype task added, dropping the performance a lot. Therefore, the analysis lacks correction for/stratification by clinicopathological covariates which potentially nullifies the entire technical novelty of this paper. Moreover, not having external validation cohorts in computational pathology leaves a lot of room for wondering whether a generalizable signal has been learned by the model, or whether it overfit on confounding factors / batch effects. Even when benchmarking with other methods, the question remains whether your method is simply overfitting harder on confounding factors than the other methods and thus reaching better quantitative metrics.
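
The confounding concern raised here can be illustrated with a toy calculation. The sketch below implements Harrell's concordance index (C-index) in plain Python and then recomputes it within each subtype. All data are synthetic and the helper names `c_index` and `c_index_by_group` are hypothetical; nothing here reproduces the paper's experiments.

```python
def c_index(times, events, risks):
    """Harrell's C-index: a higher risk score should mean shorter survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event
            # strictly before j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")


def c_index_by_group(times, events, risks, groups):
    """C-index computed separately inside each group (e.g. each subtype)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [k for k, label in enumerate(groups) if label == g]
        result[g] = c_index(
            [times[k] for k in idx],
            [events[k] for k in idx],
            [risks[k] for k in idx],
        )
    return result


if __name__ == "__main__":
    # Synthetic cohort: subtype B has shorter survival than subtype A,
    # and the "model" outputs nothing but a subtype indicator as its risk.
    times = [10, 12, 14, 2, 3, 4]
    events = [1, 1, 1, 1, 1, 1]
    groups = ["A", "A", "A", "B", "B", "B"]
    risks = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print(c_index(times, events, risks))
    print(c_index_by_group(times, events, risks, groups))
```

In this toy cohort the risk score encodes only the subtype, yet the pooled C-index is 0.8 while the C-index within each subtype is 0.5 (chance level). This is the pattern that a subtype-stratified evaluation would expose.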

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Visually a beautiful paper, seems reproducible overall, has clear and informative figures.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    A few comments to elaborate on, either in methodology or within the limitations of the paper: Please provide the final value of α for loss balancing, i.e., which loss is dominating for backpropagation? ImageNet pre-trained ResNet-50 has been consistently outperformed by domain-specific feature extractors for computational pathology, such as CTransPath or the latest UNI model, among several other options which have been out for 2 years. Is there a reason for this choice?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    If it weren’t for the missed medical nuance and evaluation, I would’ve scored this paper 5/6. The technical part of this paper is nicely done, interesting, in a clinically-relevant application, but the uncertainty of the subtype classification task introducing data leakage and therefore causing the superior technical performance would need to be clarified for acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

Thanks for your valuable comments. We will proceed with detailed revisions and clarifications.

To Reviewer 1:

First, for the TCGA dataset, we used medical terminology to describe the different subtypes of cases. As for per-WSI labels, a single WSI may correspond to multiple labels, but this is likely to be rare, or the additional subtype occupies a comparatively small proportion of the entire WSI.

On the interaction between the gene and pathology image modalities, your suggestion to visualize the impact of genes on the WSI is indeed valuable. We propose to illustrate this by visualizing attention maps for both single-modal and multi-modal instances.

About comparison methods, among our eight methods, MCAT, CMTA, PORPOISE, and M3IF are multimodal methods integrating genes and pathology. We will further compare with newer models such as MoCAT, SurvPath, and PIBD.

In Figure 4, we show partial visualizations. For the visualization of cancer region probabilities across the WSI, we have provided Figure 1 in the Supplementary Material, with ground truth annotations from pathology experts.

To Reviewer 3:

With regard to the tumor microenvironment, we have cited several medical articles in the Introduction section. For instance, Reference [8] highlights the importance of lymphocytic infiltration in the tumor microenvironment. Furthermore, Reference [18] discusses the impact of gene expression on the tumor microenvironment and cancer diagnosis.

We appreciate your suggestion and will incorporate additional literature related to the tumor microenvironment in survival analysis.

[1] Wang, W., et al. “The cuproptosis-related signature associated with the tumor environment and prognosis of patients with glioma.” Frontiers in Immunology 13, 998236 (2022).

[2] Brodsky, A.S., Khurana, J., Guo, K.S., et al. “Somatic mutations in collagens are associated with a distinct tumor environment and overall survival in gastric cancer.” BMC Cancer 22, 139 (2022).

[3] Tron, L., et al. “Socioeconomic environment and disparities in cancer survival for 19 solid tumor sites: An analysis of the French Network of Cancer Registries (FRANCIM) data.” International Journal of Cancer 144(6), 1262-1274 (2019).

We utilize optimal transport between two tasks and intentionally avoid simultaneous alignment between two modalities, which may blur the impact of subtype classification. Additionally, we believe that if alignment is solely performed between the two modalities, it may result in the loss of information from individual modalities.

To Reviewer 4:

In our study, the subtype classification task provides insights into the tumor microenvironment for survival analysis. We agree that different subtypes could lead to varying levels of risk in survival periods. We do not entirely agree that our method leaks information from the subtype classification to survival analysis. We believe that, compared to other methods, our approach supervises the accurate delineation of cancer regions more effectively to better determine the tumor microenvironment. In other words, the other models could capture the information about different subtypes if they were ideal. In our method, this information serves as prior knowledge guiding the model to focus on the tumor microenvironment. The value of α is 1.
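
Since the value of α is stated without the loss formula, which term dominates backpropagation depends on the raw magnitudes of the individual losses. A minimal sketch, assuming the combined objective is a weighted sum `loss_surv + alpha * loss_cls` (this form, the function name `combined_loss`, and the example magnitudes are assumptions for illustration, not taken from the paper):

```python
def combined_loss(loss_surv, loss_cls, alpha=1.0):
    """Hypothetical weighted sum of a survival loss and a classification loss.

    Returns the total and the fraction of it contributed by the
    classification term, as a rough indicator of which term dominates.
    """
    weighted_cls = alpha * loss_cls
    total = loss_surv + weighted_cls
    share_cls = weighted_cls / total if total else float("nan")
    return total, share_cls


if __name__ == "__main__":
    # Made-up loss magnitudes; with alpha = 1 the larger raw loss dominates.
    total, share_cls = combined_loss(loss_surv=2.0, loss_cls=0.5, alpha=1.0)
    print(total, share_cls)
```

Under this assumed form, α = 1 means the two terms are summed unweighted, so reporting their typical magnitudes during training would answer the question of which one dominates.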

Regarding your mention of the ablation experiment in Table 2, where patches were randomly selected, it is plausible that these patches may contain a lot of adipose tissue or normal cells, thus ignoring significant tumor-related information and resulting in a severe performance drop. We believe this issue is not directly related to the concerns you raised about subtype classification leaking information relevant to survival analysis.

Regarding the external cohort, we are collaborating with hospitals to collect a batch of data to serve as an external validation cohort. This will help further validate the effectiveness of the experiments in the paper.




Meta-Review

Meta-review not available, early accepted paper.


