Abstract

Multimodal learning significantly benefits survival analysis for cancer, particularly through the integration of pathological images and genomic data. However, this presents new challenges on how to effectively integrate multi-modal biomedical data. Existing multi-modal survival prediction methods focus on mining the consistency or modality-specific information, failing to capture cross-modal interactions. To address this limitation, attention-based methods are proposed to enhance both the consistency and interactions. However, these methods inevitably introduce redundancy due to the overlapped information of multimodal data. In this paper, we propose a Multi-Granularity Interactions of heterogeneous biomedical data framework (MuGI) for precise survival prediction. MuGI consists of: a) unimodal extractor for exploring preliminary modality-specific information, b) multi-modal optimal features capture (MOFC) for extracting ideal multi-modal representations, eliminating redundancy through decomposed multi-granularity information, as well as capturing consistency in a common space and enhancing modality-specific features in a private space, and c) multimodal hierarchical interaction for sufficient acquisition of cross-modal correlations and interactions through the cooperation of two Bilateral Cross Attention (BCA) modules. We conduct extensive experiments on three cancer cohorts from the Cancer Genome Atlas (TCGA) database. The experimental results demonstrate that our MuGI achieves the state-of-the-art performance, outperforming both unimodal and multi-modal survival prediction methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1471_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Lon_MuGI_MICCAI2024,
        author = { Long, Lifan and Cui, Jiaqi and Zeng, Pinxian and Li, Yilun and Liu, Yuanjun and Wang, Yan},
        title = { { MuGI: Multi-Granularity Interactions of Heterogeneous Biomedical Data for Survival Prediction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15005},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript proposed a multi-granularity interactions of heterogeneous biomedical data framework (MuGI) workflow to enhance survival analysis using WSIs and genomic data. The Multimodal Optimal Features Capture (MOFC) can distinguish modality-common and modality-specific information. Then, the Bilateral Cross Attention (BCA) modules can fuse multimodal information for survival prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The design of Multimodal Optimal Features Capture (MOFC) is a good attempt to distinguish modality-common and modality-specific information. It is interesting but needs further clarifications and experiments to support the conclusion. Using the proposed MuGI, among three datasets, the survival prediction can be consistently improved compared with several baselines.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Several details of the methodology need to be clarified. Also, the contribution of the second BCA needs to be better explained. Further, the subfigures in Fig1 are unclear. For the detailed comments, please view the content in 10.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. In Fig.1.A, subfigures (e.g., patient, WSI, Genes, Gene sets) are unclear. I would recommend that the authors revise these subfigures using vector figures.
    2. Why use a second BCA to integrate S (i.e., the output of the first BCA) and the concatenation of C^{P, G}? If the operation is to enhance the modality-specific information, why not directly concat S^P and S^G and then fuse the concatenation with S? S^P and S^G contain more modality-specific information than C^P and C^G. I am curious about the comparison between using concat(C^P, C^G) and concat(S^P, S^G), which can determine whether MOFC can enhance the modality-specific information.
    3. How can high/low risk be determined in Fig2? The author mentioned that the OS time is divided into 4 intervals in 3.2. Is the high/low risk also determined by dividing OS times into two intervals?
    4. What are the values of alpha_1, alpha_2, and beta in eq (6)? Are these weights learnable or fixed based on experiments? If they are learnable, more details need to be explained in the manuscript. This parameter should affect the model performance because the scale among losses can be different.
    5. Related work could also cover two-stage multimodal fusion workflows [1,2] to summarize the current progress of multimodal data fusion for survival outcome prediction. [1] Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction. MICCAI 2023 [2] Discrepancy and Gradient-Guided Multi-modal Knowledge Distillation for Pathological Glioma. MICCAI 2022
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript is well-organized. Several experiments have demonstrated the proposed method’s capability in assisting patient survival prediction among three datasets. Yet, more clarifications and experiments are needed to support the conclusion related to the proposed modules.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Thanks for the detailed clarification. The authors have resolved all of my concerns. I would like to increase my previous score.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a Multi-Granularity Interactions of heterogeneous biomedical data framework (MuGI) for precise survival prediction. MuGI consists of: a) unimodal extractor for exploring preliminary modality-specific information, b) multimodal optimal features capture (MOFC) for extracting ideal multi-modal representations, eliminating redundancy through decomposed multi-granularity information, as well as capturing consistency in a common space and enhancing modality-specific features in a private space, and c) multi-modal hierarchical interaction for sufficient acquisition of cross-modal correlations and interactions through the cooperation of two Bilateral Cross Attention (BCA) modules. This paper conducts extensive experiments on three cancer cohorts from the Cancer Genome Atlas (TCGA) database.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The design rationale of the model is quite reasonable: it first extracts modality-specific information, then utilizes adversarial learning to extract multi-modal representations and reduce redundancy, and finally employs a two-stage attention module for further interaction between different modal information. (2) Overall, the model design is rather complex, and implementing this method in engineering and achieving good results in this paper is not easy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper’s writing could be improved: it extensively repeats the overall design ideas multiple times, but lacks sufficient detail on model and experimental specifics, leading to poor repeatability of the paper’s results. The composition of the loss function is quite extensive, and it would be beneficial to provide more detailed explanations for each component, with each loss function labeled in Figure 1. Additionally, regarding the ablation study and the significance of removing each component, it is hoped that some explanation can be provided.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper’s idea is excellent, and it is hoped that the writing can be meticulously refined. Detailed suggestions are provided in the Weaknesses section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s idea is good, and the writing can be meticulously refined.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    Their rebuttal feedback is very good. They explained more details about loss functions and each ablation variant. And they will refine the writing and improve the introduction of methodology in the future version.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a multi-granularity biomedical data framework (MuGI) for survival prediction. It proposes a self-consistent framework leveraging genomic information and pathology image data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation truly challenges multimodal processing. The paper conducts a comprehensive literature review and categorizes existing methods into intra-modal representation and inter-modal fusion.

    2. The framework is self-consistent. It includes both unimodal extractors and Multimodal Hierarchical Interaction. The design features “a private space to enhance modality-specific information” and “a shared latent space to mine modality-common information,” truly addressing the challenge of integrating complementary information from individual modalities while removing redundancy.

    3. I really like the design of the BCA module. This module can capture multimodal cross-attention. Such a design is transferable to other similar tasks. The BCA is also evaluated in the ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The font of the 2.4 section subtitle should be “Multimodal Hierarchical Interaction.”
    2. I am not sure if the paper uses the correct format and font.
    3. The items in Eq.6 are not explicitly introduced, making it hard to follow.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please check format

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    overall framework design

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all reviewers for their acknowledgment of our contribution and their constructive comments for further clarification.

Q1: Details about the loss functions in Eq. 6. (R1&R3)
A1: The final objective function contains five parts, as follows. (1) Lsur is calculated from the risk values output by the final survival header and adopts the NLL function. (2) The NLL loss Lcsur is calculated from the risk values output by an additional survival header (a linear layer) that accepts the average of C^P and C^G. For (3) the common loss Lc and (4) the specific loss Ls, we employ ground-truth modality labels represented by one-hot encoding and use BCELoss to compute them in adversarial learning for binary modality classification (WSI or gene), ensuring the purity of common and specific features. (5) Eq. 5 provides Lor, which further eliminates redundancy by penalizing C and S using Euclidean distance. In the final paper, we will label each loss function in Fig. 1 for better understanding and provide details of the loss functions.

Q2: Parameters of Eq. 6. (R2)
A2: The parameters are established empirically. We set alpha_1=0.1 and alpha_2=0.01 in all datasets, beta=0.7 in BLCA and beta=0.5 in the other two datasets. We will report these settings in the final paper. As suggested, we will explore learnable parameters in the future.

Q3: The second BCA's contribution and effectiveness of MOFC. (R2)
A3: The second BCA aims to capture interactions between common and specific information, which remain consistent and specific respectively, and explores more complementary information. Also, as suggested, we used the second BCA to integrate S with the concatenation of (S^P, S^G) and retrained our model on BLCA, obtaining an unsatisfactory CI result. This performance degradation is due to incomplete information in the fused features. In MOFC, the two FC layers closely cooperate with the discriminator and play pivotal roles in enhancing specific information. This is supported by the ablation study results obtained by our model w/o Specific. Also, the performance degradation brought by the model w/o GAN indicates the vital role of common information.

Q4: How can high/low risk be determined in Fig. 2? (R2)
A4: Our model outputs risk values (probability of an event occurring at a specific time). Following the routine clinical practice in predicting survival risks, patients were divided into high/low-risk groups based on median risk, and KM curves were built for each group reflecting OS times.

Q5: Explanation of each ablation variant. (R3)
A5: w/o Lor: to verify the redundancy elimination ability of Lor; w/o Specific: removing the two FC layers of MOFC to verify their ability to enhance specificity; w/o GAN: to confirm whether both consistent and redundancy-reduced information improve the performance; w/o BCA: to verify the multi-modality, multi-granularity integration and interaction ability of the BCAs, including i) the integration and interaction between specific genetic features and specific pathological features; ii) the integration and interaction between common and specific features. We will provide these explanations in the final paper.

Q6: The model is complex. (R3)
A6: As a multimodal approach combining large-scale pathological images, our method, with moderate parameters of 7.81M and GFLOPs of 22.12, is not excessively complex. Also, the inference speed of our method is 179848p/s, which is far better than that of CMAT at 84407p/s (p/s is the number of WSI patches processed per second).

Q7: Format and writing. (R1&R3)
A7: We will double-check the paper to ensure correctness of the format. We will refine the writing and improve the introduction of methodology to enhance the clarity of our work.

Q8: Subfigures in Fig. 1 are unclear. (R2)
A8: We will improve the image quality in the final paper.

Q9: Missing citations of two-stage multimodal fusion methods. (R2)
A9: We will cite the mentioned references in the final paper.
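
The loss composition (A1, A2) and the median-risk split (A4) described in the feedback can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The NLL term uses the common discrete-time survival formulation, which is assumed here because the rebuttal only says "NLL". Eq. 6 is not reproduced on this page, so the pairing of alpha_1, alpha_2, and beta with particular loss terms is a guess made only to show a weighted sum. Sending ties at the median to the low-risk group is also an arbitrary choice.

```python
import math
from statistics import median


def nll_survival_loss(hazards, bin_index, censored, eps=1e-7):
    """Discrete-time survival NLL for one patient (assumed formulation).

    hazards[k] is the predicted probability of the event in interval k,
    given survival up to interval k. bin_index is the interval containing
    the observed time; censored is True when no event was observed.
    """
    # Log-probability of surviving every interval before the observed one.
    log_surv_before = sum(
        math.log(max(1.0 - h, eps)) for h in hazards[:bin_index]
    )
    if censored:
        # Censored: the patient also survived the observed interval.
        last = math.log(max(1.0 - hazards[bin_index], eps))
    else:
        # Uncensored: the event happened inside the observed interval.
        last = math.log(max(hazards[bin_index], eps))
    return -(log_surv_before + last)


def total_objective(l_sur, l_csur, l_c, l_s, l_or,
                    alpha_1=0.1, alpha_2=0.01, beta=0.5):
    """Weighted sum of the five loss parts listed in A1.

    ASSUMPTION: the rebuttal gives the weight values but not which term
    each one scales; this pairing is hypothetical.
    """
    return l_sur + beta * l_csur + alpha_1 * (l_c + l_s) + alpha_2 * l_or


def split_by_median_risk(risks):
    """Return (high_risk_indices, low_risk_indices) split at the median."""
    cutoff = median(risks)
    high = [i for i, r in enumerate(risks) if r > cutoff]
    low = [i for i, r in enumerate(risks) if r <= cutoff]  # ties go to low
    return high, low
```

The two index lists returned by `split_by_median_risk` would then be passed to a Kaplan-Meier estimator, one curve per group, which is the step the feedback describes for Fig. 2.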




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposes a multi-granularity biomedical data framework (MuGI) to enhance survival analysis using WSIs and genomic data. The method is interesting and novel. The paper is clearly written, and the experimental setup and ablation study are sound. Previous concerns of reviewers have been addressed during the rebuttal. After the rebuttal, the reviewers reached consensus about its acceptance.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    This paper proposes a multi-granularity biomedical data framework (MuGI) to enhance survival analysis using WSIs and genomic data. The method is interesting and novel. The paper is clearly written, and the experimental setup and ablation study are sound. Previous concerns of reviewers have been addressed during the rebuttal. After the rebuttal, the reviewers reached consensus about its acceptance.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


