Abstract

Precise prognostication can assist physicians in developing personalized treatment and follow-up plans, which helps improve overall survival rates. Much recent research relies on unimodal data for survival prediction and does not fully capitalize on the complementary information available across modalities. To address this deficiency, we propose a Multimodal Low-rank Interaction Fusion Framework Integrating Pathological images and Genomic data (PG-MLIF) for survival prediction. In this framework, we leverage a gating-based modality attention mechanism (MAM) for effective filtering at the feature level and propose an optimal weight concatenation (OWC) strategy to maximize the integration of information from pathological images, genomic data, and fused features at the model level. The model introduces, for the first time, a parallel decomposition strategy called low-rank multimodal fusion (LMF), which reduces complexity and facilitates model contribution-based fusion, addressing the challenge of incomplete and inefficient multimodal fusion. Extensive experiments on the public GBMLGG and KIRC datasets demonstrate that PG-MLIF outperforms state-of-the-art survival prediction methods. Additionally, we significantly stratify patients based on the hazard ratios obtained from training on the two datasets, and the visualization results are generally consistent with the true grade classification. The code is available at: https://github.com/panxipeng/PG-MLIF.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1221_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1221_supp.pdf

Link to the Code Repository

https://github.com/panxipeng/PG-MLIF

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Pan_PGMLIF_MICCAI2024,
        author = { Pan, Xipeng and An, Yajun and Lan, Rushi and Liu, Zhenbing and Liu, Zaiyi and Lu, Cheng and Yang, Huihua},
        title = { { PG-MLIF: Multimodal Low-rank Interaction Fusion Framework Integrating Pathological Images and Genomic Data for Cancer Prognosis Prediction } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript proposed a Multimodal Low-rank Interaction Fusion framework for survival prediction. The framework contains an efficient low-rank multimodal fusion (LMF) for extracting the interaction among multimodal data. Furthermore, an optimal weight concatenation (OWC) strategy is used to enhance multimodal fusion.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The low-rank multimodal fusion (LMF) strategy proposed in the manuscript provides an effective and efficient fusion mechanism for precise patient survival prediction.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The contribution of Optimal Weight Concatenation (OWC) is unclear. The current experimental results show that OWC does not contribute significantly to improving survival prediction. Further, the manuscript lacks the details of OWC; it is unknown how the model adjusts weights among modalities. For more detailed comments, please view the detailed comments in 10.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. No significant performance improvement is obtained by adding OWC. Comparing the performance of MLIF (LMF only) and MLIF (LMF + OWC) in Table 2 and Table 3, only a small margin of improvement is achieved. If the authors repeat their experiments using different random seeds, this slight margin may even vanish.
    2. I didn’t find the details of OWC. The overall idea is mentioned in 2.2; Yet, it is still unknown how the model adaptively adjusts the weights among different modalities.
    3. For now, the proposed method fuses multimodal vectors via LMF to get Z_h. I am curious about the results of simply using a concatenation operation to fuse multimodal representations after MAM.
    4. Two-stage multimodal fusion workflows [1,2] could be mentioned in the related work to reflect the current progress of multimodal survival outcome prediction. [1] Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction. MICCAI 2023 [2] Discrepancy and Gradient-Guided Multi-modal Knowledge Distillation for Pathological Glioma. MICCAI 2022
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed LMF is helpful for enhancing multimodal data fusion. Yet, there are a few concerns regarding contribution and the details of OWC.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    Thanks for the clarifications. Yet, I still have concerns about the contribution of OWC, which is a crucial module in the proposed method. As shown in Table 3, there is no significant improvement between removing or adding OWC, i.e., MLIF(LMF only) vs. MLIF(LMF+OWC). Meanwhile, the statistical significance of the above comparison cannot be demonstrated by the reported P-value. I will keep my previous score.



Review #2

  • Please describe the contribution of the paper

    This paper presents a novel approach for integrating pathological imaging data with genomic data for improved survival prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Interesting method that uses a matrix decomposition strategy to reduce the computational burden.
    2. Good ablation studies showcasing the gain in performance from different model fragments as well as the gain in training and testing speed.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Important details are missing that are necessary for a proper understanding of the methodology, e.g., the OWC procedure.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?
    1. Please expand upon
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please expand upon the advantages of your method over the transformer-based methods in the introduction.
    2. Why is a ResNet-50 used for feature extraction rather than a model particularly tuned for pathological data, such as CTransPath [1] or RetCCL [2]?
    3. Which embedding vectors are chosen? At which layer? Please specify this.
    4. Please expand upon how the gating-based attention mechanism works and how it determines the contribution of each modality.
    5. How are the feature dimensions reduced from an exponential to a linear level? Please add more details.
    6. Please provide more details about low-rank factors and parallel decomposition.
    7. Make sure to add references where necessary, for example when mentioning tensor fusion networks. I assume you are talking about tensor fusion networks from the following publication [3]?
    8. Please provide more details about why the additional dimension is necessary before computing the Kronecker product. Why does this ensure that the unimodal features remain unaffected and facilitate a more comprehensive fusion?
    9. On page 3 you write that Z_p ∈ R^(32×1), Z_g ∈ R^(32×1); however, on page 5, Z_{p,g} ∈ R^m. Is the additional dimension removed?
    10. You write on page 5 that each modality has a different dimension d_i. Just for clarification, in this case d_i is equal (=32) for both Zp and Zg, correct?
    11. Please clarify the dimensions of Z_h.
    12. What does the tensor Z represent? How is it created from Z_h?
    13. Please specify what ID number you are referring to? Is this the PatientID, i.e., a patient-wise stratification?
    14. Given that no validation set was created, please specify how the best checkpoint was chosen. Also add more training hyperparameters for improved reproducibility (e.g., number of epochs).
    15. Please add more details on the optimal weight concatenation procedure.
    16. Why does the batch size differ for pathological samples and genomic samples? Doesn’t the MLIF model require matched samples? Please explain.
    17. For p-value computations, what is the score compared to?
    18. Comparing to the average baseline performance does not make a lot of sense. Rather you should compare to the best performing baseline.

    [1] https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043 [2] https://www.sciencedirect.com/science/article/abs/pii/S1361841522002730 [3] https://arxiv.org/pdf/1707.07250.pdf
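As background for comments 8 and 9, the following is a minimal sketch of the construction used in tensor fusion networks [3]: a constant 1 is appended to each modality vector before the outer product, so the resulting tensor contains the bilinear interaction terms together with unchanged copies of both unimodal vectors. The toy dimensions and the function name `augmented_outer` are illustrative assumptions; whether PG-MLIF follows exactly this construction is not stated in the reviews.

```python
import numpy as np

def augmented_outer(z_p, z_g):
    """Outer product of two feature vectors after appending a constant 1 to each."""
    p = np.append(z_p, 1.0)  # [z_p; 1]
    g = np.append(z_g, 1.0)  # [z_g; 1]
    return np.outer(p, g)    # shape: (len(z_p) + 1, len(z_g) + 1)

# Toy vectors (hypothetical values, much smaller than the 32-dim embeddings discussed).
z_p = np.array([0.5, -1.0, 2.0])
z_g = np.array([3.0, 4.0])
fused = augmented_outer(z_p, z_g)

bilinear = fused[:-1, :-1]  # entries z_p[i] * z_g[j]: cross-modal interactions
only_p = fused[:-1, -1]     # last column: z_p itself, untouched by z_g
only_g = fused[-1, :-1]     # last row: z_g itself, untouched by z_p
```

Without the appended 1, only the `bilinear` block would exist and the unimodal terms would be lost from the fused representation.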

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Important details are missing in the methodology description (e.g. the OWC procedure)

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I thank the authors for addressing the points raised during the review. With regard to Q6, note that there are several efficient transformer architectures, such as TransMIL [1], which aim to mitigate the scaling issues of transformers. Also, methods such as TransMIL have been shown to perform well with limited data (see TransMIL, ~1k samples for TCGA-NSCLC), even when using ResNet-50 feature extractors. Extensive pre-training is typically required only during the self-supervised learning step to embed image patches. Due to the lack of information required for reproducibility and the unclear method description, I keep my score at weak accept.

    [1] https://proceedings.neurips.cc/paper/2021/file/10c272d06794d3e5785d5e7c5356e9ff-Paper.pdf



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel multi-modal low-rank fusion to integrate pathological and genomics data. The proposed fusion is designed to be computationally efficient due to the low-rank nature and parallel nature of the decomposition, which enables the authors to overcome computational challenges in previous multimodality fusion approaches. The key trick is to utilise a low-rank weight matrix to weight the cross-correlations of the two modalities which can be computed in a parallel fashion.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper approaches the problem of multi-modality fusion using a simple low-rank approximation, which is generally intuitive: the assumption of the weight matrix which combines the multi-modality features enables the authors to naturally derive the proposed method.
    • The experimental results support the utility of the proposed method. On the two studied datasets, GBM and KIRC, the authors demonstrate improved performance in terms of concordance index values, going from 0.878 to 0.895 on GBM. The ablation studies examine the effect of the different components and confirm the need for all components - LMF and OWC - of the proposed method.
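For readers unfamiliar with the metric quoted above, here is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a plain reference implementation; the function name and toy data are illustrative assumptions, and the paper's own evaluation code may differ (for example in how ties are handled).

```python
import numpy as np

def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable patient pairs ranked correctly by risk.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk scores (higher = expected to fail earlier)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risks = np.asarray(risks, dtype=float)
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier failure has the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy example: risk ordering perfectly matches (or perfectly reverses) survival order.
perfect = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4.0, 3.0, 2.0, 1.0])
reversed_ = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [1.0, 2.0, 3.0, 4.0])
```

A value of 0.5 corresponds to random ranking, so a change from 0.878 to 0.895 is a small gain in the share of correctly ordered pairs.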
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Writing - several details of the proposed method are either unclear or confusing. Please see further comments below.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    In the methodology section,

    • In 2.1 - what are the inputs considered? Are they patches of WSIs? How are these obtained? The details are missing
    • In 2.1 - what is the “self-normalising” network used to process genomic data? Is this a standard network? Any references which use this in previous works?
    • Can the authors clarify how the optimal weight concatenation is performed? E.g., assuming that the output of the LMF is a fused vector h, how do you assess the importance of “modality-specific” components, since the feature vector is a fused vector? This part is unclear and needs to be clarified.
    • In 2.3, it might be better if the content is re-written to just focus on the two-modality setting considered in the paper. Considering a general M-modality setting makes it confusing to follow. It would also be helpful to consider how the proposed method compares to previous works on tensor fusion, like tensor fusion network (TFN).
    • The details of the MAM component need to be described, at least in short, if it’s a major element/contribution of the paper.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite the issues with writing, (which can potentially be addressed in the final submission), the work presented in the paper is interesting and of relevance to the MICCAI community. It could foster interesting ideas about how to further incorporate notions of interpretability to the different rank components of the weight matrix to build an explainable method of multi-modality fusion.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We would like to thank all the reviewers for their constructive comments.

Q1: Reproducibility of the paper A1: We will release the code and models on GitHub if the paper is accepted.

Q2: Add detailed experiments of OWC A2: We implemented OWC using adaptive dynamic weight adjustment. This method focuses on individual model predictions and adjusts weights based on the input data’s contribution. Here’s the process: 1) Feature Integration: Concatenate the feature vectors Z_i from all models into a combined vector Z=[Z_p,Z_g,Z_h]. 2) Optimal Weight Learning: Use a small neural network to learn the weights W=[w_p,w_g,w_h], which are chosen to maximize the performance of the combined model. 3) Concatenation with Weights: Once the optimal weights are determined, apply these weights to the corresponding feature vectors. The final combined feature vector is Z_h^’=[w_p Z_p,w_g Z_g,w_h Z_h].
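The three steps in A2 can be sketched as a single forward pass. This is only an illustration under stated assumptions: the rebuttal does not specify the architecture of the "small neural network", so the one-hidden-layer ReLU network, the softmax normalization of the weights, the hidden size, and the 32-dimensional Z_h below are all hypothetical choices, and the parameters are random rather than trained.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def owc_forward(z_p, z_g, z_h, W1, b1, W2, b2):
    """Weighted concatenation of pathology, genomic, and fused features."""
    # Step 1: concatenate all feature vectors, Z = [Z_p, Z_g, Z_h].
    z = np.concatenate([z_p, z_g, z_h])
    # Step 2: a small network maps Z to three weights [w_p, w_g, w_h].
    # (ReLU hidden layer and softmax output are assumptions, not from the rebuttal.)
    hidden = np.maximum(0.0, W1 @ z + b1)
    w = softmax(W2 @ hidden + b2)
    # Step 3: scale each vector by its weight and concatenate,
    # giving [w_p Z_p, w_g Z_g, w_h Z_h].
    fused = np.concatenate([w[0] * z_p, w[1] * z_g, w[2] * z_h])
    return fused, w

rng = np.random.default_rng(0)
d, hidden_dim = 32, 16  # d = 32 follows the embeddings in A10; hidden_dim is hypothetical
z_p, z_g, z_h = (rng.standard_normal(d) for _ in range(3))
W1 = rng.standard_normal((hidden_dim, 3 * d)) * 0.1
b1 = np.zeros(hidden_dim)
W2 = rng.standard_normal((3, hidden_dim)) * 0.1
b2 = np.zeros(3)

fused, w = owc_forward(z_p, z_g, z_h, W1, b1, W2, b2)
```

In a trained model the parameters `W1`, `b1`, `W2`, `b2` would be learned jointly with the survival loss, so the weights depend on each input rather than being fixed constants.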

Reviewer#3: Q3: Several details of the proposed method are unclear A3: In Section 2.1, features are extracted from the pathological images and the genomic data separately. Pathological images were scaled to 512 × 512 pixels and trained on a 20x magnification region using a CNN. Genomic data was obtained from cBioPortal, and a normalization layer from the self-normalizing network of Klambauer et al. was applied to reduce overfitting.

Reviewer#4: Q4: Clarify whether OWC significantly contributes to improving survival prediction A4: 1) Performance Improvement: Though the gains may seem modest when comparing LMF only and LMF + OWC in Tables 2 and 3, significant enhancements are challenging to achieve at a C-value of 0.891. 2) Statistical Significance: Despite the minor improvements, the p-values in Tables 2 and 3 confirm the significance of these improvements. 3) Robustness: While concerns about variability from different random seeds are valid, repeating experiments with different seeds yielded consistent results. Additionally, employing fifteen-fold cross-validation enhances model stability.

Q5: The results of simply using a concatenation operation after MAM A5: We apologize for not showing the results of simply using a concatenation operation; we have run the related experiments. On the GBMLGG and KIRC datasets, direct concatenation after MAM gave C-values of 0.876 and 0.709, while the OWC method achieved 0.881 and 0.718. These results show OWC’s superiority over simple concatenation.

Reviewer#5: Q6: Advantages over Transformer-based methods A6: 1) Transformers require large pre-training data, but our method handles multimodal data effectively without extensive pre-training, making it suitable for small medical sample sizes. 2) Transformers’ computational complexity is a challenge, whereas our fusion technique handles complex medical data efficiently. Further work will explore transformers.

Q7: Feature dimensions reduced from exponential to linear level A7: In simpler terms, TFN computes a tensor outer product and then applies a fully connected layer. We do not follow TFN here; instead, we perform a linear transformation for each modality separately. This is followed by a multidimensional dot product, which essentially combines the results of multiple low-order vectors. This approach greatly reduces the number of parameters in the model while maintaining the validity of the model.
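The contrast drawn in A7 can be illustrated with a minimal numerical sketch in the spirit of low-rank multimodal fusion. It assumes that the "multidimensional dot product" is an element-wise product of the per-modality projections summed over rank components; the output size `d_h`, the rank, and the random factors `F_p`, `F_g` are illustrative assumptions, not values from the paper. The sketch checks that the factorized computation gives the same output as building the full weight tensor and the full outer product.

```python
import numpy as np

rng = np.random.default_rng(0)
d_p, d_g, d_h, rank = 32, 32, 16, 4  # d_h and rank are hypothetical

z_p = rng.standard_normal(d_p)
z_g = rng.standard_normal(d_g)
a_p = np.append(z_p, 1.0)  # append 1 so unimodal terms survive the product
a_g = np.append(z_g, 1.0)

# Modality-specific low-rank factors, one (d_h x (d+1)) matrix per rank component.
F_p = rng.standard_normal((rank, d_h, d_p + 1))
F_g = rng.standard_normal((rank, d_h, d_g + 1))

def low_rank_fusion(a_p, a_g, F_p, F_g):
    """Project each modality separately, multiply element-wise, sum over rank."""
    proj_p = F_p @ a_p  # shape (rank, d_h)
    proj_g = F_g @ a_g  # shape (rank, d_h)
    return (proj_p * proj_g).sum(axis=0)

def full_tensor_fusion(a_p, a_g, F_p, F_g):
    """TFN-style: explicit outer product contracted with the full weight tensor."""
    W = np.einsum("rka,rkb->kab", F_p, F_g)  # shape (d_h, d_p + 1, d_g + 1)
    Z = np.outer(a_p, a_g)                   # shape (d_p + 1, d_g + 1)
    return np.einsum("kab,ab->k", W, Z)

h_low = low_rank_fusion(a_p, a_g, F_p, F_g)
h_full = full_tensor_fusion(a_p, a_g, F_p, F_g)

# Parameter counts: full weight tensor vs. the low-rank factors.
n_full = d_h * (d_p + 1) * (d_g + 1)
n_low = rank * d_h * ((d_p + 1) + (d_g + 1))
```

With more modalities the full tensor grows as a product of the feature dimensions, whereas the factor count grows as their sum, which is the exponential-to-linear reduction the reviewer asked about.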

Q8: Some technical details are not described in detail. A8: Due to the word limit, only some of the key descriptions were presented.

Q9:Why is a ResNet-50 used for feature extraction? A9:In order to be consistent with the pathology feature extraction in the comparison experiments.

Q10:Which embedding vectors are chosen? At which layer? A10:We extract the embedding vector Z_p,Z_g∈R^(32×1) from the last hidden layer of the training network, using it as input for MLIF.

Q11:Please specify what ID number you are referring to? A11:The ID number here is a unique identifier for each patient in the public dataset.
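If the patient ID is used to build cross-validation folds, a patient-wise split keeps all samples from one patient in the same fold. The sketch below shows one way to do that; the function name, the round-robin assignment, and the toy IDs are hypothetical, and the rebuttal does not confirm that the authors split their data this way.

```python
import random
from collections import defaultdict

def patient_wise_folds(sample_patient_ids, n_folds, seed=0):
    """Assign sample indices to folds so that no patient spans two folds."""
    patients = sorted(set(sample_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Round-robin assignment of shuffled patients to folds.
    fold_of_patient = {pid: i % n_folds for i, pid in enumerate(patients)}
    folds = defaultdict(list)
    for idx, pid in enumerate(sample_patient_ids):
        folds[fold_of_patient[pid]].append(idx)
    return [folds[k] for k in range(n_folds)]

# Toy example: seven samples (e.g. image regions) from four hypothetical patients.
sample_ids = ["pa", "pa", "pb", "pc", "pc", "pc", "pd"]
folds = patient_wise_folds(sample_ids, n_folds=3)
```

Each fold can then serve as the held-out set in turn, without samples from a test patient leaking into training.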




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I have read both the paper and the rebuttal carefully. Cancer prognosis prediction with pathological images and genomic data is an important research topic. However, the proposed LMF and OWC lack novelty and clarity: LMF has been widely utilized in other multi-modal fusion works, such as the work “Discrepancy and Gradient Guided Distillation of Low-Rank Multi-Modal Knowledge for Disease Diagnosis”; the description of OWC is unclear and its improvement is incremental, as pointed out by Reviewer4 and Reviewer5. Given these limitations in novelty and clarity, I believe the paper falls short of meeting MICCAI’s standards.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposed a multi-modal low-rank fusion to integrate pathological and genomics data effectively and efficiently. Reviewers consistently found the low-rank multimodal fusion (LMF) interesting and a genuine technical contribution. On the other hand, this paper has some flaws. For example, Reviewer #4 and #5 found that the details needed to understand OWC are missing. Reviewer #3 mentioned that the clarity of the paper could be further improved. Reviewer #4 also questioned the contribution of OWC. Despite these issues, the proposed low-rank fusion strategy brings new insights into multimodal learning research, and its technical contribution is solid. The authors should address the writing issues above and explain the evaluation metrics in the revision.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



