Abstract

Survival prediction requires integrating Whole Slide Images (WSIs) and genomics, a task complicated by significant heterogeneity and complex inter- and intra-modal interactions. Previous methods used co-attention, fusing features only once after separate encoding, which is insufficient for such a complex task given the heterogeneity of the modalities. To this end, we propose a Biased Progressive Encoding (BPE) paradigm that performs encoding and fusion simultaneously. This paradigm uses one modality as a reference when encoding the other, fostering deep fusion of the modalities through multiple iterations, progressively reducing cross-modal disparities and facilitating complementary interactions. In addition, survival prediction involves biomarkers from WSIs, genomics, and their integrative analysis. Because of individual variation, key biomarkers may reside in different modalities for different patients, so the model must be flexible. Hence, we further propose a Mixture of Multimodal Experts layer to dynamically select tailored experts in each stage of the BPE paradigm. Experts incorporate reference information from the other modality to varying degrees, enabling a balanced or biased focus on different modalities during the encoding process. The experimental results demonstrate the superior performance of our method on various datasets, including TCGA-BLCA, TCGA-UCEC and TCGA-LUAD. Code is available at https://github.com/BearCleverProud/MoME.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2168_paper.pdf

SharedIt Link: https://rdcu.be/dY6iJ

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72083-3_30

Supplementary Material: N/A

Link to the Code Repository

https://github.com/BearCleverProud/MoME

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Xio_MoME_MICCAI2024,
        author = { Xiong, Conghao and Chen, Hao and Zheng, Hao and Wei, Dong and Zheng, Yefeng and Sung, Joseph J. Y. and King, Irwin},
        title = { { MoME: Mixture of Multimodal Experts for Cancer Survival Prediction } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {318--328}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper integrates Whole Slide Images (WSIs) and genomic data (multi-modal fusion) for survival analysis. The authors propose a Biased Progressive Encoding (BPE) paradigm, an integrative approach that encodes the features of one modality using information from the other modality as a reference. A gating network selects the best encoder from a mixture of multimodal experts. The authors evaluated the proposed BPE paradigm on the TCGA BLCA, UCEC and LUAD datasets, where the proposed method outperformed the other methods on the UCEC and LUAD datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • One promising aspect of this paper is that it fuses and encodes multiple modalities starting from the lower levels, whereas existing approaches encode the modalities separately and then fuse them.
    • The use of a mixture of multimodal experts with a gating network is well justified: inter-individual variation can cause the key features for survival analysis to appear in different modalities for each patient.
    • The authors justify why co-attention is a sub-optimal substitute for self-attention in the experts.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors claimed that the performance of the proposed architecture is significantly better than the existing approaches. However, the improvement is only around 1% on small test sets (<100 samples), where the variation is around 3%. The authors are strongly encouraged to report a statistical significance test, such as the Wilcoxon rank-sum test.
    • An in-depth biological discussion of the proposed model’s interpretation would be insightful.
    • For reproducibility, it is not clear how many times the WSI patch features and gene features are encoded in the BPE layer. Figure 1 shows each modality being encoded twice; is this the same for all datasets, or should the number of encodings be treated as a hyper-parameter for each dataset?
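To make the requested significance check concrete, the following is a minimal sketch of a two-sided exact Wilcoxon rank-sum test in pure Python. It is not taken from the paper or its repository. The function names `midranks` and `rank_sum_exact` are hypothetical, and the example scores are made-up placeholders rather than reported results. Exact enumeration is used because cross-validation typically yields only a handful of scores per method, where a normal approximation would be unreliable.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_exact(a, b):
    """Two-sided exact Wilcoxon rank-sum test. Returns (W, p_value).

    W is the sum of the ranks of sample `a` in the pooled data. The p-value
    is the fraction of all group assignments whose rank sum deviates from
    its null mean at least as much as the observed one.
    """
    ranks = midranks(list(a) + list(b))
    n = len(a)
    w_obs = sum(ranks[:n])
    mean_w = n * (len(ranks) + 1) / 2
    dev_obs = abs(w_obs - mean_w)
    total = extreme = 0
    for idx in combinations(range(len(ranks)), n):
        total += 1
        w = sum(ranks[i] for i in idx)
        if abs(w - mean_w) >= dev_obs - 1e-12:  # tolerance for float sums
            extreme += 1
    return w_obs, extreme / total


# Placeholder per-fold scores for two hypothetical methods (not real results).
scores_a = [0.70, 0.71, 0.72, 0.73, 0.74]
scores_b = [0.60, 0.61, 0.62, 0.63, 0.64]
w_stat, p_value = rank_sum_exact(scores_a, scores_b)
```

The enumeration grows combinatorially, so this sketch only suits small samples such as per-fold scores. If both methods are evaluated on the same folds, a paired test may be the more appropriate choice.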
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The dimensions of matrix P are given as nh x d in Figure 1, whereas the first paragraph of Section 2.1 gives them as np x d.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Given that the proposed model architecture is intended to support medical decision-making, the authors are strongly encouraged to discuss the biological interpretation of the model.
    • The proposed approach was tested only on the survival prediction task. An experimental evaluation on cancer classification and cancer sub-type classification would help determine how well the proposed framework adapts to multiple downstream tasks.
    • The dimensions of matrix P are given as nh x d in Figure 1, whereas the first paragraph of Section 2.1 gives them as np x d.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The idea of fusing multiple modalities of data is interesting, and the adaptation of a mixture of multiple experts can shed light on the factors that drive the model to give a certain prediction for each patient.
    • The experimental results may not be significantly better than those of the other models.
    • Biological interpretation is strongly encouraged.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose the MoME framework for the integrative analysis of histopathological images and genomic data on the survival prediction task. The experimental results on three cancer cohorts validate the superiority of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation of this study is well described, and it addresses a very interesting topic in precision oncology.
    2. The paper is well organized and easy to follow.
    3. The experimental results on three cancer cohorts from the TCGA dataset validate its superiority over existing studies.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The MMOE block is poorly described. What is the general formulation of the traditional MOE module? How are the TransFusion, Bottleneck TransFusion and SNNFusion blocks organized in MMOE?
    2. The Concordance Index (CI) alone cannot adequately validate the advantage of the method. I suggest that the authors also apply the log-rank test to show whether the method can effectively stratify patients into high-risk and low-risk groups.
    3. Some symbols are not defined, e.g., n_1 and n_2 in Eq. (3), which makes the paper hard to understand.
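For readers unfamiliar with the metric discussed in point 2, the following is a minimal sketch of Harrell's Concordance Index in pure Python. It is an illustration under stated assumptions and not the evaluation code of the paper: the function name `concordance_index` is hypothetical, higher `risks` values are assumed to mean shorter expected survival, and pairs with tied observed times are skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed time for each patient (event or censoring time)
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk scores (assumed: higher risk = earlier event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data, not taken from any dataset in the paper.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.5, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The metric only measures ordering, which is why the reviewer asks for a complementary stratification test.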
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Rewrite the description of the MMOE block and proofread the paper so that no symbol is left undefined.
    2. Implement the log-rank test to further verify the superiority of the proposed method.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The topic is interesting and the experimental results are comparable to the SOTA methods.
    2. This paper provides a new way to integrate multi-modal data for the survival prediction task.
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The manuscript develops a novel mixture of multimodal experts (MoME) to enhance patient survival prediction on multiple cancers. The proposed method includes a Biased Progressive Encoding (BPE) module that performs feature encoding and fusion simultaneously. MoME is then proposed to adaptively select the proper fusion strategy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The design of multiple fusion experts is interesting and a reasonable way to explore the proper means of fusion, which in turn improves patient survival outcome prediction. The manuscript is easy to follow and the figure is helpful for understanding the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses are summarized below; please view the detailed comments in “10. detailed and constructive comments”:

    1. This is not pan-cancer research, because only three types of cancer are discussed in the manuscript; hence, "multi-cancer" would be more accurate.
    2. A few methodology implementation details can be clarified.
    3. One more ablation study can be considered.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Instead of pan-cancer, I strongly recommend that the authors describe their contribution as multi-cancer to avoid misunderstanding. Only three types of cancer are included in the manuscript. A pan-cancer study would need to cover more than ten to twenty cancers.
    2. What is the number of image bags (n_p), and how is it determined?
    3. How is high/low risk determined? Do the authors define the risk level in the ground truth by taking the OS time and dividing it into two intervals?
    4. Do I understand correctly that the gate network for expert selection is a type of classifier that produces logits for each type of expert, and that the softmax function is used to select the expert with the largest logit? Also, as mentioned, the traditional MoE emphasizes a weighted sum of experts. Could the authors add this experiment as part of an ablation study on the choice of experts?
    5. Beyond the end-to-end workflows mentioned in the related work, two-stage multimodal fusion workflows for patient survival prediction [1,2] should also be included in the related work (introduction section). [1] Pathology-and-genomics multimodal transformer for survival outcome prediction. MICCAI 2023 [2] Discrepancy and Gradient-Guided Multi-modal Knowledge Distillation for Pathological Glioma. MICCAI 2022
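The distinction raised in point 4, between selecting a single expert and taking a weighted sum of experts, can be sketched as follows. This is a toy illustration in pure Python and not the MoME implementation. The function names `softmax` and `route` are hypothetical, and the expert outputs are plain lists standing in for the feature tensors a real model would produce.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits, expert_outputs, hard=True):
    """Combine expert outputs according to the gate.

    hard=True  : return the output of the expert with the largest gate
                 probability (hard assignment).
    hard=False : return the probability-weighted sum of all expert outputs
                 (soft assignment, as in a traditional weighted MoE).
    """
    probs = softmax(gate_logits)
    if hard:
        best = max(range(len(probs)), key=probs.__getitem__)
        return list(expert_outputs[best])
    dim = len(expert_outputs[0])
    return [
        sum(p * out[k] for p, out in zip(probs, expert_outputs))
        for k in range(dim)
    ]


# Three stand-in experts, each emitting a 2-dimensional feature.
toy_logits = [0.1, 2.0, -1.0]
toy_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hard_feature = route(toy_logits, toy_outputs, hard=True)
soft_feature = route(toy_logits, toy_outputs, hard=False)
```

Hard assignment runs only the chosen expert's path at inference time, while soft assignment blends all experts. The ablation requested by the reviewer would compare these two modes.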
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Each module in the proposed framework is reasonable and has been demonstrated helpful for enhancing multimodal data fusion and survival outcomes via extensive experiments and comparison with previous studies. The paper is well-written and easy to follow.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We express our sincere gratitude to the reviewers for their valuable time and effort in evaluating our paper. We greatly appreciate their insightful and overall positive feedback. We are pleased to note that the reviewers have recognized the following aspects of our manuscript: (1) the manuscript is well written and easy to follow (R1, R4), (2) the proposal of MoME is promising and reasonable (R1, R3, R4), (3) the justification of why self-attention is better than co-attention (R3), and (4) our method is effective (R1, R4). Below, we provide our responses to the questions.

  1. MoME block (R1): The traditional MoE [1] is a set of feedforward neural networks aimed at increasing model capacity without a proportional increase in computation. In contrast, the experts in our MoME are purposely designed architectures that reflect different degrees of fusion of the two modalities. For more detailed information about MoE, please refer to [1]; we will add more information in the final paper.
  2. Additional experiments (R1, R3, R4): We acknowledge that some experiments, such as the Wilcoxon rank-sum test and the log-rank test, are missing from our paper due to the space limitations of MICCAI. We will include these experiments in the future. Additionally, we have conducted an ablation study on the soft/hard assignment of the experts, but we cannot include the experimental results here according to the guidelines. Regarding cancer classification and subtyping tasks, multimodal data is not necessary because the WSIs alone are sufficient for diagnosis, making these non-multimodal learning tasks.
  3. Definition problems (R1, R3, R4): We will further refine the manuscript to avoid any confusion. In our paper, n1 and n2 represent the number of features in the first and second modality, respectively, while n_p denotes the number of patches cropped from the WSIs. We run segmentation algorithms to detect tissue and background regions, crop the tissue regions, and obtain n_p cropped patches. The definition of high/low risk is based on the ground truth, where we take the OS time and divide it into four intervals, similar to MCAT [2]. More details can be found in our code when it is available.
  4. Biological interpretation (R3): We thank the reviewer for raising this question. We agree that interpretability is a crucial aspect of medical imaging, and we will address it in our future work.
  5. Encoding times (R3): We thank the reviewer for noticing this, and we would like to clarify it here. Each modality is indeed encoded twice for every dataset, as depicted in the figure.
  6. Multi-cancer survival prediction (R4): We acknowledge the reviewer’s comment that our paper focuses on multi-cancer rather than pan-cancer prediction. We will modify the title to “Cancer Survival Prediction”.
  7. Additional reference (R4): In the camera-ready version, we will include additional relevant references to further support our work.
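Response 3 mentions dividing the OS time into four intervals. A minimal sketch of one way to do this is given below. It assumes that the cut points are the quartiles of the uncensored patients' survival times, which is a common choice in discrete-time survival pipelines; the exact procedure used by the authors is not stated here and may differ. The function name `discretize_os` is hypothetical.

```python
import bisect
import statistics


def discretize_os(times, events, n_bins=4):
    """Map each overall-survival time to an interval index in 0..n_bins-1.

    Assumption: cut points are quantiles of the uncensored (event == 1)
    survival times. Censored patients are binned with the same cut points.
    Returns (labels, cut_points).
    """
    uncensored = [t for t, e in zip(times, events) if e]
    cuts = statistics.quantiles(uncensored, n=n_bins, method="inclusive")
    # bisect_right places a time equal to a cut point in the upper interval.
    labels = [bisect.bisect_right(cuts, t) for t in times]
    return labels, cuts


# Toy survival times in months (illustrative only, not from TCGA).
toy_os = [10, 20, 30, 40, 50, 5, 35, 60]
toy_event = [1, 1, 1, 1, 1, 0, 0, 0]
toy_labels, toy_cuts = discretize_os(toy_os, toy_event)
```

The resulting interval index can serve as the discrete target of a survival loss, with the censoring flag kept alongside it.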

We believe that these responses and revisions can address the concerns of the reviewers and improve the quality of our manuscript. Thank you again for your valuable feedback.

[1] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, ICLR 2017.
[2] Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images, ICCV 2021.




Meta-Review

Meta-review not available, early accepted paper.


