Abstract

Cancer survival prediction is a challenging task that involves analyzing the tumor microenvironment within Whole Slide Images (WSIs). Previous methods cannot effectively capture the intricate interaction features among instances within local areas of a WSI. Moreover, existing methods for cancer survival prediction based on WSIs often fail to provide clinically meaningful predictions. To overcome these challenges, we propose a Sparse Context-aware Multiple Instance Learning (SCMIL) framework for predicting cancer survival probability distributions. SCMIL innovatively segments patches into various clusters based on their morphological features and spatial location information, subsequently leveraging sparse self-attention to discern the relationships between these patches from a context-aware perspective. Considering that many patches are irrelevant to the task, we introduce a learnable patch filtering module called SoftFilter, which ensures that only interactions between task-relevant patches are considered. To enhance the clinical relevance of our predictions, we propose a register-based mixture density network to forecast the survival probability distribution for individual patients. We evaluate SCMIL on two public WSI datasets from The Cancer Genome Atlas (TCGA), specifically focusing on lung adenocarcinoma (LUAD) and kidney renal clear cell carcinoma (KIRC). Our experimental results indicate that SCMIL outperforms current state-of-the-art methods for survival prediction, offering more clinically meaningful and interpretable outcomes. Our code is accessible at https://github.com/yang-ze-kang/SCMIL.
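
To make the cluster-then-attend idea in the abstract concrete, here is a minimal, illustrative sketch (our own reading, not the authors' released implementation): patches are grouped by a weighted combination of feature similarity and spatial proximity, and a shared multi-head self-attention is then applied only within each cluster. All names and hyperparameters below (cluster_patches, ClusterSelfAttention, alpha, num_clusters, etc.) are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


def cluster_patches(feats, coords, num_clusters=8, alpha=0.5, iters=10):
    """Toy k-means on a joint feature/position distance (illustrative only)."""
    feats = F.normalize(feats, dim=-1)                                   # unit vectors for cosine similarity
    coords = coords / (coords.norm(dim=-1, keepdim=True).max() + 1e-6)   # normalized positions
    centers = torch.randperm(feats.size(0))[:num_clusters]
    c_feat, c_pos = feats[centers], coords[centers]
    for _ in range(iters):
        d_feat = 1.0 - feats @ c_feat.t()                                # cosine distance to each center
        d_pos = torch.cdist(coords, c_pos)                               # spatial distance to each center
        assign = (alpha * d_feat + (1.0 - alpha) * d_pos).argmin(dim=1)
        for k in range(num_clusters):
            member = assign == k
            if member.any():
                c_feat[k] = F.normalize(feats[member].mean(0), dim=-1)
                c_pos[k] = coords[member].mean(0)
    return assign


class ClusterSelfAttention(nn.Module):
    """Shared multi-head self-attention applied independently within each cluster."""

    def __init__(self, dim=512, heads=8, num_clusters=8):
        super().__init__()
        self.num_clusters = num_clusters
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats, coords):
        # feats: (num_patches, dim) patch features; coords: (num_patches, 2) patch positions
        assign = cluster_patches(feats, coords, self.num_clusters)
        out = feats.clone()
        for k in range(self.num_clusters):
            idx = (assign == k).nonzero(as_tuple=True)[0]
            if idx.numel() == 0:
                continue
            x = feats[idx].unsqueeze(0)                                  # (1, n_k, dim)
            out[idx] = self.attn(x, x, x)[0].squeeze(0)                  # attend only inside the cluster
        return out


# Usage: 1,000 patches with 512-d features and normalized (x, y) positions.
scsa = ClusterSelfAttention()
refined = scsa(torch.randn(1000, 512), torch.rand(1000, 2))

In this reading, "sparse" means that attention is restricted to patches in the same cluster rather than computed over all patch pairs.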

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2991_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/yang-ze-kang/SCMIL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yan_SCMIL_MICCAI2024,
        author = { Yang, Zekang and Liu, Hong and Wang, Xiangdong},
        title = { { SCMIL: Sparse Context-aware Multiple Instance Learning for Predicting Cancer Survival Probability Distribution in Whole Slide Images } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes SCMIL, a MIL framework for survival prediction. 1) The authors propose the SoftFilter to distinguish between task-relevant features and task-irrelevant features. 2) The authors propose a clustering-based sparse self-attention to aggregate features. 3) The authors also propose a Prompt-based Mixture Density Network (PromptMDN) that considers the mean and std within the cancer patient cohort.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is novel and seems useful. It also achieves good performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The clarity of this paper needs to be improved. Many parts in the method section are very difficult to understand. To be specific:

    1. The authors mention that in K-Means, “the similarity between patches is obtained by a weighted sum of the cosine similarity of … features and … positions”. In K-Means, we need to define the distance/similarity between patches to conduct clustering. Do the authors define this distance as dist = a * dist_cos + (1-a) * dist_pos, with w_1 = dist_cos and w_2 = dist_pos? This part is very difficult to understand; I read it more than five times before finally arriving at this guess.
    2. The authors should clarify whether L_i is a collection of features of the i-th cluster or a cluster-level feature. Also, if L_i is a collection, why not aggregate the features within a cluster into a cluster-level feature after the self-attention? It might put less burden on AMIL.
    3. The authors mention that they utilize MHSA to refine clustered features, and later call it a sparse self-attention approach. Why is it sparse? Sparse self-attention usually means that when generating QKV attention, some mechanism forces the majority of the attention values to be zero. I do not see such a mechanism in the paper. If the authors mean that MHSA is conducted on clusters and thus reduces the number of tokens, that is not “sparse self-attention”. If the authors mean that MHSA is conducted within each cluster with no interaction among clusters, it is just a shared MHSA for all clusters, not a “sparse self-attention”.
    4. w_1 and w_2 appear in Section 2.2 and there is a w_i(Feat′) in Section 2.3; are they the same w?
    5. In Section 2.1, how are the importance scores IS optimized? I do not see IS appear in the method section after Section 2.1. If it is not utilized in the final loss function, how is it optimized?
    6. In the results section, the authors mention “CSSA”, which does not appear earlier; my best guess is that it refers to “SCSA”.
    7. In the ablation analysis, the authors ablate their method by removing the CSSA (“SCSA”) module. Does this mean that the task-relevant features are fed directly to AMIL, or that there is an MHSA as in TransMIL? If it is the former, the improvement may come from the MHSA rather than the clustering operation, so it is not a reasonable ablation. If it is the latter, it is fine.
    8. Why are P_m and P_v called prompts? They seem to be neither visual prompts nor language prompts. Visual prompts are fed as input to a frozen backbone network; they are called prompts because, as in NLP tasks, different vision tasks require changing only the prompts rather than fine-tuning the backbone. There is no frozen backbone in PromptMDN, so P_m and P_v look more like plain parameters.
    9. The authors model survival prediction as a Gaussian mixture model. Is there any intuitive explanation of this process and of why P_m and P_v model the other patients within the cancer patient cohort?
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As what I mentioned in the weaknesses section.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method in this paper is interesting and novel, but the clarity of the method really needs to be improved. If the authors can address this issue, this paper should be accepted.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors clarify my questions in the rebuttal and I think this paper is good now.



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors have proposed a method designed to effectively identify instances related to survival risks from histopathological images; the experimental results conducted on two cancer cohorts validate its superiority in comparison with existing studies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written and the experimental results are convincing in comparison with existing studies.
    2. The interpretability of the proposed method is good; it can identify specific ROIs that are associated with the survival of human cancers.
    3. The proposed method is easy to implement for other clinical tasks based on WSIs.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The “3-step algorithm”, i.e., SoftFilter, SCSA, and PromptMDN, proposed in the paper is not described well. Section 2 reads like a patchwork of individually nice ideas, but it lacks the glue that makes them fit together as a coherent algorithm.

    2. It is not clear whether varying the parameter values introduced in Section 3.1, e.g., C and Thres, will affect the prognostic performance of the proposed method.
    3. A pairwise t-test should be applied to statistically demonstrate the superiority of the proposed method.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The authors should adjust the method section to make it more logical and readable. Also, the authors need to state the necessity of and reasons for adopting the individual blocks applied in this study.
    2. The authors should discuss the effects of varying the different parameters used in this study.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental results of this study demonstrate both high accuracy and good interpretability.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    (1) This paper proposes a Sparse Context-aware Multiple Instance Learning (SCMIL) framework for predicting cancer survival probability distributions. (2) The framework comprises three key modules: SoftFilter, which effectively filters out task-irrelevant patches; SCSA, which discerns relationships between patches from a context-aware perspective; and Prompt MDN, tasked with forecasting the survival probability distribution. (3) This method has been evaluated on two public WSI datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The prediction of survival probability distribution is highly novel, diverging from the focus of previous papers, which typically concentrate on predicting survival periods. (2) The visualizations effectively demonstrate the efficacy of this method, notably the heatmap of IS in the top right of Figure 3, and the survival probability distribution curves for two patients depicted in Figure 4.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) It would be beneficial to provide a more detailed explanation of how the survival probability distribution could enhance clinical relevance. (2) In the framework figure, the clustering process could be better illustrated within the SCSA Module of Figure 1. Additionally, the representation of instance features and clustering centers using identical pink rectangular icons in the SCSA Module is confusing. (3) Further clarification is needed regarding why H_low should be concatenated with all clusters. (4) The rationale behind setting the cluster size C as 64 should be justified. Additionally, exploring the visualization of different types of patches within different clusters could be valuable.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Add ablation studies of SCMIL w/o H_low and SCMIL w/o H_high, i.e., using only task-relevant features or only task-irrelevant features for prediction. (2) Revise Figure 1 to incorporate additional details about the clustering process.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (1) The paper holds important clinical significance. (2) The method in this paper is very clear, and its performance is excellent.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors explained how they chose the hyperparameters and they will add more ablation experiments.




Author Feedback

We thank the reviewers for their constructive and valuable feedback. We appreciate the recognition of the novelty and interpretability of our method. In particular, we thank R1 and R4 for recommending acceptance of the paper directly.

To R1 and R4: The hyperparameters of our model were determined by pre-experiments on the LUAD dataset and used for all datasets. Due to the page limit, we did not list all of them in the submitted paper. Thanks for the suggestions on ablation experiments, which we will consider including in the journal version.

To R1:
1. SoftFilter and SCSA are interdependent modules. With numerous patches as input, using SCSA alone leads to overfitting on noise patches. Adding the SoftFilter guides the SCSA to learn interactions among key areas, as evidenced by the ablation experiment in Section 3.2. A description of the relationship between the modules will be added to the first paragraph of Section 2 to aid understanding.
2. By pairwise t-test, do you mean Kaplan-Meier (KM) analysis? Class-based KM analysis is not specific to a patient, but rather to an entire population. We adopt a more suitable time-dependent evaluation metric, TDC, to evaluate the model’s capacity to predict individual survival probability distributions [1].
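
For readers, a minimal sketch of how a SoftFilter-style gate could work, under our reading of the rebuttal (point 1 here and point 5 in the reply to R3 below): importance scores gate the patch features element-wise, so the scorer is trained end-to-end without patch-level labels, and a threshold splits patches into task-relevant and task-irrelevant sets. The class name, gating network, and threshold value below are assumptions, not the authors' code.

import torch
import torch.nn as nn


class SoftFilter(nn.Module):
    """Learnable gate that scores patches and splits them into task-relevant / irrelevant sets."""

    def __init__(self, dim=512, threshold=0.5):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.threshold = threshold

    def forward(self, feats):
        # feats: (num_patches, dim) patch features
        importance = self.score(feats)              # importance scores in [0, 1]
        gated = feats * importance                  # element-wise gating trains the scorer end-to-end
        keep = importance.squeeze(-1) > self.threshold
        h_high = gated[keep]                        # task-relevant patches (e.g., passed to clustered attention)
        h_low = gated[~keep]                        # task-irrelevant patches (kept aside)
        return h_high, h_low, importance


# Usage: score and split 1,000 patch features.
filt = SoftFilter()
h_high, h_low, scores = filt(torch.randn(1000, 512))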

To R3:
1. In employing K-Means, the distance is defined as dist = w_1 * (1 - feat) + w_2 * pos, where feat denotes the cosine similarity between patch features and pos represents the normalized Euclidean distance between patches.
2. L_i is a collection of features of the i-th cluster. Attempts to “aggregate features within a cluster into a cluster-level feature” resulted in information loss and degraded the model’s performance. Moreover, the primary computational overhead and memory usage are not within the AMIL, and these costs are justified.
3. Sparse self-attention optimizes the self-attention mechanism by confining interaction to local positions; different sparse self-attention methods mainly differ in how the interaction locations are chosen. The SCSA we propose determines the interaction locations through clustering. [2] follows a similar idea and is also considered a form of sparse self-attention.
4. w_i(Feat′) refers to the weight of the i-th component in the MDN. To reduce confusion, we will rename it with a new variable a(Feat′).
5. The optimization of the importance scores is described in Section 2.1: “the features of each patch are element-wise … without requiring patch-level supervision.”, resembling a Gated MLP.
6. “CSSA” is a typographical error for “SCSA”. Thanks for pointing it out.
7. Table 1 reports the variant in which the task-relevant features are fed directly to AMIL. We also attempted NystromAttention as in TransMIL, but it performed worse than our proposed method.
8. P_m and P_v represent the mean vector and standard deviation vector input to the Gaussian mixture model and are learnable parameters independent of the individual data. Considering the potential for misunderstanding, we have renamed them “registers”, following [3]; this implies that P_m and P_v register information related to the cancer patient cohort.
9. A distinct Gaussian mixture model is used to model each patient’s survival probability distribution. Across different patients, the mean vector and standard deviation vector are shared, while the weight vectors are derived from the patients’ pathological images. Consequently, P_m and P_v contain the global information of the cohort.
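
To illustrate points 8 and 9 above, here is a minimal sketch (an assumption based on the rebuttal, not the authors' implementation) of a register-based mixture density head: the mixture means and standard deviations are learnable parameters shared by the whole cohort, while the mixture weights are predicted from each patient's aggregated WSI feature. Names such as RegisterMDN, weight_head, and the survival-function helper are hypothetical.

import torch
import torch.nn as nn


class RegisterMDN(nn.Module):
    """Gaussian mixture head with cohort-shared mean/std 'registers' and per-patient weights."""

    def __init__(self, feat_dim=512, num_components=16, max_time=10.0):
        super().__init__()
        # Cohort-level registers: shared by all patients, learned during training.
        self.mu = nn.Parameter(torch.linspace(0.1, max_time, num_components))
        self.log_sigma = nn.Parameter(torch.zeros(num_components))
        # Patient-specific mixture weights predicted from the aggregated slide feature.
        self.weight_head = nn.Linear(feat_dim, num_components)

    def forward(self, slide_feat):
        # slide_feat: (batch, feat_dim) aggregated WSI representation per patient
        weights = torch.softmax(self.weight_head(slide_feat), dim=-1)    # (batch, K)
        return weights, self.mu, self.log_sigma.exp()

    def survival_probability(self, slide_feat, t):
        """S(t) = P(T > t) = sum_k w_k * (1 - Phi((t - mu_k) / sigma_k))."""
        weights, mu, sigma = self.forward(slide_feat)
        tail = 1.0 - torch.distributions.Normal(mu, sigma).cdf(t)        # (K,)
        return (weights * tail).sum(dim=-1)                              # (batch,)


# Usage: two patients share the same registers but get different mixture weights,
# hence different individual survival probability distributions.
mdn = RegisterMDN()
feats = torch.randn(2, 512)
print(mdn.survival_probability(feats, torch.tensor(5.0)))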

To R4 Thank you for your suggestions regarding our Figure 1. We will carefully consider them and make the necessary enhancements.

[1] Haider, H., et al. Effective ways to build and evaluate individual survival distributions. Journal of Machine Learning Research, 2020, 21(85): 1-63.
[2] Wang, S., et al. Cluster-Former: Clustering-based sparse transformer for question answering. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 3958-3968.
[3] Darcet, et al. Vision transformers need registers. arXiv preprint arXiv:2309.16588, 2023.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


