Abstract

Recent cancer survival prediction approaches have made great strides in analyzing H&E-stained gigapixel whole-slide images. However, methods targeting the immunohistochemistry (IHC) modality remain largely unexplored. We remedy this methodological gap and propose IHCSurv, a new framework that leverages IHC-specific priors to improve downstream survival prediction. We use these priors to guide our model to the most prognostic tissue regions and simultaneously enrich local features. To address drawbacks in recent approaches related to limited spatial context and cross-regional relation modeling, we propose a spatially-constrained spectral clustering algorithm that preserves spatial context alongside an efficient tissue region encoder that facilitates information transfer across tissue regions both within and between images. We evaluate our framework on a multi-stain IHC dataset of pancreatic cancer patients, where IHCSurv markedly outperforms existing state-of-the-art survival prediction methods.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0495_paper.pdf

SharedIt Link: https://rdcu.be/dY6ix

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72083-3_20

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0495_supp.pdf

Link to the Code Repository

https://github.com/charzharr/miccai24-ihcsurv

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zha_IHCSurv_MICCAI2024,
        author = { Zhang, Yejia and Chao, Hanqing and Qiu, Zhongwei and Liu, Wenbin and Shen, Yixuan and Sapkota, Nishchal and Gu, Pengfei and Chen, Danny Z. and Lu, Le and Yan, Ke and Jin, Dakai and Bian, Yun and Jiang, Hui},
        title = { { IHCSurv: Effective Immunohistochemistry Priors for Cancer Survival Analysis in Gigapixel Multi-stain Whole Slide Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        pages = {211 -- 221}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This study develops a survival prediction framework, IHCSurv, specifically designed for multi-stain IHC analysis. The framework incorporates a hierarchical architecture with cross-region attention to effectively capture prognostic features within and across different tissue regions. It also leverages IHC-specific priors through cell categorization to enhance survival prediction accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The focus of this study on cancer survival analysis is quite intriguing and could benefit the research community. The approach of combining clustering with cell encoding across multiple IHC stains is particularly interesting. However, it also raises several concerns, as detailed in the recommended revisions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method for studying cancer survival analysis is crucial to the medical community and represents a promising area of research. However, the authors need to thoroughly discuss why clustering techniques are necessary, particularly when the embedded features could be directly processed by the ViT model. The rationale behind this methodology remains unclear. Moreover, the current study lacks sufficient experimental evidence to substantiate the significance of the proposed model.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Adding code is recommended, as it will help the community follow this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Several improvements are suggested for this study as follows:

    1. The reviewer has concerns about the use of clustering for the embedded features. The authors should provide a detailed discussion on the necessity of employing such techniques, especially when the embedded features could be directly processed by the ViT model by adding positional encoding. Additionally, while the inclusion of cell counts appears promising, the ablation study suggests that they do not significantly contribute to the model’s performance. The authors should clarify these aspects to enhance the understanding of their methodological choices.
    2. Why did the authors choose to use an ImageNet pretrained model as mentioned on Page 4, where “1024-dimensional patch embeddings by global average pooling the feature outputs from the third stage of an ImageNet-pretrained ResNet50” are utilized to extract features from IHC stained images? Would not a pathology domain-trained model, specifically an IHC pre-trained model, be more effective? The authors should provide insights on why “Features pre-trained on pathology datasets were also evaluated but ImageNet-based features consistently outperformed them.” An ablation study would be beneficial to support this claim.
    3. The reviewer is concerned about the use of multiple IHC stains without including H&E. Multiple IHC staining is generally more expensive and time-consuming than H&E and less available. It might be more advantageous to train some features from multi-IHC and apply them to H&E for efficient cancer survival analysis.
    4. What do the authors mean by “l1” and “l2” mentioned in Figure 1 on Page 3? A clear description would aid reader comprehension.
    5. The authors should specify and explain the color coding used between and after the two tissue-level encoder layers (ft), as well as the logits color coding in Figure 1 on Page 3.
    6. The clustering method of embedded patches needs a clearer explanation.
    7. It would be useful for the authors to include results from a single stain (CD4) in Table 1 for better comparison. The current data does not show a significant difference in performance when utilizing both stains, a similar observation to other methods’ performance for multi-stain versus single-stain.
    8. Table 3 does not show significant improvements between rows 12 and 13 when cell embeddings are included. The authors should clarify and discuss these findings.
    9. The authors should compare results from experiments passing the IHC stains without any clustering and then with the proposed model to demonstrate the impact of utilizing clustering in the current ViT model.
    10. Are the CD8 and CD4 used in the training from the same tissue?
    11. Minor comment: Adding the ‘Method’ column just once in Table 1 would be sufficient, as the authors are comparing the same models for both single and multiple stain performance.

    The reviewer hopes that these modifications will help the authors improve their current study.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed methodology requires further explanation. Additionally, the current study needs more experiments, evaluations, and results to substantiate the proposed model.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The study is intriguing, though there are still concerns regarding the limitations in the experiments.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method to predict patient survival based on multiple immunohistochemistry (IHC) pathology slides. Patch-level embeddings are generated using a tissue encoder, and a spatially-constrained spectral clustering method is used to cluster based on both patch coordinates and embedding features. These patch features are augmented with cell-count and stain information to provide IHC-specific features. A ViT with self-attention blocks is used to exchange information across patches and across multiple stains, and the output is used for predicting discrete survival.

    The framework is evaluated on a multi-stain pancreatic cancer dataset and is compared against multiple baselines in both multi-stain and single-stain settings. State-of-the-art performance is shown on this internal benchmark.
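
    For readers unfamiliar with discrete survival prediction, the following is a minimal sketch of one common convention, not necessarily the formulation used in the paper: the model emits one logit per time bin, a sigmoid turns each logit into a hazard, and the cumulative product of (1 - hazard) gives the survival curve. The function name `survival_from_logits` and the risk definition (negative sum of survival probabilities) are assumptions made for illustration.

```python
import math


def survival_from_logits(logits):
    """Convert per-bin logits into hazards, a survival curve, and a risk score.

    Assumed convention (not confirmed by the paper): hazard_k = sigmoid(logit_k),
    S_k = prod_{j <= k} (1 - hazard_j), risk = -sum_k S_k.
    """
    # Hazard: probability of the event occurring in bin k, given survival so far.
    hazards = [1.0 / (1.0 + math.exp(-z)) for z in logits]

    # Survival: probability of surviving past the end of each bin.
    survival = []
    running = 1.0
    for h in hazards:
        running *= 1.0 - h
        survival.append(running)

    # Higher risk means a lower expected survival across the bins.
    risk = -sum(survival)
    return hazards, survival, risk


hazards, survival, risk = survival_from_logits([0.0, 0.0, 0.0, 0.0])
print(survival, risk)
```

    A scalar risk of this kind is what a concordance index would rank against observed survival times.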

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. The paper proposes a spatial+semantic cluster-based approach with a ViT to fuse information from multiple patches across multiple stains in pathology WSIs. This is augmented with IHC-specific features. Even though certain parts of the approach are not completely novel, the combination is novel and relevant.
    2. The combination of spatial, patch-semantic, and IHC-specific features is well motivated, and the ablations show the benefits of such an approach.
    3. Since the dataset is not public, the choice of baselines is important. The baselines implemented and evaluated by the paper cover attention-based MIL models, graph neural networks, and hierarchical transformers, which is a good mix of approaches to evaluate against.
    4. The approach yields strong results on the pancreatic cancer survival dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The use of coordinates for clustering in order to inform the clusters of the spatial features of the patch is not new and has been done before in various contexts, including pathology (e.g., [1]).
    2. The paper is very sparse in terms of its experiments. Only a single, non-public dataset is used. This makes it hard to compare to other methods. Previous work shows results on TCGA, which is public and well benchmarked. This is a big limitation for the paper in its current form.

    [1] Dwivedi, Chaitanya, et al. “Multi stain graph fusion for multimodal integration in pathology.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The authors are advised to add more results on TCGA and compare with c-index from previous work. Adding more results on held-out sources will further improve the paper quality.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an interesting and new approach to incorporate spatial, semantic, and IHC-specific information from multiple IHC WSIs. The formulation, experimental details, and survival prediction performance show promising results, and the ablations validate the modeling choices. However, only a single dataset is used to show performance, which hugely limits the impact of the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I would like to increase my rating to a weak accept after the rebuttal. The paper proposes an interesting way to predict survival from multi-stain IHC images. The paper in its current form is limited by showing results on a single, non-public dataset, which is the only one that exists. Other novel parts of the paper like the clustering can be shown to be helpful with H&E datasets like TCGA. However, the existing results are done on a well characterized and large dataset, with good baselines.



Review #3

  • Please describe the contribution of the paper

    The authors introduce a method for survival prediction on an in-house IHC-stained whole-slide image dataset of pancreatic cancer patients. They use two different IHC stainings and aggregate information from both by pre-selecting regions of high interest with a spectral clustering algorithm. Regions are enriched with easy-to-obtain additional features such as cell counts and processed through a custom network. The network is more efficient than a standard ViT, as some layers are shared between regions and stainings. The performance is superior to current state-of-the-art survival prediction methods on their dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of spectral clustering in this context is novel. It might be interesting not only in survival analysis but also for IHC slide classification. The paper is mostly easy to understand and well structured. The experiments and ablations are reasonable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - All experiments are on an in-house dataset only, and as the authors state, there is no such IHC survival dataset publicly available, which makes it hard to build on their work or reproduce it.
    - Standard deviations are missing, which in my experience can be quite high in survival prediction. They could be obtained by cross-validation or at least by using different seeds.
    - Experiments could be more detailed, e.g., why was only CD8 evaluated as a single staining and not CD4?

    Minor weaknesses:
    - The spacing in Figure 1 is not good (the top part is way smaller), and all variable/function names used (C1, l1, zk) should be explained, as it is not intuitively clear what is meant. The very last step to the logits is also not clear to me in this figure.
    - Overuse of variables with non-intuitive names. Some are introduced and never used again, e.g., d_p. Where possible, it would be great to have more descriptive variable names, which would make them easier to remember.

    found typos: “performnace” in conclusions

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Code will be publicly released; the dataset is private and, as nothing is mentioned, I assume it will stay private.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Thanks for your work; it is an interesting read. It is generally well structured, but some choices should be motivated, e.g., why was only CD8 chosen as the single staining? How were the hyperparameters N_k=100/400 chosen? Also, please provide some kind of standard deviation for your results; in my experience, results in survival prediction are often not that stable.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall a decent paper that can be followed and has a novel idea with an application. For me, this outweighs weaknesses such as limited evaluation and limited reproducibility.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I still think it is a generally interesting paper, although with limited impact. The authors addressed my concern and will add the measured standard deviation, which is comparatively low.




Author Feedback

We are grateful to the reviewers for their feedback and for recognizing our method’s novelty, strong performance, and potential benefit to the pathology community. Our aim was to plant a seed work that opens the door for survival analysis in the immunohistochemistry (IHC) modality; we’re glad the reviewers found our results promising and interesting. We also appreciate the suggestions to improve clarity and will incorporate them into revisions.

Evaluation [R1,R3,R4] Single Dataset - The most salient critique was that our evaluation involved a single IHC dataset. We understand this is suboptimal. However, since no publicly available IHC datasets exist, we worked hard to curate an extensive dataset for evaluation and future public release (after legal approvals). Our data targets pancreatic cancer and has over 10TB of images from a cohort of ~1000 patients, each with multiple IHC images. We ensured diversity in ages, genders, TNM staging, tumor differentiations, and pancreatic parts. The final dataset was curated after filtering for quality and represents one of the first and largest IHC survival datasets to date.

[R3] Public Comparisons - We thank R3 for the suggestion to adopt TCGA benchmarks. Our main reasons for not doing so are: 1) our method targets the IHC modality while TCGA contains only H&E images, and 2) some proposed components (e.g., cell count feature enrichment) use priors exclusive to IHC images, which may prevent direct fair comparisons.

[R4] Significance of Results - We thank R4 for suggesting stds in our main evaluations and will add them: 0.0117 for CD8 & 0.0142 for CD4+CD8. All of our CI scores in Tab. 1 are statistically significant (using a paired t-test between our CI and the 2nd best), with p-values far below 0.05. We also observed score variance; however, with stronger regularization and tuning, we were able to keep the std under 1.5% CI across different random seeds. We also highlight that stratification studies are more stable and that our method is the only one to achieve statistically significant separation.

[R1,R4] CD8 for Single Stain - We used CD8 for two main reasons: 1) CD8 was a stronger baseline to compare multi-stain against (CD4: 0.5716 CI), and 2) CD8 has more evidence in the literature as a potent prognostic predictor, given its role in quantifying tumor-infiltrating lymphocytes, while CD4’s prognostic value originates from its interaction with other immune-targeted stains.
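
The CI values quoted in this response are concordance indices. As a reference point, here is a minimal sketch of Harrell's C-index in plain Python. It is an illustrative implementation under stated assumptions, not the authors' evaluation code: the function name `concordance_index` is hypothetical, pairs with tied follow-up times are skipped for simplicity, and tied risk scores count as half-concordant.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Simplification: pairs with identical follow-up times are ignored.
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the earlier time is an observed event.
        if not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for illustration only (not from the paper).
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences of a few hundredths between methods are usually reported together with a standard deviation across seeds or folds.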

[R1] Method Motivations - Our method for patch feature extraction via clustering is commonly used (e.g., AttnMISL) to select representative patches while reducing the number of patch inputs. Although patch embeddings contain spatial information, we are still tasked with making patch processing tractable, maintaining spatial context around patches, and introducing useful implicit biases (region-level processing) in the face of sparse learning signals. Our initial experiments ran without clustering (random patch subsampling and selecting the most cancerous patches); both were worse than k-means and substantially worse than our proposed clustering scheme. Further, we agree that IHC-specific features via pretraining may improve performance, but our paper focused on a survival framework where patch features may be extracted in various ways. The use of ImageNet features is common in the literature (e.g., CLAM). We also tried using HIPT features, but they were significantly worse (0.5566 CI for CD8), likely due to the lack of IHC stain colors in H&E images.

[R3] Clustering Novelty - Other pathology works have utilized coordinates directly to inform clusters. However, we use them to formulate similarity as affinity scores between patches via spectral clustering, which allows for more fine-grained control over semantic and distance weighting.

[R4] N_k Choice - We selected N_k=400 patches per cluster based on a target region size of 2.5x2.5 mm, which was deemed appropriate by clinicians to balance cell-level and tissue-level information (stated in Section 2.1).
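
To make the clustering response above more concrete, the following is a hedged sketch of how a pairwise affinity could blend semantic similarity with spatial proximity before being handed to a spectral clustering routine. The exact affinity used in the paper is not given on this page, so everything here is an assumption for illustration: the function name `spatial_semantic_affinity`, the weighted-sum form, the Gaussian distance kernel, and the parameters `alpha` and `sigma`.

```python
import math


def spatial_semantic_affinity(features, coords, sigma=2.0, alpha=0.5):
    """Build a symmetric affinity matrix mixing feature and spatial similarity.

    Assumed form (illustrative, not the paper's definition):
        W[i][j] = alpha * semantic(i, j) + (1 - alpha) * spatial(i, j)
    where semantic is cosine similarity rescaled to [0, 1] and spatial is a
    Gaussian kernel on the Euclidean distance between patch coordinates.
    """

    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(y * y for y in v))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    n = len(features)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            semantic = 0.5 * (1.0 + cosine(features[i], features[j]))
            dist_sq = sum((p - q) ** 2 for p, q in zip(coords[i], coords[j]))
            spatial = math.exp(-dist_sq / (2.0 * sigma ** 2))
            weight = alpha * semantic + (1.0 - alpha) * spatial
            affinity[i][j] = weight
            affinity[j][i] = weight  # keep the matrix symmetric
    return affinity


# Toy patches: two similar neighbours and one distant, dissimilar patch.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_coords = [(0, 0), (1, 0), (10, 10)]
W = spatial_semantic_affinity(toy_features, toy_coords)
print(W[0][1] > W[0][2])  # True
```

Raising `alpha` favours semantic similarity, while lowering it (or shrinking `sigma`) forces clusters to stay spatially compact. A matrix of this kind could then be passed to an off-the-shelf routine such as scikit-learn's `SpectralClustering` with `affinity="precomputed"`.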




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    NA

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NA



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors addressed most of the reviewers’ concerns in their rebuttal.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors addressed most of the reviewers’ concerns in their rebuttal.


