Abstract

Existing WSI analysis methods rest on the consensus that the histopathological characteristics of tumors provide significant guidance for cancer diagnostics. In particular, since the evolution of cancers is a continuous process, the correlations and differences across various stages, anatomical locations, and patients should be taken into account. However, recent research mainly focuses on the inner-contextual information within a single WSI, ignoring the correlations between slides. To verify whether introducing slide inter-correlations can improve WSI representation learning, we propose SlideGCD, a generic WSI analysis pipeline that takes existing multi-instance learning (MIL) methods as the backbone and recasts the WSI classification task as a node classification problem. More specifically, SlideGCD maintains a node buffer that stores previous slide embeddings for subsequent slide-based graph construction and conducts graph learning to explore the inter-correlations implied in the slide-based graph. Moreover, we frame the MIL classifier and graph learning as two parallel workflows and deploy knowledge distillation to transfer the differentiable information to the graph neural network. SlideGCD brings consistent performance gains to four previous state-of-the-art MIL methods on two TCGA benchmark datasets.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2479_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2479_supp.pdf

Link to the Code Repository

https://github.com/HFUT-miaLab/SlideGCD

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Shu_SlideGCD_MICCAI2024,
        author = { Shu, Tong and Shi, Jun and Sun, Dongdong and Jiang, Zhiguo and Zheng, Yushan},
        title = { { SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15004},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors introduce SlideGCD, a novel pipeline for histopathology Whole Slide Image (WSI) analysis, designed to facilitate WSI classification through a unique approach of node representation learning and classification within a dynamically constructed graph.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work presents a novel approach to node classification using WSI embeddings as graph nodes, offering an intriguing perspective in histopathological image analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Lack of Practical Motivation: The introduction fails to clearly articulate the practical utility of correlating slides with similar signals. Understanding the significance of this observation and how it can be leveraged is crucial for justifying the research’s relevance.
    • Insufficient Coverage of Related Work: The discussion of related work, particularly in the introduction, is lacking. Notable works like DASMIL [1], which combines graphs and knowledge distillation at the patch level, are omitted; they could provide valuable context and justify the chosen methodology.
    • Ambiguity in Methodology: It’s unclear whether the Multiple Instance Learning (MIL) backbone is frozen, pre-trained, or trained concurrently with the rest of the model. Additionally, the explanation of the learning rate’s notation and the specifics of how data storage and gradient propagation work are inadequate or missing.
    • Concerns About Data Handling and Model Functionality: Questions arise regarding the storage approach for patches or entire slides and how the model functions on datasets with low signal density, such as Camelyon. The mechanism for gradient propagation to the aggregation component (AGG) and the handling of various data elements within the formulae are also unclear.
    • Contextual Information vs. Correlation Information: The frequent reference to “contextual information” might be misleading, as the term is typically associated with models processing patch-level MIL. A more accurate term might be “correlation information” to reflect the relationship between the buffer and the node to be classified.
    • Errors and Omissions in Experimental Details: The feature extractor is incorrectly referred to as “PILP” instead of “PLIP,” and there is a lack of clarity regarding the number of runs for average accuracy, the explanation of metrics, and the size of the buffer in relation to the dataset.
    • Knowledge Distillation Contribution: Fixing the knowledge distillation weight to 1 raises questions about its impact on the model’s performance. Varying this parameter could provide insights into the utility of knowledge distillation in the proposed model.
    • Unexplained Observations in Ablation Studies: Observations regarding potential saturation in the ablation studies lack accompanying explanations that could elucidate underlying model dynamics or limitations.
    • Computational Complexity Concerns: The proposed solution might introduce significant computational complexity, but the extent of this impact is not quantified or discussed.

    [1] DAS-MIL: Distilling Across Scales for MIL Classification of Histological WSIs. In: MICCAI 2023

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Majors: Lack of Practical Motivation, Insufficient Coverage of Related Work, Method not clear, Computational Complexity Concerns, Knowledge Distillation Contribution not clear (I would have expected an experiment demonstrating its value)

    Minors: typos such as “PILP”

    • Computational Complexity Concerns

    Further details in the weaknesses section

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work presents a novel approach to node classification using WSI embeddings as graph nodes, offering an intriguing perspective in histopathological image analysis. However, the paper is marred by several significant issues, including unclear methodological explanations, insufficient comparative analysis, and a lack of practical motivation. Additionally, the absence of code sharing and the incomplete coverage of experimental details significantly hinder the work’s reproducibility and the ability to validate its claims.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors’ response clarified my doubts. I really appreciated how the authors improved the motivation of the work and recommend integrating it into the final version of the manuscript. I would have preferred to see a greater number of ablation studies, but it is also true that, given the limitations of the conference format, it is difficult to include everything in such a small space. So I think I can change my decision to accept, with the hope of seeing an extension in the near future.



Review #2

  • Please describe the contribution of the paper

    The paper proposes an approach to exploit contextual information by means of a slide-based graph, with the final goal of improving MIL models for WSI analysis. More specifically, the author(s) propose SlideGCD, an approach that leverages a node buffer to store previously seen slide embeddings, which are used for subsequent slide-based graph construction. Additionally, a knowledge distillation mechanism is introduced to align the outputs of the proposed SlideGCD with the base MIL model.
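The distillation mechanism summarized above, aligning the graph branch's outputs with the base MIL model's, can be sketched as a standard soft-label distillation loss. The following is a minimal illustrative example, not the authors' released implementation: the function names, the teacher/student role assignment, and the temperature value are all assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Hinton-style distillation: KL(teacher || student) on softened class
    distributions, scaled by T^2. Here the MIL classifier plays the teacher
    and the graph branch the student (an assumption for illustration)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl)) * temperature ** 2

# Identical logits incur zero distillation loss.
t = np.array([[2.0, 0.5, -1.0]])
print(round(kd_loss(t, t), 6))  # → 0.0
```

Review #1's point about the fixed distillation weight applies here: in such a loss, both the weight relative to the task loss and the temperature are tunable knobs whose effect an ablation would reveal.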

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written and generally easy to follow.
    • The experiments carried out demonstrate that the proposed solution improves the performance of four existing MIL approaches.
    • The proposed solution aims at increasing the contextual information drawn from the dataset when analyzing a specific WSI, targeting an open (and recently explored by many authors) research point.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper misses important details. For example, it is not clear how the graph is built during inference. Is the PLIP feature extractor frozen? I guess so, but this is not mentioned in the paper text, nor in the figure summarizing the model (Fig. 2). How is the buffer used during inference? How are the elements to be removed chosen during training?
    • Although not mandatory, publishing the code would give the reader a chance to reproduce the experiments. There are many implementation details that cannot be condensed into an 8-page paper, making the experiments hard to replicate.
    • The authors focused the experiments on the TCGA-BRC and TCGA-NSCLC datasets while ignoring a commonly employed dataset in the evaluation of WSI analysis algorithms: CAMELYON-16. Moreover, two of the considered MIL approaches, TransMIL [23] and DTFD-MIL [25], originally evaluated their performance on that dataset. Is there any practical reason to exclude CAMELYON-16 from the evaluation?
    • The numbers declared in the paper are far from the real performance of the considered MIL approaches. For example, as reported in the original TransMIL paper [23], the model achieves an AUC of 96 and an accuracy of 88.35 on the TCGA-NSCLC dataset. Such metrics are far above the baseline declared in the submitted paper (94.82 AUC, 85.82 Acc) and are also above the results obtained by SlideGCD (95.59 AUC, 86.82 Acc). I understand that the reason behind such a discrepancy may be the different feature extractor employed, but this makes the results arguable and makes it hard to understand whether the real contribution of SlideGCD is effective or not.
    • The references should be improved. For example, the latest graph-based approaches have not been considered in the review analysis [1, 2, 3*]. I perfectly understand that 8 pages are very short, but the paper still has room to make the analysis broader.
    • No evaluation of the additional computational requirements introduced by SlideGCD is reported in the paper or in the supplementary material.
    • No comparison with the state-of-the-art context aware (graph-based) solutions is provided in the paper.
    • The author(s) mentioned that “there are a few warmup epochs that only optimize the backbone for pre-convergence. […] the formal training will start with a smaller learning […]”. How many epochs are those? What is the initial learning rate? These are the details that make the difference in the final figures; by not providing them in the paper and by not releasing the source code, the authors make the experiments hard, if not impossible, to reproduce.
    • Minor typos should be fixed: a blank space is missing before “(CNN)” and “(GNN)” (Introduction); the comma in Eq. 6 should be a full stop; PILP -> PLIP (Section 3.1).

    [1] https://doi.org/10.1609/aaai.v36i1.19976
    [2] https://doi.org/10.1109/TMI.2023.3337549
    [3*] https://doi.org/10.1007/978-3-031-16434-7_4
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please refer to Section 6, weaknesses.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, while the paper shows promise in addressing a relevant research point and demonstrates some improvement over existing methods, the significant shortcomings highlighted by the review raise concerns about the paper’s overall quality and contribution. The major identified issues are: limited dataset and comparison; discrepancies in reported results.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    I really appreciate the rebuttal, which covers many of the raised concerns. However, the main issue I identified in the paper is still present: the experimental results do not align with previous publications, raising doubts about the quality of the experiments and the effectiveness of the proposed solution. Unfortunately, the (promised) code was not released.

    “We applied CV in the experiments whereas the original TransMIL paper did not.” This is not true: TransMIL does apply cross-validation (stated in Section 4, Experiments and Results, at the end of the 5th paragraph of the original publication).



Review #3

  • Please describe the contribution of the paper

    This paper introduces a novel approach to WSI analysis by constructing graph relationships among multiple WSIs. A framework based on a memory bank and knowledge distillation is proposed, consistently achieving improvements in WSI classification across various MIL-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Novel idea to construct a graph over many WSIs.
    • Consistent improvements for various MIL-based methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The significance and medical background of slide correlation should be further discussed, noting the relationships and differences across acquisition times and anatomical locations. Since cancer evolution is a continuous process, exploring slide correlations via graph learning suggests potential connections between WSIs of tumors at different stages.

    A more detailed description of the slide-based graph is necessary. It should specify how many WSIs are considered, which WSIs are used to build the graph, the distance metric used for KNN clustering, and other such details.

    The optimization process from the aggregation module to the slide-based graph needs precise details on the gradient update rules.

    Additional experiments related to the slide-graph, such as methods of graph construction and size, should be considered to further validate the approach.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    release the code if possible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    When discussing the relationships among multiple WSIs, it is essential to consider the patch-slide-patient perspective, exploring the connections and differences between WSIs. Relevant studies include:

    • “Hierarchical Discriminative Learning Improves Visual Representations of Biomedical Microscopy,” CVPR 2023.
    • “Hvtsurv: Hierarchical Vision Transformer for Patient-Level Survival Prediction from Whole Slide Image,” AAAI 2023.
    • “Cancer Survival Prediction from Whole Slide Images with Self-supervised Learning and Slide Consistency,” TMI 2022.

    Detailed introductions should be added for the graphical representations. For example, Figure 2 should include a more detailed description from input to output.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    novel idea and good performance

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    This paper has good motivation and can be accepted if the code is released.




Author Feedback

The rebuttal addresses the following four concerns.

  1. Unclear Motivation The motivation of this work lies in the consensus that the histopathological characteristics of tumors, including the tendency toward tissue invasion, metastasis, growth patterns, etc., provide significant guidance for cancer diagnostics and therapies. In particular, as the evolution of cancers is a continuous process, the correlations and differences across various stages, anatomical locations, and patients should be taken into account. However, recent research mainly focuses on the inner-contextual information within a single WSI, e.g., multi-scale information (H^2MIL, DASMIL), heterogeneous structure (HEAT), and semantic invariance (ReMix). Few works have considered the correlations at a larger scope (HiDisc, HVTSurv), and these are still limited to the patient level. Based on this point, we incorporate the above considerations, together with graph neural networks, which show promising flexibility and interpretability in multiple fields, into a generic framework for improving the learning of slide representations and the performance of the integrated patch-based MIL methods.
  2. Undetailed Method –After the patches are tiled, the frozen PLIP is applied to extract patch embeddings. In the 10-epoch warm-up phase, only the MIL backbone T() and its classifier ClsMIL() are updated for pre-convergence; the graph-based branch does not participate in parameter updates at this stage. The Node Buffer, with a length of 3072, continuously collects preceding slide embeddings generated by T() in a first-in-first-out manner, like MoCo. As the formal training phase begins, the AGG module constructs and updates the slide-based graph iteratively, where each node denotes a WSI and is connected to its k closest nodes (measured by cosine similarity between node embeddings), and the subsequent GCN starts to work. –The MIL backbone remains trainable throughout training, ensuring it generates increasingly discriminative slide representations. –During inference, all parameters and the Node Buffer are frozen. When a WSI is input, 1) its initial embedding is produced by T(), 2) the AGG module inserts it into the slide-based graph by connecting it to its k-nearest buffer nodes, and 3) the trained GCN performs message passing to refine its embedding for final classification. –The learning rate l follows the baseline setting in the warm-up stage and empirically declines to 1e-4 during formal training.
  3. Insufficient Experiment and Confusing Results –As for the discrepancies in reported results (Reviewer #1), the main factor should be cross-validation (CV). We applied CV in the experiments whereas the original TransMIL paper did not. Our results are close to those of the original DTFD-MIL paper, considering that our total number of training epochs (100) is half of theirs (200). –Gradient propagation of AGG: the nodes retrieved from the buffer are treated as static data with no gradients; gradient propagation only comes from the nodes in the current mini-batch. –The extra computation mainly comes from several additional linear layers and two graph convolution layers. Concretely, our method requires 404.37 GFLOPs for a mini-batch of 64 WSIs with 5,000 patches each, which is only about 1.4% higher than the 398.93 GFLOPs of its baseline, TransMIL. This increase is minimal. –The absence of Camelyon16 (Reviewer #1) is mainly because its size is much smaller than that of the TCGA datasets we used. –The information density of the datasets, together with the missing ablations on knowledge distillation and the graph construction strategy, will be considered as part of our future work.
  4. Other concerns on reproducibility –We will open-source the code to cover the technical and practical details. –We will improve the figures to clarify misunderstandings and add missed references. –A table will be provided in Supplementary Materials to cover the experimental settings including but not limited to learning rate, runs of training, the gradient update rules, etc.
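The node-buffer and graph-construction mechanics the rebuttal describes (a FIFO queue of slide embeddings, with each incoming slide connected to its k nearest buffer nodes by cosine similarity) can be sketched as follows. This is a minimal illustrative example under stated assumptions: NumPy stands in for the actual framework, and the class names, toy dimensions, and capacity are hypothetical, not the authors' released implementation.

```python
import numpy as np
from collections import deque

class NodeBuffer:
    """FIFO buffer of slide embeddings, per the rebuttal's description:
    a fixed-length queue filled MoCo-style with embeddings from T()."""
    def __init__(self, capacity=3072):
        self.queue = deque(maxlen=capacity)  # oldest embedding evicted first

    def push(self, emb):
        self.queue.append(np.asarray(emb, dtype=np.float64))

    def embeddings(self):
        return np.stack(self.queue) if self.queue else np.empty((0, 0))

def knn_edges(query, buffer_embs, k=4):
    """Connect a query slide embedding to its k nearest buffer nodes,
    with nearness measured by cosine similarity (the AGG module's rule)."""
    q = query / np.linalg.norm(query)
    b = buffer_embs / np.linalg.norm(buffer_embs, axis=1, keepdims=True)
    sims = b @ q                   # cosine similarity to every buffer node
    return np.argsort(-sims)[:k]   # indices of the k most similar nodes

# Toy usage: fill a small buffer, then attach a new slide to its neighbours.
rng = np.random.default_rng(0)
buf = NodeBuffer(capacity=8)
for _ in range(8):
    buf.push(rng.normal(size=16))
neighbours = knn_edges(rng.normal(size=16), buf.embeddings(), k=3)
print(len(neighbours))  # → 3
```

In training, buffer entries would be treated as constants (detached from the autograd graph), consistent with the rebuttal's note that gradients flow only through the current mini-batch's nodes.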




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors made a good rebuttal addressing the concerns about the motivation of building a slide-level graph. The AC also acknowledges the interest and importance of the idea. As pointed out by the reviewers, the authors should add this motivation and discuss the experimental comparison in the final version.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


