Abstract

In digital pathology, the multiple instance learning (MIL) strategy is widely used for weakly supervised histopathology whole slide image (WSI) classification, where giga-pixel WSIs are labeled only at the slide level. However, existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles, while graph-based MIL frameworks have limited power to capture long-range dependencies. In this paper, we introduce an integrative graph-transformer framework that simultaneously captures context-aware relational features and global WSI representations through a novel Graph Transformer Integration (GTI) block. Specifically, each GTI block consists of a Graph Convolutional Network (GCN) layer modeling neighboring relations at the local instance level and an efficient global attention model capturing comprehensive global information from extensive feature embeddings. Extensive experiments on three publicly available WSI datasets (TCGA-NSCLC, TCGA-RCC, and BRIGHT) demonstrate the superiority of our approach over current state-of-the-art MIL methods, achieving improvements of 1.0% to 2.6% in accuracy and 0.7% to 1.6% in AUROC.
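For context on the weakly supervised setup the abstract describes, the sketch below shows a generic attention-based MIL readout that maps a bag of instance embeddings (e.g., the outputs of the GTI blocks) to a slide-level prediction. The class name, dimensions, and gated-attention form are illustrative assumptions, not the paper's exact head.

```python
# Generic attention-based MIL readout for one WSI (illustrative sketch only;
# names and dimensions are placeholders, not the paper's exact head).
import torch
import torch.nn as nn

class AttentionMILHead(nn.Module):          # hypothetical name
    def __init__(self, dim=256, n_classes=2):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 128), nn.Tanh(), nn.Linear(128, 1))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, h):
        # h: (N, dim) instance embeddings of one slide (e.g., GTI block outputs)
        a = torch.softmax(self.score(h), dim=0)   # (N, 1) attention weights
        z = (a * h).sum(dim=0)                    # attention-pooled slide embedding
        return self.classifier(z)                 # slide-level logits

logits = AttentionMILHead()(torch.randn(500, 256))  # toy slide with 500 patches
```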

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2165_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2165_supp.pdf

Link to the Code Repository

https://github.com/StonyBrookDB/igt-wsi

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Shi_Integrative_MICCAI2024,
        author = { Shi, Zhan and Zhang, Jingwei and Kong, Jun and Wang, Fusheng},
        title = { { Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, a comprehensive graph-transformer framework is introduced, which captures both context-aware relational features and global WSI representations through a novel Graph Transformer Integration (GTI) block. Specifically, each GTI block consists of a Graph Convolutional Network (GCN) layer and an efficient global attention model. The GCN layer models the neighboring relations at the local instance level, while the global attention model captures comprehensive global information from extensive feature embeddings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel Integrative Graph-Transformer (IGT) framework was developed for WSI representation and classification. The core architecture of the IGT framework consists of a sequence of graph transformer integration blocks, where each block includes a GCN layer for encoding spatial relationships among adjacent instances and a global attention module for capturing global WSI representations. The authors demonstrated the effectiveness of their method on three publicly available WSI datasets: TCGA-NSCLC, TCGA-RCC, and BRIGHT.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I’m confused by the proposed Integrative Graph-Transformer (IGT), and I would appreciate a detailed explanation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors did not release the source code, and I have doubts about reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I’m confused by the proposed Integrative Graph-Transformer (IGT), and I would appreciate a detailed explanation.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Detailed method introduction and relatively complete experiments.

    Reproducible code

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes an approach that merges graph learning with MIL to leverage the benefits of both for WSI classification. They present a GTI block which performs one layer of GCN update followed by a global attention update to learn both local and global features. They experiment with three datasets and ablate their architecture components to demonstrate the efficacy of their approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach is very practical and combines the benefits of GNNs with MIL. They tackle the challenge of learning long-range dependencies in GNNs by adding a global attention layer on top of the GCN, which is intuitive and practically usable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper is well written and the approach makes sense. The main criticism is around the choice of benchmark datasets and the comparison with SOTA methods. The results are compelling, but it is not clear to me how the numbers were arrived at. They report TransMIL results on TCGA-RCC as 90.2 (acc) and 97.7 (auroc), but the original paper reports 0.9466 (acc) and 0.9882 (auroc), which would make TransMIL the SOTA rather than the proposed method. Further expanded in Q 10.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    They do not mention anything about releasing code and the details in the paper are not sufficient to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • What is the size of N used in practical applications?
    • "Each feature" in Section 2.1 should be each patch, right?
    • In selecting the patches, do you also include white background?
    • Is there any thresholding on the distances in kNN? Are the 8 closest neighbors always included regardless of their distance?
    • As mentioned in the weaknesses, can you motivate the choice of datasets in terms of the need to model long-range dependencies? Explaining why these datasets benefit from global attention would help the narrative.
    • How are the results computed for the datasets, and are the experimental settings followed as necessary to ensure that the results line up with the original papers?
    • Do you experiment with any graph pooling approaches, which aggregate information globally but in a different manner?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A practically motivated application paper with clear writing warrants an accept, but the choice of benchmarks and the discrepancies with respect to the baseline numbers weaken the paper somewhat; based on the rebuttal, this can be revisited.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I thank the authors for addressing my comments, and I maintain my score.



Review #3

  • Please describe the contribution of the paper

    The authors propose a method to include both local and global information in WSI classification task in a multi-instance learning (MIL) framework. The local spatial information is incorporated using a GCN layer, while the global information is obtained using an attention layer. The authors validate their results on three publicly available datasets. They obtain superior performance to the existing baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors propose to incorporate local and global information for WSI classification. For local spatial relations, a GNN is used, while attention is used to handle the oversquashing and oversmoothing issues. Oversmoothing and oversquashing are well-known problems with GNNs, and thus the authors propose an interesting approach to mitigate them.

    2. The authors validate their results on multiple datasets and compare them against multiple baselines. The superior performance lends substance to the proposed method.

    3. The paper is well-written and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The GNN used is an important component. It is similar to a modified ResNet connection for updating the node features. Perhaps ablation experiments should have been conducted on this choice. Why GenConv and not any other convolution method?

    2. Section 2 description of GCN uses the update equation with edge features. It gives an impression that there are edge features in the current setup. If I understood the method clearly, the adjacency matrix is binary and no weighting/features are used for the edges.

    3. "GCN" is typically used to refer to the work in [1]. Section 2 should use "GNN" to indicate that the authors are talking about general graph neural networks. In fact, during the explanation of the transformer architecture in the next subsection, the authors do use "GNN." The sudden flip in terminology is a bit confusing.

    Minor Points

    1. Figure 1 description -> “Adjacency matrix” instead of adjacent matrix.

    2. Figure 1 -> Is the “Add” icon left by mistake or on purpose?

    3. Section 3.2 -> “feature vectors are downscaled to 256”. Please explain how. Linear layer, or something different?

    [1] Semi-supervised classification with Graph Convolutional Networks (https://arxiv.org/abs/1609.02907)

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Although the paper is well written and obtains impressive results, there are some modifications that can be made to the manuscript. As mentioned in the weaknesses section, the choice of GenConv over other convolution methods seems arbitrary. Aggregation of local information is an important factor, but there is a lack of discussion of this aspect. Perhaps the authors can shed more light on this during the rebuttal.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea to incorporate local and global information is very meaningful. The authors address the problem of oversquashing and oversmoothing for GNNs while also taking into account the space complexity of transformers. I believe this is an elegant solution to the above shortcomings.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Based on the authors' rebuttal and the experiments, the idea seems to work empirically. The rebuttal answers a few of my concerns, but not all. Specifically, the absence of ablation experiments on the kind of GNN used in the current manuscript is disappointing.

    The transformer layer acts on all the nodes (this is similar to a fully connected graph in action), producing enriched features with global information. These "global-aware" features are then passed to a GNN layer. Hence, the GNN layer already receives global information, and one would expect different kinds of graph convolution layers to perform similarly. However, without such an ablation study, we cannot answer this question.

    In view of the above, I will keep my rating unchanged and have a slight inclination towards accepting the paper.




Author Feedback

We thank the reviewers for their constructive criticism and positive evaluation. We will make the code available after acceptance.

[R1] Motivation on the choice of benchmark datasets. The three selected datasets contain gigapixel-scale WSIs with significant heterogeneity in tissue types and structures; the BRIGHT dataset, with its six breast tumor subtypes, highlights this morphological diversity. When diagnosing, pathologists consider local cellular patterns, broader tissue architecture, and correlations between different areas. Thus, modeling long-range dependencies is crucial for WSIs, as critical diagnostic information may be distributed across various regions of the slide.

[R1] Clarification of experimental results. We ran these baseline models on the same WSI features extracted with ResNet50, strictly following the experimental settings outlined in the original papers. Our experimental setup and data usage for the RCC dataset were similar to [28], using 940 WSIs with a different splitting ratio. In [28], TransMIL achieved 87.6 ACC and 97.2 AUC; in our experiments, we achieved comparable performance with 90.2 ACC and 97.7 AUC. The discrepancies with the original TransMIL results mainly stem from differences in the dataset and data splits: TransMIL used a different number of slides (884 WSIs) and employed cross-validation, whereas we used a randomly selected test set and averaged the results over three runs. These variations in data quantity and evaluation method significantly impact the results.

[R1] Experiments with graph pooling approaches. We tried integrating the SAGPool graph pooling method into our proposed IGT framework using either the global pooling or the hierarchical pooling architecture. However, we did not observe any performance improvements.
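For reference, a minimal sketch of how a SAGPool step can be attached to a WSI instance graph with PyTorch Geometric is shown below; the pooling ratio, placement, and mean readout are assumptions for illustration, not the configuration the authors tested.

```python
# Illustrative only: one SAGPool step on a WSI instance graph (PyTorch Geometric).
# The ratio, placement, and mean readout are assumptions, not the tested setup.
import torch
from torch_geometric.nn import SAGPooling, global_mean_pool

pool = SAGPooling(in_channels=256, ratio=0.5)        # keep the top 50% of nodes

def pooled_slide_embedding(x, edge_index, batch):
    # x: (N, 256) instance features, edge_index: (2, E), batch: (N,) graph ids
    x, edge_index, _, batch, _, _ = pool(x, edge_index, batch=batch)
    return global_mean_pool(x, batch)                # (num_graphs, 256) embedding
```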

[R3] Reason for adopting GenConv as the GNN component. The choice of GNN component is indeed flexible. We compared GenConv with GCN and GIN when developing the IGT framework. GenConv achieved the best performance with acceptable complexity, largely due to its aggregation function design, which approximately performs attention pooling of instance-level features within local graph neighborhoods, enhancing the model’s effectiveness.
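As a hedged illustration of how such a comparison can be set up with PyTorch Geometric, the layers below are drop-in alternatives; the dimensions are placeholders, and GENConv's learnable softmax aggregation is what yields the attention-pooling-like behavior over local neighborhoods described above.

```python
# Interchangeable GNN layers for the graph branch (illustrative; dimensions are
# placeholders). All three are called the same way: layer(x, edge_index).
import torch.nn as nn
from torch_geometric.nn import GENConv, GCNConv, GINConv

dim = 256

# GENConv: softmax aggregation with a learnable temperature, which behaves
# roughly like attention pooling over each node's local neighborhood.
gen_layer = GENConv(dim, dim, aggr='softmax', learn_t=True)

# GCNConv: symmetrically normalized (mean-like) neighborhood aggregation.
gcn_layer = GCNConv(dim, dim)

# GINConv: sum aggregation followed by a small MLP.
gin_layer = GINConv(nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)))
```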

[R3] Clarification on edge features and terminology. We thank the reviewer for pointing out the potential sources of confusion. In our current setup, the adjacency matrix is binary, with no weighting or features used for the edges. We will correct the description and equation in Section 2 that might have implied otherwise. Additionally, we will revise the manuscript to consistently use “GNN” when referring to general graph neural networks throughout the section to ensure clarity and coherence.

Detailed queries:
  • [R1] Size of 'N': 'N' varies across WSIs. On average, it is 10,720 for TCGA-NSCLC, 12,397 for TCGA-RCC, and 7,157 for BRIGHT.
  • [R1] Clarification on features: In Section 2.1, "each feature" should indeed refer to "each patch."
  • [R1] White background: We removed white-background patches by applying a saturation threshold of < 15.
  • [R1] Thresholding on distances in kNN: We do not set a distance threshold and always include the 8 closest neighbors, following the same pipeline as Patch-GCN [6].
  • [R3] Figure 1: We will correct "adjacent matrix" to "adjacency matrix" and remove the "Add" icon for clarity.
  • [R3] Section 3.2: We used a linear layer to downscale the feature vectors.
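A minimal sketch of the preprocessing answers above (saturation-based background filtering, an 8-nearest-neighbor graph over patch coordinates with no distance cutoff, and a linear downscaling layer) is given below; the library choices and helper names are assumptions, not the authors' released pipeline.

```python
# Sketch of the preprocessing steps answered above. Library choices (OpenCV,
# scikit-learn, PyTorch) and helper names are assumptions, not the authors' code.
import cv2
import numpy as np
import torch
import torch.nn as nn
from sklearn.neighbors import NearestNeighbors

def is_tissue(patch_rgb, sat_thresh=15):
    # Keep a patch if its mean HSV saturation is >= the threshold
    # (drops white-background patches with saturation < 15).
    hsv = cv2.cvtColor(patch_rgb, cv2.COLOR_RGB2HSV)
    return hsv[..., 1].mean() >= sat_thresh

def knn_edges(coords, k=8):
    # Build an 8-NN graph over patch (x, y) coordinates; no distance cutoff.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(coords)          # +1: self match
    _, idx = nbrs.kneighbors(coords)
    src = np.repeat(np.arange(len(coords)), k)
    dst = idx[:, 1:].reshape(-1)                                    # drop self column
    return torch.as_tensor(np.stack([src, dst]), dtype=torch.long)  # (2, N*k)

downscale = nn.Linear(1024, 256)   # linear layer: ResNet50 features -> 256-d
```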

[R4] Detailed explanation of the IGT framework: The IGT framework enhances WSI classification by integrating GCNs with efficient Transformer-based attention mechanisms to capture both local and global features. The novel Graph-Transformer Integration (GTI) block processes node features through GCN and Global Attention layers in parallel, integrating their outputs to form comprehensive feature representations. This effectively models spatial relationships and long-range dependencies, addressing the limitations of existing GNN methods.
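To make this parallel structure concrete, here is a hedged sketch of one GTI block: GENConv serves as the graph branch (per the rebuttal above), standard multi-head self-attention stands in for the paper's efficient global attention, and the additive fusion with a residual connection is an assumption.

```python
# Hedged sketch of one GTI block: a GCN branch and a global-attention branch run
# in parallel on the same instance features and their outputs are integrated.
# nn.MultiheadAttention is only a stand-in for the paper's efficient global
# attention; the additive fusion and residual are assumptions.
import torch
import torch.nn as nn
from torch_geometric.nn import GENConv

class GTIBlock(nn.Module):                           # hypothetical name
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.gcn = GENConv(dim, dim, aggr='softmax', learn_t=True)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, edge_index):
        # x: (N, dim) instance features of one WSI; edge_index: (2, E) 8-NN graph
        local_h = self.gcn(x, edge_index)            # neighborhood-aware features
        q = x.unsqueeze(0)                           # (1, N, dim) for attention
        global_h, _ = self.attn(q, q, q)             # all-to-all dependencies
        return self.norm(local_h + global_h.squeeze(0) + x)  # integrate + residual
```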




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


