Abstract

Whole Slide Images (WSIs) are crucial for cancer diagnosis in digital pathology. WSI classification typically relies on Multiple Instance Learning (MIL). Existing MIL methods use attention mechanisms to highlight key instances but struggle to capture instance interactions. Although Transformers, State Space Models (SSMs), and Graph Neural Networks (GNNs) have made progress in solving this problem, they still face two main issues: (1) insufficient guidance from class-related information in modeling instance relationships, and (2) inadequate interaction between slides at different magnifications. To address these issues, we propose Knowledge-guided Multi-scale Graph Mamba (KMG-Mamba), which incorporates a Knowledge-guided Graph Representation (KGR) method for class-related guidance and a Cross-scale Knowledge Interaction Mamba (CKIM) to facilitate effective cross-magnification information exchange. Experimental results on three public datasets show that KMG-Mamba outperforms current MIL methods in WSI classification.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0569_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{DuaMin_Knowledgeguided_MICCAI2025,
        author = { Duan, Minghong and Yang, Zhiwei and Ma, Yingfan and Wang, Manning and Song, Zhijian},
        title = { { Knowledge-guided Multi-scale Graph Mamba for Whole Slide Image Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15971},
        month = {September},
        pages = {438 -- 448}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a knowledge-guided graph representation (KGR) method for class-related guidance and additionally designs a cross-scale knowledge interaction mamba (CKIM) to aggregate information across scales in WSI classification. The method is compared against other methods on three publicly available datasets and shows superior performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper: 1) The proposed prototype-guided graph aggregation module is interesting. It integrates class-specific knowledge to enhance graph representations for WSI classification. 2) The paper is written in a clear and easy-to-follow manner, making the proposed method easy to understand. 3) Experiments showed that the proposed method outperformed SOTA methods (graph-, Mamba-, and multi-scale-based approaches, etc.) on three datasets.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The main weaknesses of the paper: 1) As the core contribution of this paper, the motivation and innovation of KGR have certain limitations. Although the highest-degree nodes can reflect node importance, they cannot fully capture the prototype information in the graph. Therefore, KGR is essentially a re-weighting learning method based on the most important nodes, rather than a prototype-guided learning approach. In addition, according to the ablation results in Table 2, removing KGR does not lead to a significant performance decrease on TCGA-BRCA and TCGA-NSCLC. Thus, the novelty and necessity of KGR need to be further clarified. 2) The motivation for preferring CKIM over cross-attention-based methods is not fully discussed. The ablation results show that CKIM is better, but it is unclear why cross-Mamba facilitates information exchange. 3) Some details remain to be clarified. For example, how are the aggregated features converted into classification logits? Does the removal of CKIM imply the simultaneous removal of the loss function L^bag? According to the ablation results in Table 2, removing CKIM still achieves a suboptimal (second-best) result on the CAMELYON16 dataset. Whether this indicates that CKIM may have a negative impact in some cases is worth further discussion.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed prototype-guided graph aggregation module is interesting. However, the novelty and necessity of KGR need to be further clarified.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a new Mamba-based MIL method, KMG-Mamba, for histopathology imaging to address two main issues: (1) insufficient guidance from class-related information when modeling instance relationships, and (2) inadequate interaction between slides at different magnifications. To incorporate class-related information, the proposed method adopts an effectively two-stage design (stage 1: Knowledge-guided Graph Representation (KGR); stage 2: Cross-scale Knowledge Interaction Mamba (CKIM)), trained end-to-end, where the output (prototype node) from the first stage is optimized to predict the class, thereby introducing class-related information for interaction in the second stage. To enable adequate interaction across magnifications, the method uses cross-scale interaction inspired by cross-attention, implemented with Mamba. Across multiple datasets, the proposed method achieves state-of-the-art performance while requiring significantly less compute.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The design of both stages, KGR and CKIM, is novel and directly grounded in addressing the identified issues rather than being based on arbitrary choices.

    2) The use of Mamba for sequence contextualization addresses the quadratic computational complexity of transformers, resulting in KMG-Mamba, a model that requires significantly less GPU memory.

    3) KMG-Mamba achieves consistently high performance across different encoders and datasets. Although the improvement over a few baselines is limited, the robustness across all combinations is impressive.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) Why is a smoothed support vector machine (SVM) loss function used for prototype prediction optimization? Could the authors provide insights into this less common choice?

    2) While the attention map visualization on CAMELYON16 is provided, it is shown only for a few selected WSIs. Since annotations of cancerous regions are already available for this dataset, the authors could compare the localization FROC of the proposed method with baselines (similar to DSMIL [1]).

    3) Not sure about the necessity of using head, base, and tail embeddings for each node. Could this graph construction framework be replaced entirely with just the base embedding? What would be the implications for performance?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found the architecture design combining the two stages of using class-related information for aggregation and multi-scale aggregation novel. Furthermore, the proposed method is thoroughly validated.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I am satisfied with the authors' responses about the SVM-based loss and the graph nodes. I would have preferred to see the FROC values on the CAMELYON16 dataset, but I understand it is out of scope for the rebuttal phase at MICCAI.



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is the introduction of Knowledge-guided Multi-scale Graph Mamba (KMG-Mamba), a novel approach for WSI classification in digital pathology. KMG-Mamba addresses two key issues in existing MIL methods: (1) the lack of sufficient class-related guidance in modeling instance relationships, and (2) the inadequate interaction between slides at different magnifications. The paper proposes a Knowledge-guided Graph Representation method for better class-related guidance and a Cross-scale Knowledge Interaction Mamba to enable effective information exchange across different magnifications.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper’s main strength is its novel Knowledge-guided Multi-scale Graph Mamba (KMG-Mamba) approach for WSI classification, which addresses key challenges in MIL. The method introduces Knowledge-guided Graph Representation for class-related guidance and Cross-scale Knowledge Interaction Mamba for effective cross-magnification interaction. The approach outperforms current MIL methods on three public datasets, demonstrating both innovation and strong empirical results in digital pathology.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The major weakness of the paper lies in the methodology section. Despite integrating Mamba and graph models, the paper does not address several important issues raised in the recent MambaOut work, particularly regarding whether long-sequence modeling is actually required. This gap in addressing the latest developments makes the combination of Mamba and graphs less credible, limiting the robustness and relevance of the proposed approach. Reference: Yu, W., Wang, X.: MambaOut: Do we really need Mamba for vision? arXiv preprint arXiv:2405.07992 (2024).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My score is based on two key factors. First, the paper provides a thorough and well-executed comparison in the experimental section, which effectively demonstrates the advantages of the proposed method. Second, the integration of Mamba and Graph models in the methodology is an innovative approach that addresses key challenges in WSI classification. While the method is not entirely groundbreaking, the combination of these techniques offers a solid contribution to the field. The paper’s clear experimental validation and novel methodological approach were key factors in my recommendation.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

(Q1) Novelty and effectiveness of KGR. (R1, AC) KGR aims to integrate class-specific knowledge to enhance graph aggregation. By using an instance-level classifier to predict prototype logits and constructing a loss function constrained by the category labels, the training process drives the graph construction to concentrate category information at the highest-degree nodes. The base embedding of these nodes serves as the category center (i.e., the prototype), which guides the graph aggregation. This enables class-aware message passing and steers the aggregation toward features that benefit classification, rather than merely re-weighting the most important nodes. Therefore, the proposed KGR is innovative and different from a re-weighting learning method. Ablating KGR showed a consistent decline in both metrics across all datasets (notably -1.4% ACC on TCGA-NSCLC, and -3.9% AUC and -1.8% ACC on CAMELYON16). Although dataset-specific variations exist, this consistent decline demonstrates KGR's necessity.
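To make this concrete, here is a minimal, hedged sketch of prototype-guided aggregation in the spirit of the description above. All names (e.g. PrototypeGuidedAggregation, instance_clf) and the specific weighting rule are our own illustrative assumptions, not the authors' implementation, and the prototype constraint is shown with a placeholder loss.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PrototypeGuidedAggregation(nn.Module):
        """Toy sketch: the highest-degree node acts as a class prototype guiding message passing."""
        def __init__(self, dim, num_classes):
            super().__init__()
            self.instance_clf = nn.Linear(dim, num_classes)  # instance-level classifier for prototype logits
            self.msg = nn.Linear(dim, dim)                   # message transformation

        def forward(self, x, adj, label=None):
            # x: (N, dim) node (instance) features; adj: (N, N) adjacency; label: scalar slide label
            degree = adj.sum(dim=1)
            proto_idx = degree.argmax()          # highest-degree node
            prototype = x[proto_idx]             # its embedding is treated as the class centre
            proto_logits = self.instance_clf(prototype)

            # class-aware message passing: neighbours weighted by similarity to the prototype
            sim = F.softmax(x @ prototype / x.shape[-1] ** 0.5, dim=0)
            h = adj @ (sim.unsqueeze(-1) * self.msg(x))

            loss = None
            if label is not None:
                # the paper constrains the prototype with a smoothed SVM loss (see Q4);
                # cross-entropy is used here only as a placeholder
                loss = F.cross_entropy(proto_logits.unsqueeze(0), label.view(1))
            return h, prototype, loss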

(Q2) Motivation of CKIM. (R1, AC) CKIM outperforms cross-attention because of its input-aware characteristics and linear complexity, which make it more efficient for long instance sequences. As shown in Equation (3), the system matrix C decodes information, and we therefore promote cross-scale information exchange via the interaction of C_HR and C_LR, as shown in Equation (9). Unlike cross-attention, whose learned parameters remain fixed during inference, CKIM's state-space parameters are generated from the input, enabling better information exchange.
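As a toy illustration of exchanging the decoding matrices across scales: the sketch below uses a simplified linear SSM recurrence with fixed A, B, C and assumes both branches share the same state dimension, whereas the actual CKIM uses input-dependent (selective) parameters and a hardware-efficient scan.

    import torch

    def ssm_scan(x, A, B, C):
        # x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state)
        h = torch.zeros(A.shape[0])
        ys = []
        for t in range(x.shape[0]):
            h = A @ h + B @ x[t]   # state update
            ys.append(C @ h)       # C decodes the hidden state into the output
        return torch.stack(ys)

    def cross_scale_interaction(x_hr, x_lr, p_hr, p_lr):
        # p_hr / p_lr are dicts holding the per-scale matrices 'A', 'B', 'C' (illustrative naming);
        # decoding high-resolution states with the low-resolution C (and vice versa)
        # lets each magnification read out information learned at the other one
        y_hr = ssm_scan(x_hr, p_hr['A'], p_hr['B'], p_lr['C'])
        y_lr = ssm_scan(x_lr, p_lr['A'], p_lr['B'], p_hr['C'])
        return y_hr, y_lr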

(Q3) More details. (R1) The aggregated features are transformed into classification logits by a linear classifier (consistent with MambaMIL). Removing CKIM means that the F_LR and F_HR from KGR are directly concatenated as inputs to the Aggregator & Classifier, but this does not imply the removal of L_bag. CKIM’s importance varies across datasets, but ablation studies have demonstrated that its removal leads to a performance decline on all three datasets, confirming that CKIM does not introduce negative impacts.
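For clarity, a hedged sketch of the ablation configuration described above (dimensions, names, and tensors are illustrative): without CKIM, the two scale-level features from KGR are simply concatenated and passed to a linear classifier, and L_bag is still computed on its logits.

    import torch
    import torch.nn as nn

    dim, num_classes = 512, 2
    classifier = nn.Linear(2 * dim, num_classes)   # linear head, as in the reply above

    f_lr = torch.randn(dim)                        # low-resolution bag feature from KGR
    f_hr = torch.randn(dim)                        # high-resolution bag feature from KGR
    logits = classifier(torch.cat([f_lr, f_hr]))   # L_bag is applied to these logits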

(Q4) Why SVM loss. (R2, AC) The smoothed SVM loss is more effective than cross-entropy at preventing overfitting when training the instance-level classifier, because it introduces a margin that requires the logit of the true class to exceed the logits of the other classes. This strategy was also used in CLAM (Nature Biomed. Eng. 2021).
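For reference, a minimal sketch of a smoothed multi-class SVM (hinge) loss with a margin, in the spirit of the smooth top-1 SVM loss used by CLAM; the function name, the default margin/temperature values, and the log-sum-exp smoothing are our own assumptions, not the authors' exact implementation.

    import torch

    def smooth_svm_loss(logits, target, margin=1.0, tau=1.0):
        # logits: (B, C); target: (B,) class indices
        correct = logits.gather(1, target.unsqueeze(1))   # logit of the true class, (B, 1)
        delta = torch.full_like(logits, margin)           # margin added to every wrong class
        delta.scatter_(1, target.unsqueeze(1), 0.0)       # no margin for the true class
        # smooth (log-sum-exp) maximum over classes of s_j + margin*[j != y] - s_y;
        # as tau -> 0 this approaches the standard multi-class hinge loss
        return tau * torch.logsumexp((logits + delta - correct) / tau, dim=1).mean()

    # example: smooth_svm_loss(torch.randn(4, 2), torch.tensor([0, 1, 1, 0]))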

(Q5) FROC comparison. (R2, AC) We appreciate the advice and will add FROC results on the complete test set in future work.

(Q6) Clarification on head, base, and tail embeddings. (R2, AC) The use of head, base, and tail embeddings encodes directed relationships and contributions between nodes. Using only the base embedding would leave the instance interactions and message passing without directional constraints, limiting KGR's ability to capture complex contextual relationships and to propagate the most useful information. Experiments show that our method outperforms the undirected-graph-based methods Patch-GCN and MG-Trans.
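A brief, hedged sketch of why separate head/tail projections yield directed edges (the scoring rule and names below are purely illustrative; the authors' construction may differ): the score of edge i -> j generally differs from that of j -> i, whereas scoring with the base embedding alone, base @ base.T, is symmetric and loses direction.

    import torch
    import torch.nn as nn

    class DirectedEdgeScorer(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.head = nn.Linear(dim, dim)   # projection used when a node acts as the sender
            self.tail = nn.Linear(dim, dim)   # projection used when a node acts as the receiver

        def forward(self, base):
            # base: (N, dim) base node embeddings
            h, t = self.head(base), self.tail(base)
            return h @ t.T                    # scores[i, j] scores the directed edge i -> j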

(Q7) Issues about MambaOut. (R3, AC) MambaOut's analysis of long-sequence modeling was conducted on natural images (classification of 224×224 images from ImageNet) and does not transfer to WSIs. Such small natural images yield few tokens (196 tokens with a 16×16 patch size), so the classification task does not exhibit long-sequence characteristics and Mamba offers little benefit there. In contrast, gigapixel-scale WSIs typically produce tens of thousands of instances, exhibiting exactly the long-sequence characteristics that Mamba can model effectively. Mamba's applicability to WSI classification has been demonstrated in recent work such as MambaMIL (MICCAI 2024). Our KMG-Mamba incorporates the CKIM module to model instance correlations across magnifications, exploiting multi-scale WSI information to further improve classification performance.
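The back-of-the-envelope token counts behind this argument (the WSI dimensions below are illustrative, not taken from the paper):

    # 224x224 ImageNet image with 16x16 patches -> 196 tokens
    imagenet_tokens = (224 // 16) ** 2
    # an 80,000 x 60,000-pixel WSI tiled into 256x256 patches -> roughly 73,000 instances
    wsi_tokens = (80_000 // 256) * (60_000 // 256)
    print(imagenet_tokens, wsi_tokens)  # 196 73008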




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This paper received mixed scores. I would prefer to give the authors a chance to address the major concerns raised by the reviewers, especially the analysis of the experimental results: the effectiveness of the proposed core module, the choice of the SVM loss, the FROC comparison, the clarification of the head/base/tail embeddings, and the robustness question raised with respect to MambaOut.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


