Abstract

The Gleason Grade Group is the gold standard for diagnosing and prognosticating prostate cancer. Existing multiple instance learning (MIL) methods for Grade Group classification have overlooked domain-specific knowledge that the Grade Group is collaboratively determined by different Gleason Patterns, limiting their performance. In this study, we propose DSPA-MIL, a Dual Selective Gleason Pattern-Aware MIL model for patient-level Grade Group prediction. Our approach incorporates a dual selective instance aggregation strategy, combining selective aggregator tokens and patch-level Gleason pattern expert concept-guided aggregation. Furthermore, to effectively utilize patient-level Grade Group expert concepts, we introduce a knowledge-distillation-based framework for training and inference, enabling accurate Grade Group score prediction. Experimental results on five datasets comprising 10,809 whole slide images (WSIs) and 1,133 tissue microarray (TMA) images demonstrate the superiority of our method, which outperforms state-of-the-art (SOTA) MIL approaches. The code is available at https://github.com/AlexNmSED/DSPA-MIL.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0636_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/AlexNmSED/DSPA-MIL

Link to the Dataset(s)

N/A

BibTex

@InProceedings{HaoXin_Dual_MICCAI2025,
        author = { Hao, Xinyu and Xu, Hongming and Zhang, Qibin and Xu, Qi and Polonen, Ilkka and Cong, Fengyu},
        title = { { Dual Selective Gleason Pattern-Aware Multiple Instance Learning for Grade Group Prediction in Histopathology Images } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {188 -- 198}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a dual selective MIL model in which patch/instance features are first aggregated into three Gleason Patterns. This aggregation is complemented by an expert-guided knowledge distillation framework for the final Gleason grade prediction.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I like the idea of aggregating pattern-specific (3, 4, 5) information using a Transformer block. These tokens are then backed by expert-guided tokens.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    I found the explanation of the Teacher and Student Gate a bit hard to follow.

    I would suggest improving the description and Figure 1.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The knowledge distillation section is a bit hard to understand. The authors should refine the text and figure for better understanding.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a novel Gleason-pattern-based feature aggregation method to better mimic the pathology procedure and improve performance. The use of expert concepts to identify similarities between features and address the challenges that arise in vision-language models is particularly interesting.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea of focusing on Gleason patterns and incorporating expert concepts is novel and interesting, as it addresses the core requirement for accurate Gleason grade estimation. This approach closely mimics the pathologist’s grading procedure.
    2. Extensive experiments are conducted on four publicly available held-out datasets, with results reported in terms of AUC and kappa scores.
    3. The method is compared against the latest state-of-the-art MIL models, including both attention-based and graph-based approaches.
    4. The paper is clear and well-organized.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Limitations and future work are not discussed. It would be nice to discuss these in the conclusion.
    2. The final grade group output of the model, especially during inference, is not clearly discussed. Is it based on O_k?
    3. ISUP Grade Group 2 is Gleason pattern 3+4, and Grade Group 3 is Gleason pattern 4+3. In both cases we have the same O: O3=1, O4=1, and O5=0. How have the authors handled this?
    4. Performance on the Vancouver TMA data is low compared to other papers: the AUC of MS-RGCN was 85 on Vancouver and also on the Zurich dataset, which was held out. Please report the MS-RGCN performance on the Vancouver dataset in Table 2.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Other weaknesses and question:

    1. The number of parameters and the inference time have not been discussed. I suggest adding a column with this information to Table 2.
    2. It would be interesting to see the similarity matrix between the image and the patch-level pattern concepts, and whether it is meaningful.
    3. It would be useful to readers to see the grade group (GG) distribution in the datasets. I believe it is possible to add GG0 to GG5 to Table 1.
    4. The Radboud dataset has more than 5000 WSIs. Why have the authors used only 4506 of them rather than all of the data? For instance, the MS-RGCN paper used 5057 samples.
    5. Is this model interpretable to some extent, or is it a full black box? Based on what is presented in the paper, it seems to be a black box. If not, it would be nice to have figures and discussion on this.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper idea is novel and the experiments are solid. I am leaning towards better scores if the concerns are addressed in the rebuttal, especially the grade group 2 and 3 problem, why some of the data in Radboud dataset has not been used, and a discussion on lower performance on the Vancouver TMA data.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors present an innovative patient-level Grade Group prediction method that employs a dual selective aggregation approach, incorporating learnable tokens and expert-concept guidance. They introduce a knowledge distillation framework designed to enhance model predictions by effectively utilizing specialized Gleason Pattern features alongside expert concepts. Their extensive experimental evaluations across multiple datasets indicate that their proposed DSPA-MIL method outperforms current leading strategies in Grade Group prediction. Furthermore, the validity of their approach is reinforced through an ablation study, demonstrating its effectiveness.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents several notable strengths:

    Novel Methodology: The authors propose a unique patient-level Grade Group prediction method that employs a dual selective aggregation approach. This framework is interesting because it integrates learnable tokens with expert-concept guidance, allowing for more nuanced predictions that leverage both machine learning and domain expertise.

    Extensive Experimental Validation: The extensive experiments conducted across multiple datasets provide robust evidence of the model’s effectiveness. This is a critical aspect of any study, as it demonstrates the applicability and reliability of the proposed method in diverse clinical contexts.

    Ablation Study: The inclusion of an ablation study is particularly commendable as it effectively illustrates the contributions of different components of the model. This strengthens the argument for the validity of their approach and provides insights into how the introduced part of the model contributes to overall performance.

    Clarity and Transparency: The authors provide a clear and detailed explanation of their methods, which enhances understanding and facilitates replication. Furthermore, their commitment to sharing the code underlines the importance of reproducibility in research, allowing others to validate and build upon their findings.

    These strengths collectively contribute to the overall significance of the work and its potential impact on the field.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    While the paper presents compelling findings, there are some notable weaknesses:

    Lack of Discussion on Model Limitations: The authors do not adequately address the limitations of their proposed model in the discussion or conclusion sections. A critical examination of potential weaknesses, such as challenges related to generalizability or scalability of the model in real-world clinical settings, would provide valuable insights for the scientific community. Discussing limitations would not only enhance the transparency of the study but also guide future research directions.

    Future Directions: The paper lacks a clear outline of future research avenues based on the findings. Suggestions for how the model could be improved, potential modifications for different clinical scenarios, or ways to integrate it with existing clinical workflows would be beneficial. Providing these insights would help other researchers understand how to build upon this work and advance the field.

    Including a discussion on these aspects would not only strengthen the overall impact of the paper but also serve as a resource for ongoing and future research endeavors in this area.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend accepting the paper for several key reasons:

    Innovative Methodology: The authors present a novel patient-level Grade Group prediction method utilizing a dual selective aggregation approach. This innovation contributes significantly to the existing body of literature and addresses an important gap in the field.

    Robust Experimental Validation: The extensive experiments conducted across multiple datasets demonstrate the effectiveness of the proposed model. The results are compelling and provide strong evidence to support the claims made by the authors.

    Clear Presentation and Transparency: The paper is well-organized, and the methods are described in sufficient detail to allow for replicability. The authors’ commitment to sharing code enhances the transparency of their work, which is crucial for advancing scientific knowledge.

    Comprehensive Evaluation: The inclusion of an ablation study effectively highlights the contributions of different model components. This thorough evaluation further substantiates the validity of the authors’ approach.

    Despite some minor weaknesses, such as the lack of discussion on model limitations and future directions, these issues do not detract significantly from the overall quality and impact of the work. The strengths of the paper far outweigh these concerns, making it a valuable contribution to the field.

    In summary, the combination of innovation, robust validation, and clarity in presentation justifies my recommendation for acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

Thank you for recognizing the value of our work. We are sincerely grateful for your constructive feedback and thoughtful suggestions. We will revise the manuscript accordingly to reflect these insights. Below, we provide detailed responses to the reviewers’ concerns.

1. Knowledge-Distillation-based Grade Group Prediction

(1) Feature Preparation: The model obtains three sets of Gleason Pattern features (P3, P4, P5) through Selective Aggregation and Concept-Guided Aggregation.

(2) Teacher-Student Distillation Framework:
Teacher Branch: uses real expert concept embeddings to generate gating weights, which are then applied to the Gleason pattern features (P3, P4, P5) for global feature aggregation and subsequent Grade Group prediction.
Student Branch: generates pseudo-text embeddings from P3, P4, P5 without relying on real expert concepts, and learns to produce gating weights for Grade Group prediction.

(3) Knowledge Distillation Alignment: Knowledge distillation is guided by three components: Gating Weight Distillation, Output Distribution Distillation, and Text Embedding Alignment Loss, which collectively transfer knowledge from the teacher branch to the student branch. Additionally, a pattern presence classification task (O3, O4, O5) is incorporated to enhance the representation of the k-th predictive pattern, where k ranges from 3 to 5.

(4) Final Inference Process: Only the learned features P3, P4, P5 are used in the inference stage. The student branch generates pseudo-text embeddings and computes gating weights to aggregate these three predictive patterns. The final Grade Group prediction is based solely on these aggregated features. Thus, the pattern presence classification task is only used for supervision during training and has no dependency during inference.
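Editorial illustration (not taken from the authors' repository): the three distillation components named above can be sketched in a few lines of plain Python. Everything concrete here is an assumption made for illustration: the use of KL divergence for the gating-weight and output-distribution terms, cosine distance for the text-embedding alignment term, the equal loss weights, the function names, and the toy two-dimensional features. The rebuttal names the components but does not specify their form.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def aggregate(gate, patterns):
    """Gate-weighted sum of the pattern features (P3, P4, P5)."""
    dim = len(patterns[0])
    return [sum(g * p[d] for g, p in zip(gate, patterns)) for d in range(dim)]


def distillation_loss(teacher_gate, student_gate,
                      teacher_probs, student_probs,
                      text_emb, pseudo_text_emb,
                      weights=(1.0, 1.0, 1.0)):
    """Hypothetical combination of the three terms named in the rebuttal."""
    gate_term = kl_divergence(teacher_gate, student_gate)      # gating weight distillation
    output_term = kl_divergence(teacher_probs, student_probs)  # output distribution distillation
    text_term = cosine_distance(text_emb, pseudo_text_emb)     # text embedding alignment
    w_gate, w_out, w_text = weights
    return w_gate * gate_term + w_out * output_term + w_text * text_term


# Toy values: three 2-D pattern features standing in for P3, P4, P5.
patterns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_gate = softmax([2.0, 1.0, 0.0])  # gate derived from real expert concepts
student_gate = softmax([1.5, 1.0, 0.5])  # gate derived from pseudo-text embeddings
student_feature = aggregate(student_gate, patterns)  # what inference would use
loss = distillation_loss(
    teacher_gate, student_gate,
    softmax([0.1, 2.0, 0.3]), softmax([0.2, 1.5, 0.4]),
    [1.0, 0.0, 1.0], [0.9, 0.1, 1.1],
)
```

In this sketch only `student_gate` and `aggregate` are needed at inference time, which mirrors the statement that the teacher branch and the pattern presence task are used for training supervision only.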

2. Grade Group 2 vs Grade Group 3

The pattern presence classification task is introduced as a constraint to enhance the representation of the k-th predictive pattern; however, it is not involved in the final Grade Group prediction. For patients with Grade Group 2 or 3, we explicitly define the distribution of primary and secondary Gleason patterns within the expert concept embeddings. This prior knowledge guides the model to distinguish between Grade Group 2 and 3 patients during prediction.
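Editorial illustration of the ambiguity raised by Reviewer #2: under the standard ISUP mapping, 3+4 and 4+3 share the same pattern presence vector (O3, O4, O5) but fall into different Grade Groups, so the primary/secondary order has to be supplied by something other than the presence labels. The function names below are hypothetical, and the sketch ignores benign cases and tertiary patterns.

```python
def grade_group(primary, secondary):
    """Map primary + secondary Gleason patterns to the ISUP Grade Group (1-5)."""
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("Gleason patterns must be 3, 4, or 5")
    score = primary + secondary
    if score <= 6:
        return 1                          # 3+3
    if score == 7:
        return 2 if primary == 3 else 3   # 3+4 vs 4+3
    if score == 8:
        return 4                          # 4+4, 3+5, 5+3
    return 5                              # 4+5, 5+4, 5+5


def pattern_presence(primary, secondary):
    """Binary presence labels (O3, O4, O5); the order of patterns is lost."""
    present = {primary, secondary}
    return tuple(int(k in present) for k in (3, 4, 5))


# Same presence labels, different Grade Groups.
same_labels = pattern_presence(3, 4) == pattern_presence(4, 3)
different_groups = grade_group(3, 4) != grade_group(4, 3)
```

Both `same_labels` and `different_groups` evaluate to `True`, which is why presence labels alone cannot separate Grade Group 2 from Grade Group 3.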

3. MS-RGCN Performance on the Vancouver Dataset

Our use of the Vancouver dataset differs from that of MS-RGCN. We used Vancouver only as a test set, while MS-RGCN treated it as a training set. Therefore, we did not report MS-RGCN’s performance on the Vancouver dataset.

4. Radboud Dataset

We filtered the Radboud dataset by removing samples with ink artifacts, which resulted in a difference in the number of samples compared to MS-RGCN.

5. Future Work

In the current dataset, some samples have Gleason Pattern mask annotations. In future work, we plan to incorporate these annotations into our framework to provide more explicit supervision and potentially enhance model performance. Additionally, we will explore frameworks with better interpretability in the future.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


