Abstract

Multiple instance learning (MIL) has shown significant promise in histopathology whole slide image (WSI) analysis for cancer diagnosis and prognosis. However, the inherent spatial heterogeneity of WSIs presents critical challenges, as morphologically similar tissue types are often dispersed across distant anatomical regions. Conventional MIL methods struggle to model these scattered tissue distributions and capture cross-regional spatial interactions effectively. To address these limitations, we propose a novel Multiple instance learning framework with Context-Aware Clustering (MiCo), designed to enhance cross-regional intra-tissue correlations and strengthen inter-tissue semantic associations in WSIs. MiCo begins by clustering instances to distill discriminative morphological patterns, with cluster centroids serving as semantic anchors. To enhance cross-regional intra-tissue correlations, MiCo employs a Cluster Route module, which dynamically links instances of the same tissue type across distant regions via feature similarity. These semantic anchors act as contextual hubs, propagating semantic relationships to refine instance-level representations. To eliminate semantic fragmentation and strengthen inter-tissue semantic associations, MiCo integrates a Cluster Reducer module, which consolidates redundant anchors while enhancing information exchange between distinct semantic groups. Extensive experiments on two challenging tasks across nine large-scale public cancer datasets demonstrate the effectiveness of MiCo, showcasing its superiority over state-of-the-art methods. The code is available at https://github.com/junjianli106/MiCo.
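
To make the Cluster Route idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes cosine similarity for the instance-to-anchor assignment, a plain mean for aggregation, and a hypothetical mixing weight `alpha` for propagating cluster context back to each instance; the function names and toy features are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def cluster_route(instances, anchors, alpha=0.5):
    """Assign each instance to its most similar anchor, then blend the
    mean feature of its cluster back into the instance (assumed update rule)."""
    assign = [
        max(range(len(anchors)), key=lambda k: cosine(x, anchors[k]))
        for x in instances
    ]
    dim = len(instances[0])
    means = {}
    for k in set(assign):
        members = [x for x, a in zip(instances, assign) if a == k]
        means[k] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    refined = [
        [(1 - alpha) * x[d] + alpha * means[a][d] for d in range(dim)]
        for x, a in zip(instances, assign)
    ]
    return assign, refined

# Toy bag: patches 0 and 2 look alike, as do 1 and 3. Spatial position is
# never used, so similar patches are linked however far apart they lie.
instances = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2], [0.1, 0.9]]
anchors = [[1.0, 0.0], [0.0, 1.0]]
assign, refined = cluster_route(instances, anchors)
print(assign)  # [0, 1, 0, 1]
```

Because the assignment depends only on feature similarity, two patches from opposite corners of a slide end up sharing the same anchor and exchange information through it.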

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2436_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/junjianli106/MiCo

Link to the Dataset(s)

N/A

BibTex

@InProceedings{LiJun_MiCo_MICCAI2025,
        author = { Li, Junjian and Liu, Jin and Kuang, Hulin and Yue, Hailin and He, Mengshen and Wang, Jianxin},
        title = { { MiCo: Multiple Instance Learning with Context-Aware Clustering for Whole Slide Image Analysis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {378--388}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose MiCo, a novel Multiple Instance Learning framework incorporating Context-Aware Clustering. MiCo introduces a Cluster Route module that enhances intra-tissue semantic associations by aggregating and propagating information from spatially dispersed patches of the same tissue type, thereby refining instance-level representations. In parallel, a Cluster Reducer module is employed to mitigate semantic fragmentation and reinforce inter-tissue semantic relationships by consolidating semantically redundant anchors while preserving pathological heterogeneity.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Extensive experiments demonstrate the effectiveness of MiCo, showing its superior performance compared to state-of-the-art methods in both survival prediction and cancer subtyping.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The current description and diagram of the proposed method are insufficient for readers to fully comprehend the framework. Critical implementation details are missing, which significantly limits the ability to evaluate the method’s effectiveness and reproducibility. For example, key hyperparameters are not reported—such as the output dimensionality of the Context-Aware Clustering module, or the value of M. Additionally, if M is set too large, directly pooling M + K/2 tokens may introduce instability and potentially degrade classification performance. This needs further justification and experimental data support.

    Moreover, the statement on page 4—“Morphologically similar tissue types are often dispersed across distant anatomical regions, making it challenging to capture their contextual relationships and semantic consistency”—is questionable. This claim lacks supporting evidence from medical literature. In fact, results from existing patch-level foundational models such as UNI, GigaPath, and CHIEF demonstrate that spatially adjacent patches are typically clustered into the same group, suggesting strong spatial coherence in morphological patterns. Clarification and validation of this assumption are needed.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The current description and diagram of the proposed method are insufficient for readers to fully comprehend the framework. Critical implementation details are missing, which significantly limits the ability to evaluate the method’s effectiveness and reproducibility. For example, key hyperparameters are not reported—such as the output dimensionality of the Context-Aware Clustering module, or the value of M. Additionally, if M is set too large, directly pooling M + K/2 tokens may introduce instability and potentially degrade classification performance. This needs further justification and experimental data support.

    Moreover, the statement on page 4—“Morphologically similar tissue types are often dispersed across distant anatomical regions, making it challenging to capture their contextual relationships and semantic consistency”—is questionable. This claim lacks supporting evidence from medical literature. In fact, results from existing patch-level foundational models such as UNI, GigaPath, and CHIEF demonstrate that spatially adjacent patches are typically clustered into the same group, suggesting strong spatial coherence in morphological patterns. Clarification and validation of this assumption are needed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The authors’ rebuttal did not resolve my doubts, so I maintain my initial judgment: reject.

    The main reason is that patch embeddings obtained directly from existing patch-level foundation models (such as UNI, Virchow, GigaPath, TITAN, etc.) can already be clustered well with k-means. I have doubts about the necessity of the proposed method's repeated, multi-stage refinement of the clustering results. Can the clustering results be close to the attention heatmap?



Review #2

  • Please describe the contribution of the paper

    The main contribution of this paper is the introduction of MiCo, a novel Multiple Instance Learning framework designed specifically to overcome the challenges of spatial heterogeneity in WSI analysis. MiCo utilizes multi-stage Context-Aware Clustering with two key components: the Cluster Route module, which enhances cross-regional correlations within the same tissue type by dynamically linking and propagating information between distant instances via semantic anchors, and the Cluster Reducer module, which consolidates redundant anchors and strengthens semantic associations between different tissue types.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper clearly articulates a significant and practical challenge in WSI analysis: the difficulty of modeling spatially dispersed but semantically related tissue patterns and their interactions, which is often overlooked by standard MIL approaches.

    2. The proposed MiCo framework, particularly the combination of the Cluster Route and Cluster Reducer modules within a multi-stage architecture, presents a novel strategy to explicitly model and leverage cross-regional dependencies.

    3. The paper generally presents the method clearly, aided by a helpful overview diagram.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The proposed method involves multi-stage clustering, similarity computations, aggregation, and anchor reduction, potentially adding significant computational overhead compared to some baselines. A brief discussion or quantitative comparison regarding training time or computational complexity would be beneficial for assessing the practical trade-offs.

    2. In Fig 2, to directly support the core claim of handling spatial heterogeneity, it would be compelling to show specific examples where visibly distant patches within the WSI are correctly assigned to the same cluster/anchor by CluRoute, perhaps highlighting these patches on the WSI thumbnail alongside their shared cluster color.

    3. This article conducted cancer subtyping experiments on TCGA-BRCA and NSCLC. However, since these two tasks are relatively simple, the differences between various methods are minimal. Therefore, I suggest the authors consider attempting more challenging tasks, such as predicting HER2 molecular status and gene mutations, to better evaluate the performance differences among different methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method proposed in this article is simple yet quite effective, and the experimental section also demonstrates the validity of the approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    My main concerns have been addressed, so I maintain my original decision.



Review #3

  • Please describe the contribution of the paper

    My recommendation is based on the overall clarity of the problem definition and the details provided about the algorithm. However, Section 2.1 lacks a more detailed symbolic representation and formal definition, which significantly affected my evaluation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work presented a novel formulation of Multiple Instance Learning (MIL). By clustering instance features into several pseudo-classes with learnable semantic anchors, MiCo effectively introduces compact, region-level semantics into each instance representation. The use of an MLP for cluster-based dimensionality reduction is conceptually interesting.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    (1) There may be an error in Equation (2), as the output A^_l does not appear to be a binary matrix. (2) The description of the cluster number reduction in the Cluster Reducer module lacks clarity. Specifically, it is unclear whether each layer reduces the number of clusters by half from the previous layer, or whether such reduction only occurs in the first layer. If the cluster number is halved at each layer, it may result in overly coarse representations in later stages, potentially mixing information from distinct regions. On the other hand, if the reduction occurs only in the first layer, the depiction of the Cluster Reducer in Fig. 1 may need to be revised for consistency.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I justify my recommendation. The clarity of the method and experimental settings are my main concerns.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers for their constructive comments. We respond below. [R1.Q1.1] Implementation Details: The outputs of Context-Aware Clustering are the instance features (M×d) and the refined anchor features (K’×d, where K’ is the reduced number of anchors). M (the number of instances in a WSI) is determined by preprocessing (cropping non-overlapping patches) and varies per WSI; it is not a fixed hyperparameter. We will provide more implementation details and refine Fig. 1 in the final version. Code will be released upon acceptance.

[R1.Q1.2] Potential Instability: MiCo effectively processes numerous instances. After combining the refined instance features (M×d) with the anchor features (K’×d), a subsequent attention pooling mechanism assigns minimal attention weights to task-irrelevant instances or anchors, which is a common strategy in various frameworks (e.g., HVTSurv (AAAI23), RRTMIL (CVPR24)). This allows the model to focus on the most task-relevant information, thus preventing performance degradation even with numerous instances. This stability and effectiveness are supported by MiCo’s consistently high performance across 9 diverse cancer datasets.
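
As an illustration of this argument, here is a minimal sketch of softmax attention pooling over the concatenated instance and anchor tokens. It is not the MiCo code: the scoring vector `w` stands in for a learned attention head, and the toy features are made up for the example.

```python
import math

def attention_pool(tokens, w):
    """Score each token with a (hypothetical) learned vector w, softmax the
    scores, and return the weights plus the weighted sum of the tokens."""
    scores = [sum(a * b for a, b in zip(t, w)) for t in tokens]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tokens[0])
    pooled = [sum(wt * t[d] for wt, t in zip(weights, tokens)) for d in range(dim)]
    return weights, pooled

instance_feats = [[2.0, 0.0], [0.0, 0.1], [0.0, 0.2]]  # M x d, M = 3
anchor_feats = [[1.5, 0.0]]                            # K' x d, K' = 1
tokens = instance_feats + anchor_feats                 # (M + K') x d
weights, pooled = attention_pool(tokens, w=[3.0, 0.0])
```

In this toy case the two low-scoring instances each receive well under 1% of the attention mass, so adding more such tokens changes the pooled vector very little. Whether this holds for very large M in practice is an empirical question that the sketch does not settle.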

[R1.Q2] Statement: Apologies for the imprecise phrase. Our intention was to highlight the challenge posed by morphologically similar tissue types or patterns that are discontinuously distributed across different areas within a single WSI [9,11]. We agree that models like UNI correctly show strong local spatial coherence. However, this local coherence does not contradict the fact that similar regions can be spatially separated across the WSI. This phenomenon of within-WSI dispersion, including that of cancerous tissue, is a well-known observation in diagnostic pathology and the medical literature. It arises from biological realities such as intratumor heterogeneity; for example, cancerous tissue can manifest as small, spatially separated foci or as diffusely distributed tumor cells, and immune cell infiltrates show varied distribution patterns within the tumor microenvironment. MiCo addresses this within-WSI spatial heterogeneity, focusing on the discontinuous distribution of similar instances. Its Cluster Route explicitly links instances of the same tissue type across regions within the WSI to enhance cross-regional intra-tissue correlations. We will update the text and references in the final version.

[R2.Q1] Computational Efficiency: In designing MiCo, we carefully balanced performance and computational cost through two key optimizations: 1. Our learnable anchor mechanism avoids computationally expensive pairwise interactions among all instances. 2. Lightweight MLPs are used for efficient anchor dimensionality reduction. As a result, MiCo (3.41 GFLOPs) is significantly more efficient than methods like TransMIL (7.35 GFLOPs).
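
The first optimization can be illustrated with a back-of-the-envelope count of similarity computations. This is a rough sketch under assumed sizes (M = 10,000 instances, K = 64 anchors, both hypothetical); it does not reproduce the GFLOPs figures quoted above.

```python
def pairwise_interactions(num_instances):
    """Similarity computations when every instance attends to every other."""
    return num_instances * num_instances

def anchor_interactions(num_instances, num_anchors):
    """Similarity computations when instances only interact with anchors."""
    return num_instances * num_anchors

M, K = 10_000, 64  # illustrative values, not taken from the paper
full = pairwise_interactions(M)
routed = anchor_interactions(M, K)
print(full, routed, full / routed)  # 100000000 640000 156.25
```

The saving grows linearly with M for a fixed K, which is why anchor-based routing scales better on slides with many patches.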

[R2.Q2] Visualization: We will update Fig. 2 to show how MiCo effectively groups distant patches with similar morphological features.

[R2.Q3] Evaluation: We agree that tasks like HER2 or IDH prediction offer further validation opportunities and will incorporate them into future work.

[R3.Q1] Clarification of Eq. 2: Eq. 2 uses a Straight-Through Estimator to calculate the instance-to-anchor assignment matrix A^_l, and A^_l is a binary matrix. The numerical value of the term A_l-sg(A_l) is 0, so it does not alter the binary value determined by onehot(argmax(A_l)).
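
Written out, the straight-through construction described here takes the following form. This is reconstructed from the terms named in the response rather than copied from the paper; the shape M × K_l (with K_l the number of anchors at stage l) is assumed notation.

```latex
% Straight-through estimator for the hard assignment
\hat{A}_l = \operatorname{onehot}\bigl(\arg\max(A_l)\bigr) + A_l - \operatorname{sg}(A_l)

% Forward pass: sg(A_l) equals A_l in value, so the last two terms cancel
\hat{A}_l = \operatorname{onehot}\bigl(\arg\max(A_l)\bigr) \in \{0,1\}^{M \times K_l}

% Backward pass: sg blocks gradients and onehot(argmax) is piecewise constant,
% so the gradient flows through the soft assignment A_l unchanged
\frac{\partial \hat{A}_l}{\partial A_l} = I
```

The forward value is therefore exactly binary, while the backward pass behaves as if the soft scores had been used, which is what keeps the assignment step trainable.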

[R3.Q2] Explanation of the Cluster Reducer: The Cluster Reducer module halves the number of clusters at each stage of MiCo. This progressive reduction aims to intelligently refine and consolidate semantic information, rather than causing coarseness or indiscriminately mixing distinct regional information. Fig. 2 demonstrates that cluster assignments of semantically related regions are progressively merged across stages, preventing overly coarse representations in later phases.
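
The halving schedule can be sketched as follows. The initial anchor count (64), the number of stages (3), integer division, and the floor of one anchor are all assumptions made for illustration; the response only states that the count is halved at each stage.

```python
def anchor_schedule(initial_anchors, num_stages):
    """Number of anchors left after each stage when every stage halves it."""
    counts = []
    k = initial_anchors
    for _ in range(num_stages):
        k = max(1, k // 2)  # assumed: integer halving, never below one anchor
        counts.append(k)
    return counts

print(anchor_schedule(64, 3))  # [32, 16, 8]
```

With a fixed number of stages the final anchor count stays bounded below by the initial count divided by 2 to the power of the stage count, which is one way to reason about how coarse the last stage can become.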




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


