Abstract
Cancer diagnosis has greatly benefited from the integration of whole-slide images (WSIs) with multiple instance learning (MIL), enabling high-resolution analysis of tissue morphology. Graph-based MIL (GNN-MIL) approaches have emerged as powerful solutions for capturing contextual information in WSIs, thereby improving diagnostic accuracy. However, WSIs require significant computational and infrastructural resources, limiting accessibility in resource-constrained settings. Conventional light microscopes offer a cost-effective alternative, but applying GNN-MIL to such data is challenging due to extensive redundant images and missing spatial coordinates, which hinder contextual learning. To address these issues, we introduce MicroMIL, the first weakly-supervised MIL framework specifically designed for images acquired from conventional light microscopes. MicroMIL leverages a representative image extractor (RIE) that employs deep cluster embedding (DCE) and hard Gumbel-Softmax to dynamically reduce redundancy and select representative images. These images serve as graph nodes, with edges computed via cosine similarity, eliminating the need for spatial coordinates while preserving contextual information. Extensive experiments on a real-world colon cancer dataset and the BreakHis dataset demonstrate that MicroMIL achieves state-of-the-art performance, improving both diagnostic accuracy and robustness to redundancy. The code is available at https://github.com/kimjongwoo-cell/MicroMIL
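For readers who want to see the moving parts concretely, below is a minimal PyTorch sketch of the pipeline the abstract describes: representative selection via hard Gumbel-Softmax and coordinate-free graph edges via cosine similarity. This is an illustration, not the authors' released code (see the repository above for that); the function names, the random assignment logits standing in for the learned DCE module, and the 0.8 edge threshold are all assumptions.

    import torch
    import torch.nn.functional as F

    def select_representatives(feats, n_clusters, tau=1.0):
        """Pick one representative embedding per cluster via hard Gumbel-Softmax.

        feats: (N, D) instance embeddings for one patient.
        Returns: (n_clusters, D) representative embeddings.
        """
        # In MicroMIL the assignment logits come from the learned DCE module;
        # a random linear map stands in for it here.
        logits = feats @ torch.randn(feats.size(1), n_clusters)    # (N, C)
        # Straight-through one-hot selection: hard in the forward pass,
        # differentiable in the backward pass, enabling end-to-end training.
        assign = F.gumbel_softmax(logits.t(), tau=tau, hard=True)  # (C, N)
        return assign @ feats                                      # (C, D)

    def build_similarity_edges(reps, threshold=0.8):
        """Connect representatives whose cosine similarity exceeds a threshold,
        standing in for the spatial adjacency that microscope images lack."""
        sim = F.cosine_similarity(reps.unsqueeze(1), reps.unsqueeze(0), dim=-1)
        src, dst = torch.nonzero(sim > threshold, as_tuple=True)
        return torch.stack([src, dst])                             # (2, E) edge index

    feats = torch.randn(500, 512)  # e.g. 500 microscope images, 512-d features
    reps = select_representatives(feats, n_clusters=36)
    edge_index = build_similarity_edges(reps)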
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0707_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/kimjongwoo-cell/MicroMIL
Link to the Dataset(s)
BreakHis dataset: https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis
BibTex
@InProceedings{KimJon_MicroMIL_MICCAI2025,
author = { Kim, JongWoo and Wong, Bryan and Fu, Huazhu and Quiñones Robles, Willmer Rafell and Ko, Young Sin and Yi, Mun Yong},
title = { { MicroMIL: Graph-Based Multiple Instance Learning for Context-Aware Diagnosis with Microscopic Images } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {389 -- 399}
}
Reviews
Review #1
- Please describe the contribution of the paper
A MIL formulation based on clustering patches and a spatial-information-free graph network is presented. It is claimed this is the first MIL method targeted at "microscopy" images (by which I think the authors mean sets of images taken using a conventional microscope and scanner, although this isn't explicit).
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
While the idea of using a subset of patches isn't new, the exact formulation is. GNNs aren't new, but the exact way the graph is constructed is. The results look good compared to state-of-the-art methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The paper claims to be targeted at "microscopy" images, by which I'm assuming they mean images taken from a light microscope using an attached camera? (This is not explicit, though.) In actuality the images used are from WSI files, I think (at least they might as well be), and the fact that microscopy imaging covers WSI, confocal, EM, etc. just makes this confusing. There is nothing specific in the algorithm that means it couldn't be used in any MIL scenario. While the method demonstrates superior performance to some SOTA methods, it is not clear exactly how much of the dataset is used with those SOTA methods. It has been shown that sub-sampling patches in training can reduce model overfitting for these types of methods, so it's possible the performance increase is simply down to the use of smaller bags and thus less overfitting. The datasets are not challenging (cancer vs. no cancer) and are likely to exhibit (a) the redundancy the paper says is inherent in all microscopy (not always true) and (b) little useful spatial context (if there is tumour in a patch, one can predict the case is positive without context).
The idea that this is "targeted at microscopy images" because WSIs are not affordable in poorer countries is also flawed. There are low-volume WSI scanners targeted at such markets that are as cheap as a clinical microscope plus camera plus capture computer (not to mention that an analysis computer must be paid for either way). The real issue in the third world is the lack of pathologists in the first place. This is why such countries were some of the first places WSI scanners were installed, to allow remote reporting; these were paid for by charity from richer countries, as they allow remote reporting from those richer countries.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Why only join similar patches with graph edges? Surely this excludes using co-occurrence of different appearances (e.g. tumour + immune cells)? Clearly co-occurrence isn't likely to be useful in tumour vs. no-tumour scenarios, but in other scenarios (predicting survival, or mutation status) it could well be. On p. 2 you say the graph is "to capture structural relationships among instances." I can't see how this captures structural relationships given you don't use spatial information and connect only similar patches.
Context improving things is an interesting result, given this can only be using co-occurrence of similar appearances (not spatial context, or co-occurrence of different appearances). Is this really using context? Or is the graph network just acting as a feature-denoising tool?
On p. 2 you mention WSI datasets (TCGA-NSCLC and Camelyon16), but these aren't used later, which makes the sentence confusing. You mention prior work, but not what it is, and it is never mentioned again.
P4: How do you determine the number of clusters? Surely this is important? What is the sensitivity of the method to this choice?
P5: "represegnted" [spelling error in the paper]
Which feature extractor are you using? Not saying is a big omission. A CNN-based one or an SSL/ViT one? Would it make a difference if you used a different feature extractor?
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The novelty of this paper is not high and there are some issues. However, the results beat SOTA, though I have reservations about why this is the case (see earlier sections). Additionally, the graph joining only similar cluster examples (appearance-wise) with edges is not well motivated.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The rebuttal confirmed that the baseline methods use all patches in a bag, whereas the presented method uses a (clustered) subset. It has previously been shown that simple sub-sampling can help avoid overfitting, so this may be the reason for the enhanced performance. This should at least be discussed.
Rebuttal: “we assume visual similarity reflects spatial closeness”. Had this been stated in the paper I would have raised it as the nonsense that it is. Similar appearances (tissue types) can appear in different parts of the tissue. This does not necessarily detract from the method, but it should not be added in revision!
Review #2
- Please describe the contribution of the paper
The paper discusses the importance of microscopy imaging compared to WSI and proposes the first weakly supervised MIL method for microscopy images. The proposed method tackles the redundant-image and missing-spatial-coordinate problems of microscopy images with deep cluster embeddings and cosine-similarity-defined edges, respectively.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Clear introduction on the importance of microscopy imaging.
- The idea of clustering and using a representative with hard Gumbel-Softmax to reduce redundancies is interesting (see the sketch after this list).
- Solid evaluation based on accuracy, AUC, and f1-score and a detailed study on the feature importance of the proposed model.
- Clear description of the data and experiment.
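As a quick illustration of why the hard Gumbel-Softmax praised above permits end-to-end learning, the toy snippet below shows its straight-through behaviour: the forward pass yields an exact one-hot selection while gradients still reach the selection logits. This is standard PyTorch usage, not code from the paper.

    import torch
    import torch.nn.functional as F

    logits = torch.randn(5, requires_grad=True)       # scores for 5 candidate images
    y = F.gumbel_softmax(logits, tau=0.5, hard=True)  # exact one-hot in the forward pass
    print(y)                                          # e.g. tensor([0., 0., 1., 0., 0.], ...)
    loss = (y * torch.arange(5.0)).sum()              # any downstream loss using the selection
    loss.backward()                                   # straight-through: gradients flow
    print(logits.grad)                                # populated, so the logits stay learnable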
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Different numbers of clusters have been used for the two datasets (36 and 16). What is the procedure for determining these numbers for a new dataset? This part of the model appears not to be automated, and no single cluster count works across datasets. An ablation study on the number of clusters would have been valuable.
- Previous work on microscopy images has not been discussed in detail. It would be helpful to see how the contributions compare to prior work, particularly in how redundancy and the absence of coordinates were handled.
- It appears that different magnifications were all processed using the same feature extractor, rather than employing separate extractors or a multi-scale MIL approach. This could impact the quality of extracted features, as the size of cells and nuclei plays an important role in determining malignancy. Without explicitly accounting for resolution, it can be unclear whether observed differences are due to magnification changes or tissue abnormalities.
- It would have been preferable to use a feature extractor trained on microscopy data or to train one specifically for this task.
- Future work and limitations of the model have not been discussed.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
- Please explain the reverse similarity in section 3.5.
- In Table 2, what do you mean by the top and bottom 10% of redundant images? How have you ordered them to define top and bottom?
- In section 3.1 BreakHis dataset, I believe the “average of 96.4 images per patient” should be 97.6 (7909 / 81 = 97.6).
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The problem is interesting and the ideas are well suited to the challenges of microscopy imaging. However, the model seems to depend on the number of clusters. In addition, extracting features with an ImageNet-pretrained ResNet-18 across different magnifications is not ideal for comparison to the state of the art, as explained in the weaknesses.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
I recommend rejecting the paper for the following reasons:
- Application of the study: As Reviewer 1 pointed out, the motivation that WSIs are not accessible is unconvincing. The rebuttal and cited papers do not adequately support this claim or address the concern.
- Task complexity: The paper focuses on a cancer vs. non-cancer classification task, which is considerably simpler than more clinically relevant problems such as cancer grading. As a result, the model's performance cannot be meaningfully assessed, limiting the strength of the conclusions.
- Handling of multiple magnifications: The use of a single feature extractor across different magnifications is not a robust strategy, especially for complex tasks where high-resolution features are essential. Moreover, using separate feature extractors can lead to differing embeddings for the same spatial region. The current approach does not provide a sufficient solution or experimental validation in this regard.
- Maturity of the work: The paper would benefit from further development to strengthen its contribution. For instance, the clustering process should be automated, the handling of multiple magnifications should be improved, and the overall clarity and presentation of the study need refinement.
Review #3
- Please describe the contribution of the paper
This paper proposes a GNN-based weakly supervised learning approach tailored for microscopy images, which are more redundant than WSIs and lack position information. Several techniques are adopted for constructing the GNN representation, such as deep clustering and similarity computation. The proposed method achieves the highest performance on two datasets.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
It’s a novel formulation of MIL. The main contribution is the method proposed to reduce redundancy.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Overall, this paper presents some level of innovation, particularly in considering the redundancy characteristics of microscopic images. However, several key aspects of the methodology remain ambiguous, which affects the clarity and reproducibility of the work.
- Unclear Determination of Cluster Number C in the Clustering Process: (1) The paper does not explicitly explain how the number of clusters C is determined during the clustering process. (2) Without a clear justification, it is difficult to assess the robustness and generalizability of the proposed clustering approach. It would be better to experiment with various values of C as an ablation.
- Ambiguity in Notation and Mathematical Formulation (1) Equation (4) contains potential notation confusion: it is unclear whether R_c and q_c refer to the same entity or represent different aspects of the model. (2) The relationship between these terms should be explicitly clarified to avoid misinterpretation.
- Unclear Experimental Motivation and Presentation in Table 2: (1) The purpose of the experiments presented in Table 2 is not well articulated. In particular, it is not explicitly explained how these results support the claim of "Robustness on Image Redundancy Shift." This leaves a conceptual gap between the experimental setup and the intended conclusion, making it difficult to interpret the significance of the results.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My main concerns are the clarity of the method and of the experimental settings.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We appreciate the reviewers’ insightful feedback and address the main concerns below (R: Reviewer, W: Weakness, O: Optional).
[Experimental Design] (R1O5,R2W1,5,R3W1) We sampled 16 (4^2), 25 (5^2), and 36 (6^2) clusters and reported the best. We found minimal performance differences across these choices, and MicroMIL still consistently outperformed all baselines. Although online clustering (DCE) requires a predefined cluster number, we plan to address this by exploring automatic cluster-number selection in future work.
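A hypothetical version of that sweep, with train_and_eval as a stand-in for the full training pipeline (it is not a function from the released code):

    def train_and_eval(n_clusters: int) -> float:
        """Stand-in: train MicroMIL with this cluster count, return validation AUC."""
        return 0.0  # placeholder; swap in the real training loop

    results = {c: train_and_eval(c) for c in (16, 25, 36)}  # 4^2, 5^2, 6^2 as sampled above
    best_c = max(results, key=results.get)  # the rebuttal reports the best of these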
(R3W3,R2O2) Table 2 shows how baseline MIL methods degrade under extreme redundancy (B10->T10, T10->T10), while MicroMIL remains robust. Even in simulated low-redundancy settings (T10->B10), MicroMIL outperforms the baselines. To design this, we counted the images exceeding the redundancy threshold (Fig. 1, middle) per patient, then selected the highest (Top) 10% and lowest (Bottom) 10% of patients.
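One plausible reading of that selection procedure, as a short sketch (redundant_counts and the toy numbers are invented for illustration, not taken from the dataset):

    import math

    # Number of images per patient exceeding the redundancy threshold (toy values).
    redundant_counts = {"p01": 120, "p02": 8, "p03": 64, "p04": 3, "p05": 41}

    ranked = sorted(redundant_counts, key=redundant_counts.get, reverse=True)
    k = max(1, math.ceil(0.10 * len(ranked)))
    top10 = ranked[:k]      # most redundant patients  -> "T10" split
    bottom10 = ranked[-k:]  # least redundant patients -> "B10" split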
(R1O7,R2W4) We used an ImageNet-pretrained ResNet-18 for all models to ensure fair comparison (Sec. 3.2). Due to limited space, results with stronger feature extractors are not shown; still, we observed similar improvements under those settings.
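For concreteness, a minimal version of that shared feature-extraction step (standard torchvision usage; the ImageNet preprocessing constants are an assumption, as the exact transform is not stated here):

    import torch
    from torchvision import models, transforms

    # ImageNet-pretrained ResNet-18 with the classifier removed -> 512-d embeddings.
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    # Standard ImageNet preprocessing, applied to each captured microscope
    # field (a PIL image) before batching.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    with torch.no_grad():
        feats = backbone(torch.randn(8, 3, 224, 224))  # dummy batch -> (8, 512)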
(R1W2) All baseline models and MicroMIL use the same input bags for fair comparison, but unlike the baselines, which use all instances, MicroMIL uses its RIE module (DCE and hard Gumbel-Softmax) to learn representative instance selection end-to-end, which is our main contribution.
[Methodology] (R2W3,5) We agree with your point; while our current method does not explicitly model scale, it focuses on achieving robust performance through redundancy removal. In future work, we plan to extend our approach by constructing graph edges for each scale and hierarchy.
(R1O1,2,3) Modeling spatial proximity is key for contextual learning, but light microscopy lacks absolute coordinates. Fig. 3 (w/o RIE) shows that redundancy limits diverse interactions, which we mitigate using RIE. As exact spatial information is unavailable, we assume visual similarity reflects spatial closeness and construct a similarity graph. This approach outperforms “No Connect” (Fig. 5), highlighting the importance of contextual modeling. Still, it remains limited in handling heterogeneous patterns, which we plan to address through extensions to the graph structure.
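To make the similarity-graph assumption tangible (and R1's denoising question concrete), here is a toy round of mean aggregation over such a graph in plain PyTorch; the 0.8 threshold and the single averaging step are illustrative assumptions, not the paper's GNN. Because edges connect only look-alike nodes, one message-passing round amounts to smoothing within appearance groups, which is exactly the context-vs-denoising distinction R1 raises.

    import torch
    import torch.nn.functional as F

    reps = torch.randn(6, 512)  # representative-node features
    sim = F.cosine_similarity(reps.unsqueeze(1), reps.unsqueeze(0), dim=-1)
    adj = (sim > 0.8).float()                              # edges between similar nodes only
    adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1)  # row-normalise for mean pooling
    context = adj @ reps  # each node is smoothed toward its look-alike neighbours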
[Light Microscopy Dataset] (R1W1) We clarify that our study uses only non-WSI images manually captured with conventional light microscopes, not tiled WSIs, and will revise the paper to reflect this.
(R1W3) We appreciate your concern; however, as more challenging light microscopy datasets are not available, we used the most recent public dataset (BreakHis) and a real-world dataset, where (a) redundancy is specific to manually captured images, as shown experimentally, and (b) relying on a single tumor patch can be unreliable in weakly supervised settings without instance-level labels.
(R1W4) While WSI scanners are becoming more accessible, light microscopes remain much more widely used, and low-cost optical microscopes tailored for low- and middle-income countries continue to be developed [1,2]. Thus, MicroMIL is designed for real-world conditions where microscopy-based diagnostics are still standard; we will clarify this in the final version. [1] McDermott, M. et al. Multi-modal microscopy.. [2] Zhang, H. et al. Towards ultra..
[Related Work] (R2W2) Prior work has primarily employed statistical or ensemble-based methods [3,4]. The most related approach [5] does not address critical issues specific to light microscopy images (high redundancy and the lack of spatial coordinates). [3] Nguyen, T. et al. Classification of colorectal tissue.. [4] Gandomkar, Z. et al. MuDeRN.. [5] Kim, J. et al. Leveraging Spatial..
[Notation & Clarification] [R1O4] Figure 1 uses WSI datasets only to compare redundancy with light microscopy data, not for training or evaluation. [R3W2] R_c → q_c and R → Q in Eq. 4. [R2O1] Reverse similarity is 1/similarity.
We thank the reviewers for their feedback and look forward to presenting this work.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Reject
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper lacks sufficient novelty, with unconvincing motivation and inadequate handling of key technical challenges such as multi-magnification processing. The experimental setup does not robustly validate the advantages of the method and the performance improvements may stem from dataset-specific factors rather than methodological innovation. Overall, the work is premature and requires significant refinement before being considered for publication.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
I also agree with R1 that it is a marginal paper, but I vote for acceptance following the majority of the reviewers' decisions.