Abstract
Multiple instance learning (MIL) has become the de facto standard approach for whole-slide image analysis in computational pathology (CPath). While instance-wise attention tends to miss correlations between instances, self-attention can capture these interactions, but remains agnostic to the particular task. To address this issue, we introduce Top-Down Attention-based Multiple Instance Learning (TDA-MIL), an architecture that first learns a general representation from the data via self-attention in an initial inference step, then identifies task-relevant instances through a feature selection module, and finally refines these representations by injecting the selected instances back into the attention mechanism for a second inference step. By focusing on task-specific signals, TDA-MIL effectively discerns subtle, yet significant, regions within each slide, leading to more precise classification. Extensive experiments on detecting lymph node metastasis in breast cancer, biomarker screening for microsatellite instability in different organs, and challenging molecular status prediction for HER2 in breast cancer show that TDA-MIL consistently surpasses other MIL baselines, underscoring the effectiveness of our proposed task-relevant refocusing and its broad applicability across CPath tasks. Our implementation is released at https://github.com/agentdr1/TDA_MIL.
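To make the two-step design concrete, the following minimal Python/PyTorch sketch illustrates the "global self-attention -> task-conditioned feature selection -> re-attention" flow described above. Module names, dimensions, and the exact soft-weighting form are illustrative assumptions, not the authors' implementation; the released repository above is authoritative.

```python
# Illustrative sketch of the two-step top-down attention idea (not the authors' code).
# Assumes pre-extracted patch features; dimensions and the weighting form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopDownMILSketch(nn.Module):
    def __init__(self, dim=1024, heads=8, n_classes=2):
        super().__init__()
        self.cls = nn.Parameter(torch.randn(1, 1, dim) * 0.02)    # slide-level token
        self.task = nn.Parameter(torch.randn(1, 1, dim) * 0.02)   # task token for selection
        self.attn1 = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                       # x: (1, N, dim) patch features of one slide
        tokens = torch.cat([self.cls, x], dim=1)
        ctx, _ = self.attn1(tokens, tokens, tokens)            # step 1: global self-attention
        cls_tok, patches = ctx[:, :1], ctx[:, 1:]
        # feature selection: cosine similarity to the task token gives soft instance weights
        w = F.cosine_similarity(patches, self.task, dim=-1).clamp(min=0).unsqueeze(-1)
        refocused = torch.cat([cls_tok, w * patches], dim=1)   # re-inject weighted instances
        out, _ = self.attn2(refocused, refocused, refocused)   # step 2: task-focused re-attention
        return self.head(out[:, 0])                            # CLS pooling -> slide-level logits
```

A slide would then be classified as `logits = TopDownMILSketch()(features)` with `features` of shape `(1, N, 1024)` produced by a frozen patch encoder such as UNI.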
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2460_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/agentdr1/TDA_MIL
Link to the Dataset(s)
N/A
BibTex
@InProceedings{ReiDan_TopDown_MICCAI2025,
author = { Reisenbüchler, Daniel and Deng, Ruining and Matek, Christian and Feuerhake, Friedrich and Merhof, Dorit},
title = { { Top-Down Attention-based Multiple Instance Learning for Whole Slide Image Analysis } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {655--665}
}
Reviews
Review #1
- Please describe the contribution of the paper
They propose TDA-MIL, a two-step multiple instance learning framework for pathology image classification. First, it applies self-attention to all patch features to learn a general context. Then, it uses a feature selection module to identify task-relevant patches, which are reintroduced into a second self-attention stage, refining predictions with a top-down attention strategy. This mimics how pathologists interpret slides—from global overview to localized scrutiny.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The proposed TDA-MIL differs from previous pipelines, and the authors validate its effectiveness through extensive experiments.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While the proposed top-down attention pipeline and feature selection module are cohesively designed, their underlying components—such as self-attention, cosine similarity, task-specific tokens, and channel rescaling—are primarily adapted from existing techniques. Moreover, attention mechanisms have been widely explored in multiple instance learning for whole-slide image analysis, and the proposed method does not demonstrate substantially superior performance over prior state-of-the-art approaches.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The limited novelty and the experimental performance of the proposed pipeline.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The paper proposes a multiple-instance learning algorithm, TDA-MIL, for computational pathology. The proposed solution, termed Top-Down Attention-based Multiple Instance Learning, aims to cope with self-attention limitations by leveraging a two-step approach. The first step leverages self-attention to identify task-relevant features, which are then injected back into the attention process for the second step. The paper evaluates the proposal against different MIL approaches leveraging three datasets, i.e., CAMELYON17, TCGA-CRC, and TCGA-BRCA.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-motivated and targets an important field of medical imaging, i.e, WSI analysis;
- Experiments are performed on heterogeneous datasets, and the reported results demonstrate improvements with respect to selected competitors.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The technical explanation of the proposed approach is sometimes hard to follow, and the missing source code undermines the reproducibility and clarity of the proposal. From Fig. 1, it is unclear which parts of the model are frozen and which are learnable. The role of the CLS token is unclear; what is its role in the entire pipeline? How is it used in the later stages of the model? Does it correspond to tau in Equation (3) and Figure (2)?
- The paper proposes a method that can be employed with different feature extraction models. However, the authors did not provide any ablation study on this and only report results obtained with UNI. Demonstrating the robustness of the proposal regardless of the feature extractor employed would have strengthened the scientific value;
- Some competitors are outdated. For example, S4MIL is a mamba-based approach that has already been outperformed by other solutions leveraging the same base architecture [1]. More recent graph-based/multi-scale solutions are available in the literature [2, 3];
- The comments about Table 1 (Section 4.1) are just a textual description of the table that provides no insight into the performance gaps.
- Some acronyms are used without explanation, e.g. IHC at the end of the introduction;
- There are many formatting typos and inconsistent sentences. To give some examples: (i) in Section 4.1, "Fig. 3a)" has a closing bracket without a corresponding opening bracket; (ii) at the end of page 3 there are two incomplete sentences: "were queries Q, keys K, and values V", and "with learnable parameters W0, Wk, and Wv.".
[1] Yang, S., Wang, Y., & Chen, H. (2024, October). MambaMIL: Enhancing long sequence modeling with sequence reordering in computational pathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 296-306). Cham: Springer Nature Switzerland.
[2] Bontempo, G., Porrello, A., Bolelli, F., Calderara, S., & Ficarra, E. (2023, October). DAS-MIL: Distilling across scales for MIL classification of histological WSIs. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 248-258). Cham: Springer Nature Switzerland.
[3] Li, J., Chen, Y., Chu, H., Sun, Q., Guan, T., Han, A., & He, Y. (2024). Dynamic graph representation with knowledge-aware attention for histopathology whole slide image analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11323-11332).
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the paper addresses an important problem in computational pathology and shows promising results, the clarity of the proposed method is hindered by vague technical descriptions, missing implementation details, and a lack of thorough ablation studies. Additionally, the experimental comparison omits stronger, more recent baselines, which limits the paper’s scientific rigor and impact.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed most of the reviewers’ concerns with clear, detailed clarifications and proposed substantial improvements to the manuscript. They clarified the training scheme, explained the role of the CLS token, and committed to revising figures for better interpretability.
Although only one feature extractor (UNI) was used, the authors justified this choice and stated that preliminary experiments performed with other feature extractors before submitting the paper confirmed the results.
Overall, the paper offers an effective and well-justified approach to MIL in histopathology. I hope the authors will follow through on all their commitments if the paper is accepted: text/figure changes, source code release, etc.
For all these reasons, I am leaning toward accepting the paper.
Review #3
- Please describe the contribution of the paper
This paper introduces Top-Down Attention-based Multiple Instance Learning (TDA-MIL), an architecture that first learns a general representation from the data. Extensive experiments show that TDA-MIL consistently surpasses other MIL baselines, underscoring its effectiveness.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The approach adopted by TDA-MIL, which first observes the global context before refining focus to key regions, may better capture relationships between instances. If only a single attention mechanism were used, the model might overly focus on salient but non-specific features. Therefore, the method proposed in the paper—applying attention initially, then filtering out low-relevance instances through a feature selection module, and finally reapplying attention—could effectively mitigate such biases.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
The FM module is used to extract patch features, but the paper does not describe the architecture of the FM. Some details about the FM could be added.
Does the paper employ joint training across multiple datasets? If so, how does the task token T avoid interference between tasks during joint training?
In the feature selection module, patches are filtered via cosine similarity. However, the paper does not analyze the “threshold” settings. Different threshold strategies may lead to variations in model performance.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The extensive comparison with SOTA methods and the ablation study show the effectiveness of the proposed method. TDA-MIL learns a robust general representation and then refocuses the model on task-relevant patches. This clearly creates flexibility and generalizability in the field.
- Reviewer confidence
Somewhat confident (2)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #4
- Please describe the contribution of the paper
This paper presents a top-down attention-based approach that addresses a limitation of self-attention when used in MIL strategies for WSI analysis. The comparative experimental results demonstrated the performance across multiple datasets for multiple clinical tasks.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) The paper is well-organized and well-written. (2) The authors proposed a well-designed framework and nice pictures as well as visualizations are also provided. (3) The comparative experimental results across multiple datasets and tasks demonstrated the effectiveness of the proposed method.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
(1) It would be better if the authors could describe the core motivation and reasoning for introducing the top-down attention more clearly. (2) The authors should re-organize and re-summarize the contributions of the proposed model rather than emphasizing broad applicability and interpretability.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The idea and methodology are good, but some content in the paper should be polished.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed most of my concerns. I am happy with their responses.
Author Feedback
We thank all reviewers for their constructive feedback and interesting ideas. Code will be released upon acceptance; text errors and minor issues will be corrected.
R1) Details of the FM: We will add that UNI is based on ViT-L with an output feature dimension of 1024.
R1) Task token and multi-task learning: We did not explore multi-task learning. However, this is a very interesting direction in the context of TDA-MIL for a future study; thanks for bringing this up!
R1) Threshold analysis of patch filtering: We use a weighting (Eq. 4) instead of a hard threshold, so there are no threshold values to compare.
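To illustrate the distinction drawn here between hard thresholding and soft weighting, a short Python/PyTorch sketch follows; the tensors are hypothetical and the exact form of Eq. 4 is not reproduced.

```python
# Contrast between hard-threshold selection and the soft weighting referred to above.
# Tensors are hypothetical toy data; Eq. 4 itself is not reproduced here.
import torch
import torch.nn.functional as F

patches = torch.randn(500, 1024)     # context-aware patch embeddings of one slide (toy)
task = torch.randn(1024)             # learned task token (toy)

sim = F.cosine_similarity(patches, task.unsqueeze(0), dim=-1)   # (500,) similarities

selected = patches[sim > 0.5]                        # hard threshold: a cut-off to tune
weighted = sim.clamp(min=0).unsqueeze(-1) * patches  # soft weighting: no threshold hyperparameter
```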
R3) On novelty: The contribution is not merely the individual modules; it is the two-step framework, "coarse pre-scan -> focus + fine-grained re-attend": (1) global self-attention creates context-aware tile embeddings; (2) a cosine, task-conditioned selector feeds the most informative subset, via weighting, into re-attention. No prior MIL work re-attends to a task-specifically filtered subset. IHC-verified heat-maps confirm that feature selection with top-down attention pinpoints biomarker-specific ducts overlooked by self-attention (Fig. 4), which is also supported by numerical evidence (ablation).
R3) On performance: TDA-MIL wins every dataset/metric pair across 5 datasets, 3 tasks, 2 metrics, and 9 competitors. Note that the experiments include easy-to-hard tasks with varying sample sizes and thus cover a variety of scenarios. Performance gains range from +3.16% Bal-Acc (MSI CRC) and +2.5% (MSI UCEC) to +1.41% AUROC (CAMELYON17). Crucially, TDA-MIL always surpasses the competitors across these scenarios. Performance differences are numerically smaller (CAMELYON17) when overall performance is high or when different thresholds/measurement techniques are used for acquiring annotations across datasets and for binarizing numerical scores (HER2 status prediction in TCGA and BCNB). Yet even small performance differences are of high clinical value, as they come from correctly classifying difficult cases. Heat-maps additionally improve interpretability and verify that feature selection + TDA recovers diagnostically relevant regions missed by pure attention as used in earlier MIL works.
R4) Motivation: We will emphasize the two-step framework, "coarse pre-scan -> focus + fine-grained re-attend", more clearly and re-organize the contributions to highlight the framework.
R5) Technical clarification on trainable modules and the CLS token: The UNI FM backbone is frozen; only the TDA-MIL part is trainable (Fig. 1B). We will add fire/snowflake icons to Fig. 1. The CLS token performs slide-level pooling (as in ViTs) after the second inference step, and tau in Eq. 4 relates to the CLS token in the same way as the other tokens/patch features.
R5) Feature extractors other than UNI: UNI was SOTA for histology at the time of working on the paper and is pre-trained on a private corpus that does not overlap with our benchmarks, preventing data leakage. Pre-deadline pilot runs showed that GigaPath (data leakage) and CONCH (worse than UNI) preserved the same ranking - TDA-MIL remained best - so, given the trade-off imposed by the no-supplement rule and the page limit, we prioritized covering more datasets/tasks over preprocessing (i.e., FM extractor) permutations.
R5) Competitors: We compare against 9 diverse MIL methods (attention, transformer, state-space, graph), including CVPR'23/'24 work (MHIM-MIL, RRT-MIL). We understand the concern; however, some "older" models still beat newer ones when supplied with strong FM-derived features (Tab. 1, also observed in Xu et al. [1]), which were not available earlier. Exchanging them for exclusively newer ones would therefore hide relevant insights. Multi-scale modeling was out of scope for our study but is an interesting direction for future work, as it is orthogonal to ours; we appreciate this recommendation.
[1] Xu et al, “When multiple instance learning meets foundation models: Advancing histological whole slide image analysis”, MIA 2025
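As a concrete illustration of the training setup clarified in the rebuttal (frozen foundation-model backbone, trainable MIL head), here is a minimal hedged sketch; the encoder below is a generic stand-in rather than UNI, and all tensors are toy data.

```python
# Minimal sketch of a frozen patch encoder with a trainable MIL head.
# The encoder is a generic stand-in, not UNI; only the freezing pattern is illustrated.
import torch
import torch.nn as nn

encoder = nn.Sequential(                      # stand-in patch encoder -> 1024-d features
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 1024),
)
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)                   # backbone stays frozen

mil_head = nn.Linear(1024, 2)                 # stand-in for the trainable MIL module
optimizer = torch.optim.AdamW(mil_head.parameters(), lr=1e-4)  # only the head is updated

patches = torch.rand(64, 3, 224, 224)         # patches of one slide (toy data)
labels = torch.tensor([1])                    # slide-level label

with torch.no_grad():
    feats = encoder(patches)                  # (64, 1024) pre-extracted features
logits = mil_head(feats.mean(dim=0, keepdim=True))   # toy mean pooling for the sketch
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```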
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A