Abstract

The field of computational pathology has recently seen rapid advances driven by the development of modern vision foundation models (FMs), typically trained on vast collections of pathology images. Recent studies show that scaling up the training dataset and model size, and integrating domain-specific image processing techniques, can significantly enhance performance on downstream tasks. Building on these insights, our work incorporates several recent modifications to the standard DINOv2 framework from the literature to optimize the training of pathology FMs. We also apply a post-training procedure to fine-tune the models on higher-resolution images, further enriching the information encoded in the embeddings. We present three novel pathology FMs trained on up to two orders of magnitude fewer whole-slide images (WSIs) than those used to train other state-of-the-art FMs, while demonstrating comparable or superior performance on downstream tasks. Even the model trained on TCGA alone (12k WSIs) outperforms most existing FMs and, on average, matches Virchow2, the second-best FM published to date. This suggests that significant potential remains for improving the models and algorithms used to train pathology FMs to take full advantage of the vast data collections.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4651_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/kaiko-ai/midnight
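
For reference, a minimal usage sketch (not taken from the paper or the repository): loading the openly released M-12k model through Hugging Face transformers and extracting a tile embedding. The model identifier below is an assumption; consult the repository README for the exact name and preprocessing.

```python
import torch
from transformers import AutoModel

# Assumed model ID -- verify against the repository README.
model = AutoModel.from_pretrained("kaiko-ai/midnight-12k")
model.eval()

# Stand-in for a preprocessed 224x224 H&E tile (normalized RGB tensor).
pixels = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    out = model(pixel_values=pixels)

embedding = out.last_hidden_state[:, 0]  # CLS token as the tile embedding
```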

Link to the Dataset(s)

N/A

BibTex

@InProceedings{KarMik_Training_MICCAI2025,
        author = { Karasikov, Mikhail and van Doorn, Joost and Känzig, Nicolas and Erdal Cesur, Melis and Horlings, Hugo Mark and Berke, Robert and Tang, Fei and Otálora, Sebastian},
        title = { { Training state-of-the-art pathology foundation models with orders of magnitude less data } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents three pathology foundation models (FMs) trained with significantly less data than other state-of-the-art models (as little as 12k WSIs), yet achieving competitive or superior performance on a range of downstream tasks. It demonstrates that efficient training workflows and architectural choices can rival much larger models/data settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Addresses a crucial scalability question in computational pathology: whether massive datasets are required to train performant foundation models. This is highly relevant in contexts where data is expensive or proprietary. 2) As a methodological innovation, it integrates high-resolution post-training for ViTs with 512×512 tiles and larger crop views. 3) Uses established benchmarks (eva, HEST) for both classification and regression tasks. 4) The models (especially M-92k-392) outperform existing models like Virchow2 and UNI-2 on average performance across tasks.
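
    For context on point 2: fine-tuning a ViT at a higher input resolution requires resizing the learned position embeddings to the new patch grid. Below is a minimal sketch of that standard recipe (assuming a square grid and a single class token); the exact post-training procedure in the paper may differ.

    ```python
    import torch
    import torch.nn.functional as F

    def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
        """Resize ViT position embeddings of shape (1, 1 + g*g, D) to a new
        patch grid, keeping the class-token embedding unchanged."""
        cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
        g = int(patch_pos.shape[1] ** 0.5)   # old grid side, e.g. 16 for 224/14
        d = patch_pos.shape[-1]
        patch_pos = patch_pos.reshape(1, g, g, d).permute(0, 3, 1, 2)
        patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                                  mode="bicubic", align_corners=False)
        patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid ** 2, d)
        return torch.cat([cls_tok, patch_pos], dim=1)

    # e.g. a ViT with patch size 14 moving from 224 px inputs (16x16 grid)
    # to 392 px inputs (28x28 grid):
    # new_pos = interpolate_pos_embed(model.pos_embed, new_grid=392 // 14)
    ```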

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) The high-resolution model (M-92k-392) shows degraded performance on HEST and Camelyon16, without a clear explanation. 2) The PRV-80k dataset is proprietary, and while some stats are provided, its representativeness, quality, and diversity are unclear, which slightly hinders reproducibility and generalizability.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper makes a significant, timely, and well-substantiated contribution to pathology AI. It challenges the assumption that massive datasets are required for performant FMs, offering a leaner and more resource-efficient alternative. While some task-specific regressions and limited dataset details could be improved, the technical depth, novelty, and experimental rigor merit acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents three new foundation models trained with significantly fewer WSIs than the current SoTA, but still achieving competitive or superior performance across multiple downstream tasks. The models are trained with a modified Dino-v2 framework, with improvements such as KDE regularization and color augmentations.
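
    For readers unfamiliar with the KDE regularizer: it replaces DINOv2's KoLeo term to encourage embeddings to spread uniformly over the unit hypersphere. A minimal sketch of one plausible form, the Gaussian-kernel uniformity loss of Wang & Isola (2020), is given below; the exact kernel and weighting used in the paper may differ, and the function name is illustrative.

    ```python
    import torch
    import torch.nn.functional as F

    def kde_uniformity_loss(z: torch.Tensor, t: float = 2.0) -> torch.Tensor:
        """Uniformity loss over a batch of embeddings z of shape (N, D):
        log of the mean Gaussian kernel over all pairwise squared distances.
        Lower values mean the normalized embeddings are spread more evenly
        over the unit hypersphere."""
        z = F.normalize(z, dim=-1)
        sq_dists = torch.pdist(z, p=2).pow(2)  # all N*(N-1)/2 pairwise distances
        return sq_dists.mul(-t).exp().mean().log()
    ```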

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The work challenges the existing assumption that massive data is a requirement for training competitive foundation models in pathology. This is a significant insight, especially for research groups without access to a lot of data.
    • The training pipeline is thoroughly described, with clear ablations, both on method components as well as dataset splits.
    • The proposed method matches or outperforms SoTA encoders such as UNI-2 and Virchow, despite being trained on orders of magnitude less data.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • While the results are strong, it’s not entirely clear which component is primarily responsible for the improved performance. The ablation study (Table 3) evaluates the impact of HED, KDE, and HSV, but only KDE regularization shows a significant effect, with the others contributing marginal gains. This raises the question of whether the strong results are mostly due to one key change rather than a combination of innovations. (A minimal sketch of HED-style augmentation is included after this list for context.)

    • Minor weakness: The link provided for training config and model weights is broken.
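
    Regarding the HED augmentation referenced above: a minimal sketch of HED-space color augmentation in the style of Tellez et al. (2018), using scikit-image's rgb2hed/hed2rgb; the perturbation ranges (sigma, bias) are assumptions, not the paper's settings.

    ```python
    import numpy as np
    from skimage.color import rgb2hed, hed2rgb

    def hed_augment(img: np.ndarray, sigma: float = 0.05, bias: float = 0.02,
                    rng: np.random.Generator | None = None) -> np.ndarray:
        """Randomly scale and shift the Haematoxylin/Eosin/DAB channels of an
        RGB image (float values in [0, 1], shape (H, W, 3))."""
        rng = rng if rng is not None else np.random.default_rng()
        hed = rgb2hed(img)
        alpha = rng.uniform(1 - sigma, 1 + sigma, size=3)  # per-channel scale
        beta = rng.uniform(-bias, bias, size=3)            # per-channel shift
        return np.clip(hed2rgb(hed * alpha + beta), 0.0, 1.0)
    ```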

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Can the authors provide more details on online patching? It is not clear whether a whole WSI is loaded into memory for every sample in the dataloader.
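
    For what it's worth, online patching is commonly implemented so that each dataloader worker keeps a single open slide handle and decodes only the requested region on demand, rather than loading the whole WSI into memory. A minimal sketch with OpenSlide follows (not the authors' implementation; class and parameter names are illustrative).

    ```python
    import openslide
    from torch.utils.data import Dataset

    class OnlinePatchDataset(Dataset):
        """Illustrative online patching: read tiles on demand instead of
        pre-extracting them to disk or loading the full slide into memory."""

        def __init__(self, slide_path: str, coords: list[tuple[int, int]],
                     tile_size: int = 256, level: int = 0):
            self.slide_path = slide_path
            self.coords = coords      # precomputed tissue-tile coordinates
            self.tile_size = tile_size
            self.level = level
            self._slide = None        # opened lazily, once per worker process

        def __len__(self) -> int:
            return len(self.coords)

        def __getitem__(self, idx: int):
            if self._slide is None:
                self._slide = openslide.OpenSlide(self.slide_path)
            x, y = self.coords[idx]
            tile = self._slide.read_region((x, y), self.level,
                                           (self.tile_size, self.tile_size))
            return tile.convert("RGB")  # only this tile is decoded
    ```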

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental evidence is strong, writing is clear, and findings are impactful.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contributions of the paper are as follows:

    The authors present three new pathology foundation models (FMs) that are trained using significantly fewer whole slide images (WSIs) compared to existing state-of-the-art models, demonstrating the feasibility of achieving competitive performance with lower data requirements.

    The study adopts and modifies recent advancements in the DINOv2 training framework, tailoring it for pathology-related tasks. This adaptation highlights the applicability of emerging techniques from general vision applications to the specific challenges of computational pathology.

    The authors implement a post-training procedure aimed at fine-tuning the models on higher-resolution images. This step enhances the quality of the embeddings and further enriches the information learned by the models.

    By making their M-12k model, trained exclusively on TCGA, accessible to the research community under the MIT license, the authors promote transparency and enable further exploration and validation of their findings.

    The results suggest that there is substantial untapped potential for improving pathology foundation models. The study encourages future research and indicates that high performance can still be achieved with datasets smaller than traditionally thought necessary.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have conducted thorough experiments demonstrating the effectiveness of their proposed foundation model (FM) training and post-training paradigm, particularly with limited data.

    The comprehensive ablation study provides valuable insights into the performance impact of various model components along with the impact of different datasets on developed models, enhancing the manuscript’s credibility.

    The paper is well-written and effectively structured, making complex concepts accessible to readers.

    By making one of their models publicly available upon acceptance, the authors promote collaboration and encourage further research.

    The clear outline for future studies showcases the authors’ vision, enhancing the manuscript’s relevance.

    Overall, this work presents notable advancements in training methodologies with significant implications for clinical applications.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • In the high-resolution post-training section (Page 4), the authors do not adequately explain the rationale behind resizing images from 256 px to 512 px, and then to 392 px. It would enhance clarity if they could justify why they are not resizing directly from 256 px to 392 px for this training setup.

    • Table 2 should include additional details such as the number of whole slide images (WSIs) or image patches each model is trained on, as well as model sizes (if available). Including this information would improve the readability and strengthen the claims regarding high-level performance with lower data sizes.

    • It could be beneficial for the authors to develop an M-12k-392 model, alongside the existing M-92k-392, for a more comprehensive comparison of the post-training paradigm’s impact on model performance. Furthermore, sharing the M-12k-392 model with the scientific community, given that it’s trained on a public dataset using their proposed mechanism, would be an important contribution.

    • The manuscript lacks statistical analysis to determine whether the observed increases or declines in model performance are statistically significant. Including such analysis would lend more rigor to their findings.

    • The rows of Table 2 should be ordered for better readability. Additionally, separating/bundling or color-coding the columns based on different task types would enhance clarity and comprehension of the table’s information.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My decision to recommend acceptance of this manuscript is based on several major factors that highlight its strengths:

    The authors propose a novel foundation model training and post-training paradigm that addresses important challenges in handling limited data scenarios in clinical applications. Their approach demonstrates significant improvements in model performance with lower data size, which could lead to impactful advancements in the field.

    The experiments conducted are robust and comprehensive, providing strong evidence for the effectiveness of their proposed methods. The inclusion of a well-structured ablation study adds depth to the analysis, allowing readers to understand the contributions of various elements of the model.

    The manuscript is well-written and logically organized, which makes it accessible to a broad audience. The clear presentation of results, along with informative figures and tables (although can be improved), enhances the overall readability of the work.

    The authors’ commitment to sharing one of their models publicly upon acceptance demonstrates their dedication to advancing research in the field and fostering collaboration. This transparency is commendable and will benefit the scientific community.

    The authors provide a thoughtful discussion on the implications of their findings for future research, showcasing their understanding of the topic and the potential for further exploration in this area.

    While there are minor weaknesses and areas for improvement, such as the need for certain clarifications and data inclusions mentioned in the review, these do not detract from the overall significance and quality of the manuscript. Given the novel contributions and strong validation presented, I believe this paper warrants acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all the reviewers for their positive evaluation of our work. We have made several updates to the manuscript. We recomputed the evaluation with an increased number of runs for higher stability. This led to reduced variance in the estimated means and more consistent results across various metrics, including those for Camelyon16. Additionally, we now include an extended version of the results table, reporting standard deviations calculated over multiple evaluation runs.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


