Abstract

In hematology, computational models offer significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, clinical adoption of computational models has been hampered by the lack of generalization due to large batch effects, small dataset sizes, and poor performance in transfer learning from natural images. To address these challenges, we introduce DinoBloom, the first foundation model for single cell images in hematology, utilizing a tailored DINOv2 pipeline. Our model is built upon an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears, the most substantial open-source cohort in hematology so far, comprising over 380,000 white blood cell images. To assess its generalization capability, we evaluate it on an external dataset with a challenging domain shift. We show that our model outperforms existing medical and non-medical vision models in (i) linear probing and k-nearest neighbor evaluations on blood and bone marrow smears and (ii) weakly supervised multiple instance learning for acute myeloid leukemia subtyping by a large margin. A family of four DinoBloom models (small, base, large, and giant) can be adapted for a wide range of downstream applications, serve as a strong baseline for classification problems, and facilitate the assessment of batch effects in new datasets. All models are available at github.com/marrlab/DinoBloom.
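
For readers who want to reproduce the two evaluation protocols mentioned in the abstract, the following minimal Python sketch illustrates linear probing and k-nearest-neighbor classification on frozen cell embeddings. The arrays here are dummy stand-ins for embeddings extracted with a DinoBloom encoder, and the paper's exact solver settings and choice of k may differ.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    # dummy stand-ins for (n_cells, embed_dim) features from a frozen encoder
    X_train, y_train = rng.normal(size=(1000, 384)), rng.integers(0, 5, size=1000)
    X_test, y_test = rng.normal(size=(200, 384)), rng.integers(0, 5, size=200)

    # (i) linear probing: a logistic-regression head on frozen features
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("linear probe wF1:", f1_score(y_test, probe.predict(X_test), average="weighted"))

    # (ii) k-nearest-neighbor classification directly in embedding space
    knn = KNeighborsClassifier(n_neighbors=20).fit(X_train, y_train)
    print("20-NN wF1:", f1_score(y_test, knn.predict(X_test), average="weighted"))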

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3584_paper.pdf

SharedIt Link: https://rdcu.be/dY6gi

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72390-2_49

Supplementary Material: N/A

Link to the Code Repository

github.com/marrlab/DinoBloom

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Koc_DinoBloom_MICCAI2024,
        author = { Koch, Valentin and Wagner, Sophia J. and Kazeminia, Salome and Sancar, Ece and Hehr, Matthias and Schnabel, Julia A. and Peng, Tingying and Marr, Carsten},
        title = { { DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {520--530}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces DinoBloom, claimed to be the first large-scale self-supervised foundation model family specifically designed for single-cell hematology image analysis.

    DinoBloom was trained on the largest multi-cohort dataset to date, comprising over 380,000 white blood cell images from 13 datasets.

    The models demonstrate strong generalization capabilities to external datasets despite batch effects, outperforming existing medical and non-medical vision models in cell-type classification and acute myeloid leukemia subtyping tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Assembling the training dataset from 13 existing public datasets is useful for the research community.
    • Extensive evaluation demonstrating strong performance and generalization of the models to out-of-domain test sets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The novelty of the paper is limited: (1) The method itself is not fundamentally novel, as the authors rely on the existing DINOv2 self-supervised learning framework without significant modifications. While they remove the global-local crop loss, this is a relatively minor tweak. (2) Assembling the training dataset is useful for the research community, but it does not constitute a new dataset contribution in itself since they rely on previously published data. (3) The findings largely confirm the results of prior work showing self-supervised models can learn effective representations and outperform supervised baselines, just applied to the specific domain of hematology cell images here. There aren’t major new scientific insights or findings.
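
    For context on the modification this point refers to: in a DINO/DINOv2-style objective, the student normally matches local crops against the teacher's global crops. A minimal, illustrative PyTorch sketch of the loss with that global-local term removed (i.e., only global-global view pairs contribute) might look as follows; this is an editorial reconstruction, not the authors' code.

        import torch
        import torch.nn.functional as F

        def dino_loss_globals_only(student_out, teacher_out, student_temp=0.1, teacher_temp=0.04):
            # student_out / teacher_out: lists of (batch, dim) logits, one per *global* crop;
            # local crops are simply never passed in, which removes the global-local term
            teacher_probs = [F.softmax(t / teacher_temp, dim=-1).detach() for t in teacher_out]
            loss, n_terms = 0.0, 0
            for ti, t in enumerate(teacher_probs):
                for si, s in enumerate(student_out):
                    if si == ti:  # skip identical views, as in DINO
                        continue
                    loss = loss + torch.sum(-t * F.log_softmax(s / student_temp, dim=-1), dim=-1).mean()
                    n_terms += 1
            return loss / n_terms

        # two global crops, batch of 8, 4096-dim prototype logits (dummy values)
        s = [torch.randn(8, 4096) for _ in range(2)]
        t = [torch.randn(8, 4096) for _ in range(2)]
        print(dino_loss_globals_only(s, t))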

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    While the paper compares against several relevant baselines, additional comparisons to other leading self-supervised learning frameworks beyond DINO (e.g., BYOL, Barlow Twins, MoCo) could further strengthen the empirical evaluation and help isolate the benefit of domain-specific training from that of the base self-supervised approach.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Reject — must be rejected due to major flaws (1)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the assembled datasets and models might be helpful to the community, there isn’t much novelty here.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors adapt the DINOv2 model to train DinoBloom, a foundation model for cell analysis in hematology. They employ a diverse hematology dataset and present a thorough evaluation of the proposed models. The presented analysis confirms that DinoBloom is superior to existing models for different downstream tasks, including cell classification and AML classification through multiple instance learning.
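
    As a pointer for readers unfamiliar with the MIL setup mentioned here, the sketch below shows one common realization, attention-based MIL pooling (Ilse et al., 2018), over a bag of frozen single-cell embeddings; the paper's actual aggregation architecture may differ.

        import torch
        import torch.nn as nn

        class AttentionMIL(nn.Module):
            """Classify a patient from a bag of single-cell embeddings."""
            def __init__(self, in_dim=384, hidden=128, n_classes=4):
                super().__init__()
                self.attn = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
                self.head = nn.Linear(in_dim, n_classes)

            def forward(self, bag):  # bag: (n_cells, in_dim), one patient
                a = torch.softmax(self.attn(bag), dim=0)  # per-cell attention weights
                z = (a * bag).sum(dim=0)                  # attention-weighted bag embedding
                return self.head(z), a.squeeze(-1)

        bag = torch.randn(500, 384)           # e.g. 500 white-blood-cell embeddings
        logits, attn = AttentionMIL()(bag)
        print(logits.shape, attn.shape)       # torch.Size([4]) torch.Size([500])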

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The authors commit to sharing the trained models, which will have great value for the image analysis community working with this image modality.
    2. The evaluation framework and description of the methodology are convincing, as are the results, which should facilitate the adoption of their models upon release.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There is no clear methodological contribution. DinoBloom just differs from DINOv2 in the training data and that the global-local crop loss is removed.
    2. The results are not surprising and mostly confirm numerous findings from recent years showing that self-supervised models perform better (i) when trained on domain-specific data (cells in this case) rather than on generic camera images, and (ii) when trained on the same domain where they are evaluated. (i) can be seen in Table 2 - Acevedo, where the domain-specific models (DinoBloom, CTransPath, and Phikon) outperform the others, but the differences between DinoBloom and the other domain-specific models are smaller. (ii) can be seen in Table 2 - AML Hehr and Table 3, where DinoBloom, which is trained on this dataset, outperforms the rest by a larger margin.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    As described above, the methodological contribution of this paper is very limited. However, the impact of releasing these models should be valued as a significant contribution, especially considering that the data, models, and evaluation employed are clearly described.

    While the contributions stated above are clear, the authors use some strong statements, which in my opinion are sometimes unfounded. In particular:

    • “DinoBloom models outperform existing models on single WBC classification on the external dataset Acevedo by a large margin in all variants.” -> The results don’t show this. Some of the results only change by ~1%. If models of different sizes are compared, there are scenarios with no improvement at all (e.g. DINOv2 ViT-G has the same Linear probe wF1 as DinoBloom-S).
    • “We show that our model learns robust and meaningful features across domains” -> This is a very strong statement given the limited qualitative assessment in Fig. 3.

    Regarding Fig. 2, the results for the embeddings look very promising. However, the impact would be a lot bigger if it could be shown that this analysis is clearly better with DinoBloom than with the other models, similar to the analysis in Fig. 3.

    Some minor comments:

    • Please elaborate on why Acevedo exhibits a strong batch effect.
    • There are no model replicates in the results. Training the downstream models multiple times from scratch is not very computationally expensive and would strengthen the conclusions, since deep learning results are known to vary across runs. While it is understandable that this may be out of the scope of a conference paper, mentioning this in the paper would be appreciated.
    • Typo in the conclusion: “grerat” -> “great”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have a positive opinion of this paper given the high impact that the presented models can have in the community and the convincing presentation of the results. While the lack of a clear methodological contribution is in my opinion not a big issue, it is not a stronger accept given the flaws mentioned above.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors make a strong case in the rebuttal to support the impact of their contribution. While the proposed method is by itself not really novel, this paper has numerous other contributions that should be highly valued.



Review #3

  • Please describe the contribution of the paper

    This paper contributes to the study of single-cell hematology by adapting the large-scale, self-supervised DINOv2 model pipeline to this domain, as well as by assembling a large-scale hematology dataset. The quantitative and qualitative results show that their methods are effective and can benefit the research community.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) It is the very first work to consider a foundation model in the hematology domain. In fact, the adaptation of foundation models to different subdomains of medical imaging is indeed a critical topic to study. With that, the authors conduct a comprehensive study and achieve promising results. (2) The self-supervised learning framework (DinoBloom) is well-tailored to the hematology task, as single-cell image analysis requires nuanced representation learning, thereby leading to an improvement over DINOv2. (3) The experiments and evaluations, including qualitative and quantitative analyses, are comprehensive enough to be accepted as a research paper. (4) The paper presentation is clear and concise, focusing on the extensive data collection and adopted model pipeline. It is very easy to follow and to the point.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) There is a decrease in WBC classification performance from DinoBloom-S to DinoBloom-G. The authors should try to explain this observation, since it could also be valuable to the community. (2) Fig. 2 may be biased, since the images used in the visualization of the low-dimensional representation are all from the training set. Though generally acceptable, it would be good to consider data from outside the training set, or from a totally different domain (but still in hematology). (3) There are some concurrent works also focusing on leveraging unsupervised learning in the DINO pipeline [1-2]. Though not directly for hematology, they would be worth exploring in subsequent work on DinoBloom.

    Minor: (1) The authors should include the random seed used in the 5-fold cross-validation, at least in the appendix, to enhance reproducibility. (2) Image quality (i.e., Fig. 2 and Fig. 3) could be improved.

    [1] Sinhamahapatra, P., Schwaiger, F., Bose, S., et al.: Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes. arXiv preprint arXiv:2404.07664 (2024).
    [2] Wang, X., Girdhar, R., Yu, S.X., et al.: Cut and Learn for unsupervised object detection and instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3124-3134 (2023).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The collected datasets are from different sources, and the authors promise to open-source their dataset in another paper, so I assume this work will eventually be open-sourced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) I would suggest adding the source and a short description of different datasets in the appendix to enhance the visibility of the referred data. (2) Consider adding an explanation for the drop from DinoBloom-S to DinoBloom-G in WBC classification.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Comprehensiveness of the work, the first work that focuses on a foundational model in hematology, and the methodology, evaluation, and presentation of the paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We would like to thank all reviewers for their constructive comments and thorough evaluation of our work. We are pleased to note the recognition of our paper as the first to consider a foundational model in hematology, leveraging DINOv2 for single-cell image analysis (R4). As R1 states, “the impact of releasing these models should be valued as a significant contribution, especially considering that the data, models, and evaluation employed are clearly described.” Moreover, we appreciate the positive feedback on our extensive evaluation and the strong performance of our models, demonstrating robust generalization capabilities to external datasets despite batch effects (R3, R4). The effort to assemble the large-scale dataset from 13 diverse public sources has also been highlighted as a significant contribution (R3, R4). The clarity of our manuscript is rated as “very easy to follow and to the point” (R4). All reviewers commended our commitment to sharing the trained models and code.

We acknowledge the questions raised by R3 regarding the methodological novelty of our paper. While our core model leverages the existing DINOv2 framework, our contributions lie in the adaptation and optimization of this framework specifically for hematology, making this the “first work to consider a foundational model in the hematology domain” (R4). Extensive experiments showed that removing the global-local crop loss significantly improved the model, which has not been previously described in the literature. Our work's contribution, however, lies not only in these technical adjustments but in particular in the successful application and validation of these models in a highly specialized domain. To put it in the words of R4: “In fact, the adaptation of foundational models to different subdomains of medical imaging is indeed a critical topic to study.” Especially in hematology, where multiple instance learning is used to address disease classification problems, a robust feature extractor is needed.

Therefore, we respectfully disagree with the “Strong Reject — major flaws” rating (R3), as we cannot see any major flaws raised by R3. The given reason for this score, “Although the assembled datasets and models might be helpful to the community, there isn't much novelty here,” contradicts the MICCAI call for the submission of foundation models, which do not necessarily involve significant methodological novelties. Given that all reviewers acknowledge the value of our work to the community, we believe this should be the determining factor in advancing scientific progress. In other instances, models using DINOv2 or similar frameworks without any technical improvements have been published in respected journals [1,2,3].

We agree with R1 that some statements might be perceived as too strong and have revised them accordingly to ensure clarity and accuracy. To clarify the content of Figure 2: it includes visualizations from the training set (initial UMAP) and unseen test sets (embedded in the fixed UMAP fitted on the training set). We acknowledge that a completely out-of-domain test would be favorable; however, we were not able to find an additional dataset with patient-level annotations. We added a reference indicating that the held-out Acevedo dataset exhibits a strong batch effect [4].

We believe that our work makes significant contributions to the field of hematology image analysis, both through the development of DinoBloom and the assembly of a large-scale, diverse dataset. We look forward to “the high impact that the presented models can have in the community” (R1) and appreciate the reviewers' feedback, which has helped us improve our paper.

[1] https://doi.org/10.1038/s41551-023-01049-7, Nat. Biomed. Eng.
[2] https://doi.org/10.1038/s41591-024-02856-4, Nat. Medicine
[3] https://doi.org/10.1038/s41591-024-02857-3, Nat. Medicine
[4] https://doi.org/10.1007/978-3-031-45857-6_14, MICCAI 2022
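
For concreteness, the Figure 2 protocol described in the rebuttal (fit a UMAP on training-set embeddings, then project unseen test-set embeddings into the same fixed 2-D space) can be sketched as follows. This assumes the umap-learn package, and the arrays are dummy stand-ins for real cell embeddings.

    import numpy as np
    import umap

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(2000, 384))  # embeddings of training-set cells
    X_test = rng.normal(size=(300, 384))    # embeddings of unseen test-set cells

    reducer = umap.UMAP(n_components=2, random_state=0).fit(X_train)
    train_2d = reducer.embedding_           # UMAP learned on training data only
    test_2d = reducer.transform(X_test)     # test data embedded into the fixed map
    print(train_2d.shape, test_2d.shape)    # (2000, 2) (300, 2)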




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I accept the authors' rebuttal argument that, while the core model leverages the existing DINOv2 framework, their contributions lie in the adaptation and optimization of this framework specifically for hematology, making this the “first work to consider a foundational model in the hematology domain.”

    I like how this work provides a different research angle in the era of foundation models.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


