Abstract

Earlier diagnosis of leukemia can save thousands of lives annually.
The prognosis of leukemia is challenging without the morphological information of White Blood Cells (WBC) and relies on the accessibility of expensive microscopes and the availability of hematologists to analyze Peripheral Blood Samples (PBS). Deep Learning based methods can be employed to assist hematologists. However, these algorithms require a large amount of labeled data, which is not readily available. To overcome this limitation, we have acquired a realistic, generalized, and large dataset. To collect this comprehensive dataset for real-world applications, two microscopes from the two ends of the cost spectrum (high-cost: HCM and low-cost: LCM) are used for dataset capturing at three magnifications (100x, 40x, 10x) through different sensors (high-end camera for HCM, middle-level camera for LCM, and a mobile-phone camera for both). The high-end camera is 47 times more expensive than the middle-level camera, and HCM is 17 times more expensive than LCM. In this collection, using HCM at high resolution (100x), experienced hematologists annotated 10.3k WBC of 14 types including artifacts, having 55k morphological labels (Cell Size, Nuclear Chromatin, Nuclear Shape, etc.) from 2.4k images of PBS of several leukemia patients. Later on, these annotations are transferred to the other two magnifications of HCM, to the three magnifications of LCM, and to the images captured by each camera. Along with this proposed LeukemiaAttri dataset, we provide baselines over multiple object detectors and Unsupervised Domain Adaptation (UDA) strategies, along with morphological information-based attribute prediction. The dataset is available at: https://tinyurl.com/586vaw3j

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4180_paper.pdf

SharedIt Link: https://rdcu.be/dV1WR

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72384-1_52

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/4180_supp.pdf

Link to the Code Repository

https://github.com/intelligentMachines-ITU/Blood-Cancer-Dataset

Link to the Dataset(s)

https://github.com/intelligentMachines-ITU/Blood-Cancer-Dataset

BibTex

@InProceedings{Reh_ALargescale_MICCAI2024,
        author = { Rehman, Abdul and Meraj, Talha and Minhas, Aiman Mahmood and Imran, Ayisha and Ali, Mohsen and Sultani, Waqas},
        title = { { A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for Explainability } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {553--563}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    A new dataset for leukemia cancer was proposed. Images were captured at different resolutions using low-cost, medium-cost, and high-cost setups. Some baselines were also proposed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem definition is described well. The dataset would be useful.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The result section is not very impressive. More metrics should be incorporated. Standard deviations of results should be added. Train-test split should be discussed. More rigorous testing is required. k-fold CV can be used to produce results. The selection of parameters should be discussed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors claimed to release the dataset after acceptance. An executable code file (notebook) could be added for running the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    More metrics should be incorporated. Standard deviations of results should be added. Train-test split should be discussed. More rigorous testing is required. k-fold CV can be used to produce results. The selection of parameters should be discussed.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has some merits. After improving the results section, it can be accepted.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors presented a novel large-scale dataset of WBC for leukemia obtained using two different microscopes at different magnifications and different types of cameras. The authors also introduced AttriDet, which is an extension of YOLOv5 that includes an attribute head to predict the morphological attributes of WBCs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Diversity: Compared to existing datasets for leukemia, the proposed dataset has a much larger number of WBC types (14) and a larger number of WBCs (88,294), with images from two different microscopes at various magnifications and different types of cameras.

    • A solid set of experiments: The authors compared the performance of AttriDet with other baselines from Sparse R-CNN, FCOS, and DINO for object detection (without domain adaptation). Additionally, they investigate the use of recent methods for domain adaptation (DACA, ConfMix) to improve the performance on.

    • Clear writing and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No source code for AttriDet is available for reproducibility.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be useful if the authors make their code for AttriDet available as well.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    For further improvement: I recommend

    • Have a dedicated section describing how the authors named the subsets of the dataset. Currently, this information is in Table 1 of the Supplementary Material, which makes it very hard to follow. The authors can mention something like dataset name = [Microscope_type][Magnification][Camera_type], with Microscope_type = H if the microscope is Olympus or L if the microscope is XSZ_107BM, etc.

    • Add more details to the Supplementary Information (SI) on how the cutoffs for different subgroups in Fig. 2b are generated.

    • Add more details (to the SI) on some terms that are not familiar to the reader such as “artifact”. Show some example artifacts in the dataset.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is very well written and easy to follow. There are minor weaknesses that can be addressed. It adds a dataset with a baseline of WBC for leukemia that is useful for the community. I support accepting this paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors introduce a powerful leukemia-oriented dataset and implement an explainable model for the analysis of cells.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors contribute a significant dataset to the public, and introduce a model that leverages some of the dataset’s greatest strengths. They provide their code, enhancing reproducibility.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I suggest making comments about the clinical feasibility of this method. Overall, this is a very good work.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Add commentary about the clinical utility of the method.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Strong Accept — must be accepted due to excellence (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work is a big contribution. The dataset is large, and the authors plan to release the dataset. The model is impressive in performance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We would like to thank all reviewers for their insightful feedback. R1 acknowledges the significance of our research via the challenges and important suggestions addressed in our paper, particularly emphasizing its relevance to the field. R2’s recognition of the reproducible results further validates the robustness of our findings, enhancing the credibility of our study (R3). Additionally, the acknowledgment of the valuable system design and the clarity in exposition by R2 highlights the effectiveness of our approach in communicating complex concepts. Furthermore, R3’s positive remarks on the sufficiency of qualitative and quantitative comparisons, along with the significant improvement over state-of-the-art methods, provide valuable validation of our contributions (R1, R2, R3). We are committed to thoroughly addressing each reviewer’s comments:

K-fold cross-validation (R1): Below we share the results for three splits. The only condition we imposed was that all categories are present in both the train and test sets for every split. In each split, 70% of the data is used for training and 30% for testing. The three splits are:

Split     mAP@50  mAP@50-95  NC    NS    N     C     CB    CV
Split1    52.2    28.2       73.9  95.9  54.3  89.7  83.6  29.1
Split2    44.2    24.2       76.2  98.1  56.6  91.0  82.7  10.2
Split3    40.5    22.2       75.7  97.1  49.5  91.2  85.5  11.2
Average   45.6    24.8       75.2  97.0  53.4  90.6  83.9  16.8
Std Dev   4.88    2.49       1.14  0.89  2.89  0.74  1.16  8.68
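
As an illustration of how the Average and Std Dev rows can be derived, the following minimal Python sketch recomputes both from the three per-split rows in the table. It assumes the population standard deviation (dividing by the number of splits); for mAP@50 this gives roughly 45.63 and 4.88, consistent with the reported row. Other columns may differ somewhat from the reported values, possibly because the table lists rounded scores. The names `COLUMNS`, `SPLITS`, `summarize`, and `SUMMARY` are illustrative and do not come from the authors' code.

```python
from statistics import mean, pstdev

# Per-split scores copied from the rebuttal table.
COLUMNS = ["mAP@50", "mAP@50-95", "NC", "NS", "N", "C", "CB", "CV"]
SPLITS = {
    "Split1": [52.2, 28.2, 73.9, 95.9, 54.3, 89.7, 83.6, 29.1],
    "Split2": [44.2, 24.2, 76.2, 98.1, 56.6, 91.0, 82.7, 10.2],
    "Split3": [40.5, 22.2, 75.7, 97.1, 49.5, 91.2, 85.5, 11.2],
}


def summarize(splits):
    """Return {column: (mean, population std dev)} across all splits."""
    rows = list(splits.values())
    summary = {}
    for i, name in enumerate(COLUMNS):
        values = [row[i] for row in rows]
        # pstdev divides by N (assumption); use statistics.stdev for N - 1.
        summary[name] = (mean(values), pstdev(values))
    return summary


SUMMARY = summarize(SPLITS)

if __name__ == "__main__":
    for name, (avg, std) in SUMMARY.items():
        print(f"{name:>10}: mean={avg:.2f}  std={std:.2f}")
```

With only three splits, the choice between population and sample standard deviation changes the result noticeably, so it is worth stating which one is reported.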

Regarding selection of parameters, we have decided to keep the default settings for simplicity and to maintain consistency.

The naming of a subset of the collected dataset (R3): R3 recommended adding a description of how the subsets of the collected dataset are named. Subset names follow the pattern H_100x_C1 = [Microscope Cost][Resolution][Camera Type], where H stands for the high-cost microscope (Olympus CX23), 100x is the resolution, and C1 is the camera type. The same scheme applies to the other microscope types, camera types, and resolutions.

Supplementary Information (R3): Generation of Cutoffs for Subgroups in Fig. 2b (R3): In Fig. 2b of the supplementary material, the cell sizes are given as small, medium, and large. As the dataset is annotated by hematologists, they assigned the cell size attribute to each WBC.

Explanation of Terms and Artifacts (R3): In Fig. 1b, we show the None type of WBC with box-level annotation. It includes the artifacts (Staining Artifacts, Smear Artifacts, Debris, Air Drying Artifacts, etc.) and clamp cells.

Clinical Feasibility (R4): Diagnosing leukemia is expensive due to the necessity of using microscopes and relying on expert hematologists, who may not be readily available in all locations. Additionally, advanced and costly tests such as flow cytometry and genetic analysis are often required. Our method can help reduce these costs by providing detailed morphological analysis through AI, potentially decreasing the reliance on these expensive tests.
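
The subset naming scheme described in the feedback above (for example H_100x_C1) can be decomposed mechanically. The sketch below is a hypothetical helper, not part of the released repository. It assumes the three fields are separated by underscores, that the microscope code is H (high-cost) or L (low-cost, following the reviewer's suggestion), and that camera codes have the form C followed by digits. Only C1 is confirmed by the text.

```python
import re
from typing import NamedTuple


class SubsetName(NamedTuple):
    microscope: str     # "H" = high-cost; "L" = low-cost (assumed code)
    magnification: int  # 100 for "100x"
    camera: str         # e.g. "C1"; other camera codes are assumptions


# Assumed layout: <H|L>_<magnification>x_<camera code>
_PATTERN = re.compile(r"^(H|L)_(\d+)x_(C\d+)$")


def parse_subset_name(name: str) -> SubsetName:
    """Split a subset name such as 'H_100x_C1' into its three fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized subset name: {name!r}")
    microscope, magnification, camera = match.groups()
    return SubsetName(microscope, int(magnification), camera)


if __name__ == "__main__":
    print(parse_subset_name("H_100x_C1"))
```

Rejecting names that do not fit the pattern, rather than guessing, keeps any mismatch with the dataset's real folder names visible.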




Meta-Review

Meta-review not available, early accepted paper.


