Abstract

Acquiring annotations for whole slide image (WSI)-based deep learning tasks, such as creating tissue segmentation masks or detecting mitotic figures, is a laborious process due to the enormous image size and the significant manual work involved in the annotation. This paper focuses on identifying and annotating specific image regions that optimize model training, given a limited annotation budget. While random sampling helps capture data variance by collecting annotation regions throughout the WSIs, insufficient data curation may result in an inadequate representation of minority classes. Recent studies proposed diversity sampling to select a set of regions that maximally represent unique characteristics of the WSIs. This is done by pretraining on unlabeled data through self-supervised learning and then clustering all regions in the latent space. However, establishing the optimal number of clusters can be difficult, and not all clusters are task-relevant. This paper presents prototype sampling, a new method for annotation region selection. It discovers regions exhibiting typical characteristics of each task-specific class. The process entails recognizing class prototypes from extensive histopathology image-caption databases and detecting unlabeled image regions that resemble these prototypes. Our results show that prototype sampling is more effective than random and diversity sampling in identifying annotation regions with valuable training information, resulting in improved model performance in semantic segmentation and mitotic figure detection tasks. Code is available at https://github.com/DeepMicroscopy/Prototype-sampling.
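As a rough illustration of the selection step described above, here is a minimal numpy sketch, assuming prototype and region embeddings have already been computed with a vision-language encoder (e.g., PLIP-style features); the function names and toy data are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def prototype_sampling(region_embs, prototype_embs, budget):
    """Score each candidate region by its closest class prototype and
    return the indices of the `budget` highest-scoring regions."""
    sim = cosine_sim(region_embs, prototype_embs)  # (n_regions, n_prototypes)
    region_scores = sim.max(axis=1)                # best-matching prototype per region
    return np.argsort(region_scores)[::-1][:budget]

# Toy usage: random vectors stand in for real encoder embeddings.
rng = np.random.default_rng(0)
region_embs = rng.normal(size=(1000, 512))     # candidate WSI regions
prototype_embs = rng.normal(size=(100, 512))   # retrieved class prototypes
selected = prototype_sampling(region_embs, prototype_embs, budget=20)
```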

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2268_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2268_supp.pdf

Link to the Code Repository

https://github.com/DeepMicroscopy/Prototype-sampling

Link to the Dataset(s)

http://gigadb.org/dataset/100439

https://github.com/DeepMicroscopy/MITOS_WSI_CMC/tree/master/databases

BibTex

@InProceedings{Qiu_Leveraging_MICCAI2024,
        author = { Qiu, Jingna and Aubreville, Marc and Wilm, Frauke and Öttl, Mathias and Utz, Jonas and Schlereth, Maja and Breininger, Katharina},
        title = { { Leveraging Image Captions for Selective Whole Slide Image Annotation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a sample selection strategy based on image-caption retrieval to replace the commonly used clustering-based selection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed idea of using image-caption retrieval to generate prototype features is relatively novel. It removes the difficulty of setting clustering parameters.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the idea is relatively novel, the authors did not provide enough experiments in the paper to substantiate it. Since the paper claims better performance compared with clustering methods, the authors should at least compare with several state-of-the-art active learning methods that use clustering for histopathological image segmentation.
    2. How the performance of image-caption retrieval affects the final results is not fully discussed. Will the method be limited to common use cases that can be retrieved more easily and accurately from open resources? More evidence should be provided to discuss the limitations and benefits of using image-caption retrieval.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Can be reproduced if the code is provided upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The proposed idea is relatively novel, but the limitations and effectiveness of image-caption retrieval for histopathology images are not fully discussed in the paper. Please add more analysis to support the proposed method.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty, experiments and analysis.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal has resolved all my concerns. Although the proposed method still has some application limitations, the paper proposes a relatively new direction for solving such problems.



Review #2

  • Please describe the contribution of the paper

    This paper presents an interesting idea: leveraging image-caption integration to support the generation of histology image annotations needed for deep learning tasks. Specifically, a prototype sampling pipeline has been developed. It consists of class prototype creation, similarity map construction, and annotation region selection. The primary contribution lies in the first step, i.e., class prototype creation, with prototypes found in two image-caption databases via keyword search and text-to-image retrieval (see the sketch below).
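To make that first step concrete, the sketch below combines keyword search over captions with text-to-image retrieval in a shared embedding space; the function name, inputs, and the top-k default of 100 are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def retrieve_prototypes(captions, image_embs, class_text_emb, keyword, k=100):
    """Hypothetical prototype creation: keyword search over captions,
    then text-to-image retrieval in a shared embedding space.

    captions:       list of n caption strings from an image-caption database
    image_embs:     (n, d) image embeddings from a vision-language encoder
    class_text_emb: (d,) embedding of a class prompt
    """
    # Step 1: keyword search keeps only images whose caption mentions the class.
    hits = [i for i, c in enumerate(captions) if keyword.lower() in c.lower()]
    # Step 2: rank the hits by cosine similarity to the class text embedding.
    embs = image_embs[hits]
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    t = class_text_emb / np.linalg.norm(class_text_emb)
    order = np.argsort(embs @ t)[::-1][:k]
    # The top-k matching images serve as class prototypes.
    return [hits[i] for i in order]
```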

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this paper is the idea of identifying specific image regions for annotation by leveraging the image-caption information in existing image-caption pair datasets. Additionally, the method is validated on two common pathology image tasks, i.e., tumor semantic segmentation and mitotic figure detection. Overall, this paper addresses an important and common bottleneck in histopathology image analysis tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The major components of the proposed prototype sampling rely on existing methods/models, such as Contrastive Language-Image Pretraining (CLIP), ResNet18, ViT_PLIP, PLIP, and standard and adaptive selection methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper tackles an important problem common to a large number of pathology WSI analysis tasks and therefore has a big potential impact. The idea of drawing prior image-caption information from existing databases to locate image regions of specific classes for annotation is interesting, and it is clearly demonstrated in the paper. However, it is not clear why the top 100 image embeddings are chosen for the embedding database in Section 2.2, "Prototype Sampling", nor why a fixed number is selected for all classes. One suggestion is to instead use a fixed cosine similarity cutoff to determine the number of top image embeddings for each class (see the sketch after this paragraph). As one key component of the developed method is the image and text encoder, more extensive experiments on the impact of the choice of image and text encoders on performance would be worthwhile. For the sampling-method comparison in Section 3.2 on metastasis semantic segmentation, it is not clear how the optimal number of clusters for diversity sampling is set. Clarification is necessary, as an inappropriate choice of cluster number may noticeably harm diversity sampling performance.
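A hypothetical sketch of the reviewer's suggestion, contrasting fixed top-k selection with a per-class similarity cutoff; `select_prototypes` and `tau` are invented here for illustration and do not appear in the paper.

```python
import numpy as np

def select_prototypes(scores, k=None, tau=None):
    """Select prototype indices from per-image similarity scores.

    scores: (n,) cosine similarities of candidate images to a class prompt.
    k:      fixed top-k selection (the same constant for every class).
    tau:    similarity cutoff; the prototype count then adapts per class.
    """
    if tau is not None:
        return np.flatnonzero(scores >= tau)  # adaptive: as many as pass the cutoff
    return np.argsort(scores)[::-1][:k]       # fixed: always k prototypes
```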

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary driving factors are 1) the strong importance of the problem tackled by this paper, 2) the interesting idea on leveraging image-caption prior information for identification of pathology image regions for annotation, and 3) good validation experiments on segmentation and detection. However, this paper is somewhat limited on the method technical novelty as most of its components leverage existing methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Authors’ rebuttal responses partially address my questions.



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel approach to annotation region selection using prototype sampling, aiming to reduce annotation efforts by selecting the most effective patches to label. The process involves recognizing class prototypes from extensive histopathology image-caption databases, using captions to distinguish images and extract the prototypes, and detecting unlabeled image regions that resemble these prototypes. The results demonstrate that prototype sampling is more effective than random and diversity sampling in identifying annotation regions with valuable training information, leading to improved model performance in semantic segmentation and mitotic figure detection tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel idea: The authors’ approach to using captions to extract prototypes is a fresh and innovative idea that has the potential to significantly reduce annotation efforts.
    2. Clear writing: The authors’ writing is solid and easy to understand, making the paper accessible to a broad audience.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Limited comparison: The authors only compare their method with basic methods, such as random and diversity sampling. To demonstrate the competitiveness of their approach, they should compare it with more state-of-the-art methods, such as the label-efficient nuclei segmentation framework proposed by Lou et al. (2022).

    Lou W, Li H, Li G, Han X, Wan X. Which pixel to annotate: a label-efficient nuclei segmentation framework. IEEE Transactions on Medical Imaging. 2022 Nov 10;42(4):947-58.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors left the code link blank, but I assume they will release the source code upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Compare with more state-of-the-art methods: The authors should compare their method with more recent and relevant approaches, such as the label-efficient nuclei segmentation framework proposed by Lou et al. (2022), to demonstrate its competitiveness and advantages.
    2. Investigate the impact of caption quality: The authors could explore the impact of caption quality on the effectiveness of their prototype sampling approach, as well as the robustness of their method to noisy or incomplete captions.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend weak accept for this paper due to its novel and innovative approach to annotation region selection using prototype sampling. The authors' clear writing makes the paper easy to understand, and the experiments demonstrate the effectiveness of their method. While the limited comparison with state-of-the-art methods is a concern, the approach shows promise and warrants further exploration. Overall, the paper's strengths outweigh its weaknesses, and I believe it will make a valuable contribution to the field of medical image analysis if the authors include more comparisons with state-of-the-art methods.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all reviewers for their constructive comments and encouraging remarks on the paper’s strengths, including its novelty (R1, R3, R4), the significance of the problem addressed (R1, R3, R4), the demonstration of the method’s effectiveness (R1, R3), and the clear writing (R1, R3). Below we respond to the raised questions.

  1. Limited comparison (R3, R4): Methods for selecting an initial annotation set are rather limited, which is why most active learning (AL) algorithms start with random sampling. Diversity sampling (DS) also typically starts from the 2nd AL cycle, but it has recently been used to select the initial annotation set using powerful pretrained encoders. DS is widely adopted in the latest AL publications, often in combination with other strategies (Wu et al. MICCAI22, Tang et al. MICCAI23, Qu et al. MICCAI23, Chen et al. MIDL23, Föllmer et al. MIDL24, Chen et al. CVPR24). The work of Lou et al. (mentioned by R3) also used DS to select representative annotation patches, while additionally encouraging the selection of patches with high intra-patch consistency to stabilize subsequent GAN training that creates synthetic training samples. We agree that this extended approach forms a relevant baseline for the specific application of nuclei detection, but it is less general and not suited for tasks where consistency is a less relevant predictor of informativeness, e.g., mitosis detection. Therefore, to retain generalizability, we focused on the basic form of DS without application-specific add-ons. The basic form further allows a direct comparison with our proposed method, since the same pretrained models are used for feature extraction, making the differences in the effects of the two methods (i.e., assignment of regions to clusters vs. identified class prototypes) visible.

  2. Impact of caption quality (R3) / prototype retrieval (R1, R4): We expected the quality of images, captions, and image-caption pairing to differ between the ARCH and OpenPath databases. For ARCH, images were sourced from textbooks and publications, and captions were manually reviewed, while OpenPath is larger (176,373 pairs) but less curated, with automatic text cleaning and image quality sampling tests only (still, PLIP pretrained on OpenPath has been effective in various evaluations). Therefore, prototypes retrieved from OpenPath may contain (more) noise. This allows us to evaluate the robustness of our approach in different settings. Interestingly, we see no drawback to using OpenPath, and/or the noise can be offset by a stronger pretrained model (see Fig. 2a-b, experiments on CAMELYON16; OpenPath was not used on MITOS_WSI_CMC since the image-caption pairs are not directly accessible, which we require to crop the mitotic figures for feature calculation). We prioritized this “real world” comparison over an analysis with artificially added noise patterns. We see value in assessing the quality of the prototype set by including different numbers of prototypes or using a more adaptive similarity threshold as suggested by R1, and reserve both evaluations for future work.

  3. Application to rare classes with few open resources (R4): Some cases/classes may indeed be underrepresented in open resources; still, our proposed approach may help identify sparse but existing characteristic samples and may be used to identify prototypes from related diseases based on prior knowledge. Concerns about finding rare class prototypes are further alleviated by ongoing efforts to create larger and more diverse histopathological image-caption databases for building powerful all-purpose models. Still, bias and underrepresented diseases are valid concerns, which we will emphasize in the revised paper.

  4. Reliance on existing methods (R1): Our approach is agnostic to the choice of models and region selection methods, allowing it to be seamlessly applied to any new method.

  5. Cluster number K in diversity sampling (R1): We follow most related works [9, 12, 20] and set K to the number of selected regions.
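For reference, a minimal scikit-learn sketch of this diversity sampling baseline, with K set to the number of regions to select; returning the region nearest each cluster centre is one common instantiation and an assumption here, not necessarily the exact variant used in the cited works.

```python
import numpy as np
from sklearn.cluster import KMeans

def diversity_sampling(region_embs, n_select, seed=0):
    """Cluster candidate regions into K = n_select clusters and select
    the region closest to each cluster centre."""
    km = KMeans(n_clusters=n_select, n_init=10, random_state=seed)
    labels = km.fit_predict(region_embs)
    selected = []
    for c in range(n_select):
        members = np.flatnonzero(labels == c)
        dists = np.linalg.norm(region_embs[members] - km.cluster_centers_[c], axis=1)
        selected.append(members[np.argmin(dists)])
    return np.array(selected)
```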




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


