Abstract

Pathology image classification plays a crucial role in accurate medical diagnosis and treatment planning. Training high-performance models for this task typically requires large-scale annotated datasets, which are expensive and time-consuming to acquire. Active Learning (AL) reduces the labeling effort by iteratively selecting the most informative samples for annotation. However, most AL methods assume a closed-set scenario in which all unannotated images belong to the target classes. In real-world clinical environments, the unlabeled pool often contains a substantial amount of Out-Of-Distribution (OOD) data, which lowers the annotation efficiency of traditional AL methods. Furthermore, most existing AL methods start with random selection in the first query round, wasting a significant portion of the labeling budget in open-set scenarios. To address these challenges, we propose OpenPath, a novel open-set active learning approach for pathology image classification that leverages a pre-trained Vision-Language Model (VLM). In the first query, we propose task-specific prompts that combine target and relevant non-target class prompts to effectively select In-Distribution (ID) and informative samples from the unlabeled pool. In subsequent queries, we propose Diverse Informative ID Sampling (DIS), which combines Prototype-based ID candidate Selection (PIS) and Entropy-Guided Stochastic Sampling (EGSS), to ensure both purity and informativeness of each query while avoiding the selection of OOD samples. Experiments on two public pathology image datasets show that OpenPath significantly enhances the model’s performance due to the high purity of selected samples, and outperforms several state-of-the-art open-set AL methods.
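
As a rough illustration of the two-stage DIS sampling summarized above, a minimal sketch is given below, assuming image embeddings, predictive entropies, and ID prototypes from the current model are available; the function name, candidate-pool size, and distance criterion are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def dis_query(features, entropies, prototypes, n_query, rng=None):
    """Illustrative two-stage DIS-style sampling (assumed interface, not the authors' code).

    features:   (N, D) embeddings of unlabeled samples from the current model
    entropies:  (N,) predictive entropy of the current classifier
    prototypes: (K, D) class prototypes estimated from labeled ID samples
    """
    rng = np.random.default_rng() if rng is None else rng

    # PIS-like step: keep the samples closest to any ID prototype (assumed criterion),
    # so that likely-OOD samples are filtered out before uncertainty sampling.
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    nearest = dists.min(axis=1)
    n_candidates = min(len(features), 3 * n_query)   # candidate-pool size is an assumption
    candidates = np.argsort(nearest)[:n_candidates]

    # EGSS-like step: split the candidates into random batches and take the most
    # uncertain sample from each batch, trading pure uncertainty for diversity.
    batches = np.array_split(rng.permutation(candidates), n_query)
    picked = [b[np.argmax(entropies[b])] for b in batches if len(b) > 0]
    return np.array(picked)
```
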

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1566_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/HiLab-git/OpenPath

Link to the Dataset(s)

N/A

BibTex

@InProceedings{ZhoLan_OpenPath_MICCAI2025,
        author = { Zhong, Lanfeng and Liao, Xin and Zhang, Shichuan and Zhang, Shaoting and Wang, Guotai},
        title = { { OpenPath: Open-Set Active Learning for Pathology Image Classification via Pre-trained Vision-Language Models } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {489--498}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript proposes two enhancements to open-set active learning (AL). First, it improves the first round of querying by replacing random selection with the selection of in-distribution (ID) samples. Specifically, a GPT model is used to generate class labels for out-of-distribution (OOD) classes, and CLIP is then applied to assign pseudo-labels to images based on image-text similarity; samples whose pseudo-labels correspond to ID classes are selected. Second, the subsequent query rounds are designed to consider informativeness and diversity, as well as to avoid selecting OOD samples; these goals are addressed using sample entropy, random splitting, and distance to ID prototypes, respectively. Experiments are conducted on two pathology datasets and show substantial improvements over compared methods in terms of test accuracy and retrieval of ID samples, along with ablation studies demonstrating the effectiveness of each component.
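
    To make this first-round selection concrete, the following is a minimal sketch of zero-shot ID filtering with a CLIP-style VLM, assuming precomputed, L2-normalized image and prompt embeddings; the function name and the final similarity-based ranking are illustrative assumptions (the paper additionally uses clustering for representativeness, which is omitted here).

```python
import numpy as np

def first_round_id_selection(img_emb, id_text_emb, ood_text_emb, n_query):
    """Zero-shot ID selection with a CLIP-style VLM (illustrative sketch only).

    img_emb:      (N, D) L2-normalized image embeddings from the pretrained VLM
    id_text_emb:  (C_id, D) embeddings of target-class prompts
    ood_text_emb: (C_ood, D) embeddings of GPT-generated non-target-class prompts
    """
    text_emb = np.concatenate([id_text_emb, ood_text_emb], axis=0)
    sims = img_emb @ text_emb.T                       # cosine similarity (embeddings assumed normalized)
    pseudo_labels = sims.argmax(axis=1)
    is_id = pseudo_labels < len(id_text_emb)          # pseudo-label falls among the target classes

    # Rank ID-pseudo-labeled samples by their best target-class similarity.
    # (Clustering for representativeness is omitted in this sketch.)
    id_scores = sims[:, :len(id_text_emb)].max(axis=1)
    id_indices = np.where(is_id)[0]
    order = np.argsort(-id_scores[id_indices])
    return id_indices[order[:n_query]]
```
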

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. By combining several techniques from the literature, the method establishes a new state-of-the-art.
    2. The experimental design is good, including several evaluation metrics and a detailed ablation study.
    3. The paper is well-written and easy to follow.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The novelty of the proposed method is limited. The solution for the first query round is almost the same as the zero-shot OOD detection method EOE [Cao et al., 2024]. The solution for subsequent query rounds follows the two-stage active sampling framework proposed in OSAL-ND [Tang et al., 2024], where the first stage utilizes prototypes to select a candidate set for purity and the second stage refines the selection based on uncertainty.
    2. In Prototype-based ID Candidate Selection, how does the method ensure that prototypes of new classes (those not covered after the first query round) are effectively updated? For example, if no sample from class K is selected in the first query, a prototype for this class would be missing. Consequently, in subsequent rounds, its samples may lie far from the existing prototypes and be mistakenly classified as OOD, thus excluded from the candidate set. This raises concerns about the method’s applicability, particularly in scenarios with a large number of ID classes or severe class imbalance.
    3. The effectiveness of Entropy-Guided Stochastic Sampling requires further justification and experimental support. The method aims to achieve diversity in the queried samples via random splitting, dividing the candidate samples into B batches and selecting the most uncertain sample from each batch. However, this approach seems to primarily introduce randomness among the uncertain samples and may not improve diversity. Specifically, compared to selecting directly from the entire candidate set, this strategy appears to affect only the highly uncertain samples, as the most uncertain ones will consistently be selected and the least uncertain ones will never be selected. One way to address this would be to compare the proposed method with: 1) randomly selecting L queries from the top 1.5L most uncertain samples, and 2) fixing half of the queries as the most uncertain ones while introducing randomness for the remaining queries (a sketch of both baselines is given after this list).
    4. The proposed method is only compared with two open-set AL methods from 2022 and 2023. This is insufficient given that several methods were published in 2024, e.g., [Safaei et al., 2024; Yan et al., 2024; Zong et al., 2024].
    5. The method is designed without considering any specific characteristics of pathology images. It appears to be equally applicable to other medical images or even natural images.
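
    The two alternative baselines suggested in point 3 could, for example, be implemented as follows; this is a minimal sketch assuming predictive entropies for all candidates are available, and the function names and pool factor are illustrative.

```python
import numpy as np

def top_pool_random(entropies, n_query, pool_factor=1.5, rng=None):
    """Baseline 1: pick L queries uniformly at random from the top 1.5*L
    most uncertain candidates (sketch of the suggestion above)."""
    rng = np.random.default_rng() if rng is None else rng
    pool = np.argsort(-entropies)[: int(pool_factor * n_query)]
    return rng.choice(pool, size=n_query, replace=False)

def half_fixed_half_random(entropies, n_query, rng=None):
    """Baseline 2: keep the L/2 most uncertain samples and fill the rest of the
    query uniformly at random from the remaining candidates."""
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(-entropies)
    fixed = order[: n_query // 2]
    rest = rng.choice(order[n_query // 2:], size=n_query - len(fixed), replace=False)
    return np.concatenate([fixed, rest])
```
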
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    1. In the first query round, only samples with high ID scores are retained for further selection. However, [Yang et al., 2023] suggests that OOD samples can also be valuable for training an ID classifier. It would be beneficial to cite this work to provide a more balanced perspective.
    2. It would be worth citing [Liu et al., 2019], as this paper also discusses how to deal with the cold start problem in open-set AL and proposes strategies for increasing diversity in the selection process.
    3. It would be more rigorous if the standard deviation is included in Figure 2.
    4. Typo in Figure 3: w/ PIS + OOD

    References:
    Cao, Chentao, et al. “Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection.” International Conference on Machine Learning. PMLR, 2024.
    Liu, Zhao-Yang, and Sheng-Jun Huang. “Active sampling for open-set classification without initial annotation.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33, No. 01, 2019.
    Tang, Jiao, et al. “OSAL-ND: Open-Set Active Learning for Nucleus Detection.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.
    Safaei, Bardia, et al. “Entropic open-set active learning.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38, No. 5, 2024.
    Yan, Zizheng, et al. “Contrastive open-set active learning based sample selection for image classification.” IEEE Transactions on Image Processing (2024).
    Yang, Yang, et al. “Not all out-of-distribution data are harmful to open-set active learning.” Advances in Neural Information Processing Systems 36 (2023): 13802-13818.
    Zong, Chen-Chen, et al. “Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation.” European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method primarily integrates several established techniques. It shows substantial performance gains, but this is achieved without comparison with recent methods. If the authors can clearly state their novel contributions, clarify how the method specifically benefits pathology images, and demonstrates competitiveness after considering recent works, then the manuscript may still be interesting to the MICCAI community.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper addresses open-set active learning for pathology image classification by proposing a VLM-based method for initial In-Distribution sample selection and introducing clustering-, prototype-, and entropy-based query strategies, achieving strong performance on two public datasets.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The VLM-based open-set initialization strategy demonstrates a degree of methodological novelty.

    • The experimental results are strong.

    • The presentation is clear and the writing is well-executed.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • The experiments were not repeated multiple times, raising concerns about reliability.

    • The paper lacks comparisons with entropy-based selection with stochastic batches [3] and diversity-based methods such as Core-Set [R1]; a minimal sketch of the Core-Set baseline is given after this list. [R1] Sener, Ozan, and Silvio Savarese. “Active learning for convolutional neural networks: A core-set approach.” arXiv preprint arXiv:1708.00489 (2017).

    • For initial-round selection, there are many existing cold-start active learning methods; please refer to [R2, R3, R4] and Section 4.2.1 of the survey [R5]. The authors should replace VIDS with these methods to better validate its effectiveness. [R2] Liu, Han, et al. “COLosSAL: A benchmark for cold-start active learning for 3D medical image segmentation.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2023. [R3] Hacohen, Guy, Avihu Dekel, and Daphna Weinshall. “Active learning on a budget: Opposite strategies suit high and low budgets.” arXiv preprint arXiv:2202.02794 (2022). [R4] Jin, Qiuye, et al. “Cold-start active learning for image classification.” Information Sciences 616 (2022): 16-36. [R5] Wang, Haoran, et al. “A comprehensive survey on deep active learning in medical image analysis.” Medical Image Analysis (2024): 103201.

    • For subsequent round selection, since the main advantage of the proposed method comes from VIDS, the authors should apply VIDS to other baseline methods instead of random sampling to validate the effectiveness of PIS + EGSS.

    • What does “ODD” refer to in Fig. 3?

    • In Section 3.3, what method does “w/o PIS” refer to?
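
    Regarding the Core-Set comparison requested above, a simplified greedy k-center selection in the spirit of Sener and Savarese (2017) could look as follows; the embedding source and function signature are assumptions for illustration.

```python
import numpy as np

def coreset_greedy(unlabeled_emb, labeled_emb, n_query):
    """Greedy k-center (Core-Set) selection; a simplified sketch of the
    diversity baseline mentioned above, not the original implementation."""
    # Distance of every unlabeled point to its nearest already-labeled point.
    min_dist = np.linalg.norm(
        unlabeled_emb[:, None, :] - labeled_emb[None, :, :], axis=-1
    ).min(axis=1)

    selected = []
    for _ in range(n_query):
        idx = int(np.argmax(min_dist))                # farthest point from the covered set
        selected.append(idx)
        new_dist = np.linalg.norm(unlabeled_emb - unlabeled_emb[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)     # update coverage distances
    return np.array(selected)
```
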

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
    • Please increase the font size in Figures 2 and 3.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a solid method with clear writing and promising experimental results. However, the experimental setup lacks key comparisons, especially regarding the initial and subsequent sampling rounds. I recommend weak accept, with the expectation that the authors will address these issues in their revision to strengthen the paper’s overall impact.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces OpenPath, a novel open-set active learning framework for pathology image classification. Its main contributions are: (1) a VLM-based warm-up strategy that leverages GPT-4 to generate task-specific OOD prompts for selecting high-purity ID samples in the first round, and (2) a two-stage sampling strategy (PIS + EGSS) in subsequent rounds to ensure both informativeness and diversity. The method outperforms existing open-set AL baselines on two public medical datasets and includes ablation studies to support its components.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper addresses a realistic and underexplored problem: open-set active learning in pathology image classification, where unlabeled data may include both in-distribution (ID) and out-of-distribution (OOD) samples — especially relevant for histopathology, where each slide contains many patches that may belong to irrelevant (OOD) tissue types.

    2. The VLM-based warm-up strategy is a novel contribution: it leverages GPT-4 to generate task-specific OOD prompts, enabling zero-shot inference with a pretrained vision-language model (BioMedCLIP) to identify high-purity ID samples in the first query round. This effectively tackles the cold-start problem.

    3. The paper includes ablations isolating the impact of each component (warm-up, PIS, EGSS), which supports the validity of the proposed design.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. While the overall method is well-designed, the DIS strategy (PIS + EGSS) is composed of relatively standard components. Prototype-based filtering is commonly used in OOD detection, and EGSS builds on batch-wise entropy selection from prior work [3]. The novelty here is incremental, and stronger theoretical or empirical comparison to alternative sampling strategies would strengthen the contribution.

    2. Additionally, for the proposed EGSS, it would be helpful to provide more intuitive or theoretical justification for how the proposed batch-wise entropy selection promotes diversity beyond standard uncertainty sampling.

    3. The text encoder is not used after the first round. Is there a reason for not freezing it and using prompt-based classification (e.g., cosine similarity to text embeddings) instead of training a separate classifier head? This way, text prompts could still be used to help distinguish ID from OOD samples.
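
    The prompt-based alternative suggested in point 3 could, for instance, be realized as follows; this is a hedged sketch assuming L2-normalizable image embeddings and frozen class-prompt embeddings, with an illustrative temperature value, and is not the authors' method.

```python
import numpy as np

def prompt_based_classify(img_emb, class_text_emb, temperature=0.01):
    """Classification with a frozen text encoder: class scores are cosine
    similarities between image embeddings and class-prompt embeddings
    (sketch of the alternative to a trained classifier head suggested above)."""
    img_emb = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt_emb = class_text_emb / np.linalg.norm(class_text_emb, axis=1, keepdims=True)
    logits = img_emb @ txt_emb.T / temperature
    # Softmax over classes, numerically stabilized.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs
```
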

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Thank you for the submission. The proposed method is clearly described and addresses a relevant problem in pathology image classification. The use of a VLM-based warm-up phase is an interesting approach to tackle the open-set cold-start issue.

    That said, the paper could benefit from further justification of the EGSS strategy—particularly how batch-wise entropy selection contributes to diversity beyond standard uncertainty sampling. Additionally, while results on CRC100K and SkinTissue are promising, the generalizability of the method would be clearer if evaluated across additional datasets or modalities.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an interesting and practical problem—open-set active learning for pathology image classification—by combining a vision-language model with GPT-4-generated prompts to improve sample selection in the first round. The method is clearly described and demonstrates strong results on two datasets. While some components are based on existing techniques, the overall approach is effective and well-supported. I believe the paper makes a good contribution and recommend acceptance.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

R1.1 EOE is designed for OOD detection, while our proposed first-stage strategy aims to address the issue that traditional active learning tends to select too many OOD samples in open-set scenarios, in order to obtain more target samples for labeling. In the second stage, the main difference between our method and OSAL-ND lies in the design of the uncertainty measure: we use Entropy-Guided Stochastic Sampling to ensure both informativeness and representativeness.

R1.2 If the first query does not select any target-class samples, then subsequent selection strategies might indeed fail to capture these missing classes. However, our first-stage selection uses clustering to ensure representativeness, and the number of queries is significantly larger than the number of classes. Under different random seeds, the samples selected by our method in the first query cover all the classes.

R1.3 Randomly splitting candidates into batches increases randomness and diversity, which has already been confirmed in [3]. In subsequent versions, we will further address the effectiveness of EGSS as requested.
[3] Gaillochet, M., Desrosiers, C., Lombaert, H.: Active learning for medical image segmentation with stochastic batches. Medical Image Analysis 90, 102958 (2023)

R1.4 In future versions, we will compare our method with more recent open-set approaches.

R1.5 In future versions, we will propose strategies specifically tailored to pathological images to enhance performance.

R2.1 Regarding [3], we would like to clarify that our strategy differs from theirs in that it is designed to select more informative samples at the instance level. While [3] focuses on batch-wise entropy selection, our approach enhances this by incorporating additional mechanisms that prioritize samples with higher informativeness at the instance level, thus improving the efficiency of the sampling process. We believe the novelty lies in the integration of these components in the specific context of our task and the proposed enhancements to handle open-set challenges effectively.

R2.2 EGSS divides candidates into random batches, which increases the diversity of selection. This step is important because standard entropy selection picks the samples with the highest entropy, which may lead to selecting similar samples and cause redundancy.

R2.3 We also tried updating the original VLM, but experimental results showed that the performance was not as good as retraining the model from scratch. One possible reason is that the number of labeled samples is not small, and training a fully supervised network yields better results than methods like PEFT.

R3.1 In the paper, we mentioned: “We repeated the experiments five times with different random seeds and reported the average results for all methods.”

R3.2 In future versions, we will refine the comparison section and compare with the methods you mentioned.

R3.3 We compared different cold-start active learning methods, and the results highlighted the advantages of our approach.




Meta-Review

Meta-review #1

  • Your recommendation

    Provisional Accept

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A


