Abstract

Deep active learning (AL) is commonly used to reduce labeling costs in medical image analysis. Deep learning (DL) models typically prefer to learn from easy data and simple patterns before they learn from complex ones. However, existing AL methods often employ a fixed query strategy for sample selection, which may cause the model to focus prematurely on challenging-to-classify data. This slows the convergence of DL models and increases the amount of labeled data required to train them. To address this issue, we propose a novel Adaptive Curriculum Query Strategy for AL in Medical Image Classification. During the training phase, our strategy leverages Curriculum Learning principles to initially prioritize the selection of a diverse range of samples covering various difficulty levels, facilitating rapid model convergence. Once the distribution of the selected samples closely matches that of the entire dataset, the query strategy shifts its focus towards difficult-to-classify data based on uncertainty. This approach enables the model to achieve superior performance with fewer labeled samples. We perform extensive experiments demonstrating that our model not only requires fewer labeled samples but also outperforms state-of-the-art models in terms of efficiency and effectiveness. The code is publicly available at https://github.com/HelenMa9998/Easy_hard_AL.
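
Below is a minimal, illustrative sketch of the adaptive switching idea described in the abstract. It is not the authors' released implementation: the function names, the histogram-based feature distributions, and the threshold value are assumptions made for illustration (the Jensen-Shannon distance is the metric mentioned in Review #3 below).

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def feature_distribution(features, bins=32):
        # Histogram of (e.g., 1-D projected) sample features, normalized to a pmf.
        hist, _ = np.histogram(features, bins=bins, range=(0.0, 1.0))
        return hist / max(hist.sum(), 1)

    def choose_strategy(labeled_feats, pool_feats, threshold=0.05):
        # Early rounds: diversity sampling, until the labeled set's feature
        # distribution is close (in JS distance) to the unlabeled pool's.
        # Later rounds: uncertainty sampling that targets hard examples.
        p = feature_distribution(labeled_feats)
        q = feature_distribution(pool_feats)
        if jensenshannon(p, q, base=2) > threshold:
            return "diversity"
        return "uncertainty"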

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1423_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/HelenMa9998/Easy_hard_AL

Link to the Dataset(s)

https://www.adcis.net/en/third-party/messidor/
https://iciar2018-challenge.grand-challenge.org/Dataset/

BibTex

@InProceedings{Ma_Adaptive_MICCAI2024,
        author = { Ma, Siteng and Du, Honghui and Curran, Kathleen M. and Lawlor, Aonghus and Dong, Ruihai},
        title = { { Adaptive Curriculum Query Strategy for Active Learning in Medical Image Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper provides an active learning strategy for image classification that automatically adapts its selection strategy from diversity-based to uncertainty-based based on the AL iteration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper addresses an important issue with AL approaches: the fact that they are fixed and do not adapt to the learning phases of the model.

    • The results are averaged over several runs.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Literature review: I think references to existing hybrid AL for classification strategies are missing. For example:

    • Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds, Ash et al. ICLR (2020)
    • Cost-Effective Active Learning for Deep Image Classification. Wang et al. IEEE Transactions on Circuits and Systems for Video Technology (2017)

    Similarly, very few approaches on AL are mentioned. More works on uncertainty-based and diversity-based AL strategies could be cited:

    • Learning Loss for Active Learning. Yoo et al. CVPR (2019)
    • The Power of Ensembles for Active Learning in Image Classification. Beluch et al. CVPR (2018)
    • Variational Adversarial Active Learning. Sinha et al. ICCV (2019)

    2) Previous works have also addressed the issue of adapting the AL strategy used and have not been cited:

    • How to Select Which Active Learning Strategy is Best Suited for Your Specific Problem and Budget. Hacohen et al. NeurIPS (2023)
    • Algorithm Selection for Deep Active Learning with Imbalanced Datasets. Zhang et al. NeurIPS (2023)

    Hence, this paper lacks novelty in that it is not the first to propose “a method to dynamically select an AL strategy, which takes into account the unique characteristics of the problem and the available budget.”

    3) The paper claims to adapt its selection strategy from diversity-based to uncertainty-based based on the AL iteration. However, advice on the specific type of diversity and uncertainty measures to use is limited (and conclusions are unclear from the result section). Since all the best combinations are shown in Table 1 and are different for the two datasets, it seems that there is no way to decide which one to use, aside from testing all diversity/uncertainty combinations and selecting the best one. This makes ACAL sub-optimal in practice.

    4) It is unclear from the text whether the optimal combination was obtained by looking at validation or test results.

    5) The results do not show consistent behaviour and do not fully support the claim that ACAL yields better performance than other AL methods. For instance, not one but six ACAL strategies are presented (one per uncertainty measure), and they are not consistently better than random sampling across different labeling percentages. For example, over 5 different labeling percentages in BraTS (Table 1), ACAL with the random diversity-based strategy is worse than or equal to the random sampling strategy 2 to 4 times.

    6) The authors’ claim of faster convergence with ACAL is not clear from the results, especially compared to random sampling in Figure 2.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) To show the benefits of ACAL, it might be better to do pairwise comparisons. For example, given a diversity-based strategy, show results for that diversity-based strategy vs. ACAL with the same diversity-based strategy and different uncertainty-based strategies. And vice versa: keep the uncertainty-based strategy fixed and vary ACAL's diversity-based AL method.

    2) Since ACAL with random sampling obtains the optimal performance (with a given uncertainty-based strategy) in many of the BraTS experiments and is also the strategy shown in Figure 2, it would be worth comparing the results with the stochastic batch strategy presented in “Active learning for medical image segmentation with stochastic batches” by Gaillochet et al. (MedIA, 2023), which combines random sampling and uncertainty-based sampling.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is limited, given existing previous approaches to adaptive active learning and to combining random and uncertainty-based sampling (the most highlighted ACAL combination). Also, the results are not very convincing, especially compared to random sampling (i.e., Fig. 2, ACAL vs. Rand).

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my comments, including on novelty.



Review #2

  • Please describe the contribution of the paper

    This work introduces an adaptive querying algorithm to boost data efficiency and performance of deep learning models for medical image classification tasks. The key contribution of this work lies in the adaptive querying strategy which balances diversity vs difficulty of training samples to maximize labeling efficiency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strengths:

    1. This work leverages popular ideas from curriculum learning and boosting to design an efficient training algorithm for medical image classifiers.
    2. The authors demonstrate improvements in performance over other state-of-the-art AL approaches that utilize a fixed querying strategy.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper lacks an appropriate discussion of how their approach differs from prior published works describing related concepts (e.g., self-paced learning; see https://papers.nips.cc/paper/3923-self-paced-learning-for-latent-variable-models, https://papers.nips.cc/paper/5568-self-paced-learning-with-diversity)

    2. A good pretrained image encoder can also significantly reduce the amount of labeled data needed to achieve good performance on specific downstream tasks. A comparative analysis demonstrating the effectiveness of their proposed AL approach with various pretrained encoders, including recent state-of-the-art self-supervised image encoders (e.g., MedSAM for MRI and CTransPath for pathology), would have been helpful.

    3. One limitation not discussed in this work is how the algorithm handles the issue of inter-observer variability commonly observed in many medical imaging tasks.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    no additional comments

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The caption of Figure 3B is difficult to follow. Based on the plots, it seems the authors are trying to show how the querying strategy shifts from diversity-based to uncertainty-based.

    • It is not clear how uncertainty in model predictions was quantified. An explanation of how uncertainty was quantified would be helpful (common examples are sketched below).
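
    For context on this point, these are standard softmax-based uncertainty scores often used in AL; they are given here only as common examples, not as a statement of which measure the paper actually uses.

        import numpy as np

        def least_confidence(probs):
            # 1 - max class probability; higher = more uncertain.
            return 1.0 - probs.max(axis=1)

        def margin(probs):
            # Gap between the top-2 class probabilities; smaller = more uncertain.
            part = np.sort(probs, axis=1)
            return part[:, -1] - part[:, -2]

        def entropy(probs, eps=1e-12):
            # Predictive entropy; higher = more uncertain.
            return -np.sum(probs * np.log(probs + eps), axis=1)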

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the authors propose a reasonable approach leveraging active learning to maximize the performance of deep learning based medical image classification models on a fixed/limited budget of labeled data. Obtaining labeled data in pathology and radiology is expensive, necessitating clever training strategies to build robust and generalizable medical imaging AI models. This paper could benefit from further discussion of how the ideas proposed in this work build upon/differ from related ideas in the field (e.g., self-paced learning) and recently emerging trends in self-supervised foundation models.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have promised to add a discussion comparing the novelty of their approach with previously published methods in this area and to discuss inter-observer variability in the revised version.



Review #3

  • Please describe the contribution of the paper

    The paper introduces an active learning method capable of dynamically adjusting its query strategy. Initially, it performs diversity sampling and switches to an uncertainty-based strategy when the distance between the labeled dataset and the data pool falls below a certain threshold.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The approach is novel, using a simple algorithm to address the problem that the early and late phases of active learning suit different sampling strategies, which could benefit the subsequent development of active learning.
    • The experimental setup is solid, and the choice to evaluate different methods using average rank is interesting.
    • The paper is well-written, clear, and easy to understand.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • As a representative of hybrid active learning, the paper lacks a comparison with BADGE [1].
    • Despite the impressive average rank achieved by the method, there is no significant absolute performance advantage over other methods.

    [1] Ash, Jordan T., et al. “Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds.” International Conference on Learning Representations. 2020.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    In the README file of the source code, it says that the code is for “Adaptive Adversarial Samples based Active Learning for Medical Image Classification”, which is not the title of this paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The current version of ACAL still requires initial labeling. Progress in cold-start active learning could further enhance the performance of ACAL. For a comprehensive review of cold-start active learning, refer to Section 4.2.1 of [2].
    • Measuring the distance between two distributions using JSD may not be optimal. Please refer to [3] to explore the potential of other distance metrics (see the sketch after this list).
    • The idea presented in this paper is good, and validation on larger datasets may be necessary in future work.

    [2] Wang, Haoran, et al. “A comprehensive survey on deep active learning in medical image analysis.” arXiv preprint arXiv:2310.14230 (2023).
    [3] Zhao, Shengjia, et al. “Comparing distributions by measuring differences that affect decision making.” International Conference on Learning Representations. 2022.
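
    To make the point about distance metrics concrete, here is a small, purely illustrative comparison of the JS distance with one alternative (the Wasserstein distance) on synthetic stand-ins for labeled-set and pool features; the data and bin choices are assumptions, not values from the paper.

        import numpy as np
        from scipy.spatial.distance import jensenshannon
        from scipy.stats import wasserstein_distance

        rng = np.random.default_rng(0)
        labeled = rng.normal(0.4, 0.1, size=200)   # stand-in labeled-set features
        pool = rng.normal(0.5, 0.2, size=2000)     # stand-in unlabeled-pool features

        bins = np.linspace(0.0, 1.0, 33)
        p, _ = np.histogram(labeled, bins=bins); p = p / p.sum()
        q, _ = np.histogram(pool, bins=bins); q = q / q.sum()

        print("JS distance:", jensenshannon(p, q, base=2))          # bounded in [0, 1]
        print("Wasserstein:", wasserstein_distance(labeled, pool))  # scale-aware, unbounded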
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a novel active learning method that dynamically adjusts its sampling strategy based on the distance between the data pool and the labeled dataset. The method is innovative, and its concept could pave a new path for the future development of active learning. The experimental results presented in the paper demonstrate the effectiveness of the method. I recommend accepting the paper.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I keep my rating.




Author Feedback

Reviewer 1: Thanks for the insightful comments. 1: Lacks discussion of differences from self-paced learning. R: We will include a discussion of self-paced learning in the final version.

2: A comparative analysis of the proposed AL approach with different pretrained encoders would be helpful. R: Currently, we use a widely used lightweight U-Net with pretrained weights from ImageNet to facilitate faster interaction with annotators. Using self-supervised pretrained encoders can be investigated as future work, which we will discuss in the final version.

3: Lacks discussion of inter-observer variability. R: We recognize that inter-observer variability relates more to inexact supervision and handling noisy labels than to reducing labeling costs, the primary goal of our paper. However, it is an interesting point for future exploration.

Reviewer 3: Thanks for the insightful comments and the recognition of the potential influence of our paper. The suggestions to include an additional baseline and to explore other distance metrics provide further insights that will strengthen our work.

Reviewer 4: Thanks for the detailed comments, but we hold different opinions on some points. 1: Missing some references. R: Given the space constraints, we mainly focus on reviewing SOTA AL methods in medical imaging rather than all AL methods in general.

2: Previous works on adapting the AL strategy have not been cited; thus, the paper lacks novelty. R: We believe our paper differs significantly from the papers mentioned in the comments in terms of novelty, motivation, tasks, and approach. Our paper aims to adaptively switch AL query strategies during the learning phase based on the learning pattern of the DL model (e.g., querying easy examples first and harder ones later). In contrast, Hacohen et al. (2023) focus on selecting a proper AL method from a set of candidates for a given dataset based on the budget for each dataset. This is more similar to an AutoML method and is not a scenario considered in our domain of medical imaging. Zhang et al. (2023) address how to automatically select an appropriate AL method from a set of candidates in imbalanced settings. Again, the focus is on imbalance, which is not a scenario applicable to us. Thus, our novelty is distinct from these works.

3: Cannot find a fixed combination that always performs best in the experiments. R: It is impractical for a single combination of AL strategies to fit all datasets. It is therefore incorrect to conclude from Table 1 that ACAL is sub-optimal in practice. Moreover, our studies on various datasets and base methods show good generalizability, suggesting that ACAL would yield good results when applied in practice.

4: Unclear if the optimal combination was determined from validation or test results. R: In Table 1, the optimal combination selection is based on test results.

5: Not all ACAL combinations consistently outperform other AL methods. R: ACAL allows the use of different AL methods as our base query strategy. We present six combinations to show how ACAL can improve the performance of different base methods. As noted in the reviewer's comment, “ACAL with a random diversity-based strategy is worse or equal to random sampling 2 to 4 times.” This is because the base method is too weak, as shown in Table 1. Comparing ACAL with these weak strategies against the strategies alone shows a significant performance increase (e.g., LC only: 0.7, ACAL (Rand|LC): 0.860). ACAL with stronger base strategies always performs best under different labeling percentages.

6: The faster-convergence claim is unclear, especially compared to random sampling in Fig. 2. R: In the early training stages, ACAL employs a diversity-based strategy (e.g., Rand), leading to performance similar to random sampling. Its advantage becomes evident later, when it shifts to an uncertainty-based strategy that targets challenging samples for quicker convergence and better performance (e.g., after the dotted line in Fig. 2).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The authors have addressed most concerns of the reviewers.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    The authors have addressed most concerns of the reviewers.



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    All reviewers are satisfied with the contributions and the rebuttal

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    All reviewers are satisfied with the contributions and the rebuttal


