Abstract

Deep learning models have exhibited remarkable efficacy in accurately delineating the prostate for diagnosis and treatment of prostate diseases, but challenges persist in achieving robust generalization across different medical centers. Source-free Domain Adaptation (SFDA) is a promising technique to adapt deep segmentation models to address privacy and security concerns while reducing domain shifts between source and target domains. However, recent literature indicates that the performance of SFDA remains far from satisfactory due to unpredictable domain gaps. Annotating a few target domain samples is acceptable, as it can lead to significant performance improvement at a low annotation cost. Nevertheless, due to extremely limited annotation budgets, careful consideration is needed in selecting samples for annotation. Inspired by this, our goal is to develop Active Source-free Domain Adaptation (ASFDA) for medical image segmentation. Specifically, we propose a novel Uncertainty-guided Tiered Self-training (UGTST) framework that combines efficient active sample selection, using entropy-based primary local peak filtering to aggregate global uncertainty together with a diversity-aware redundancy filter, with a tiered self-training strategy to achieve stable domain adaptation. Experimental results on cross-center prostate MRI segmentation datasets revealed that, with a mere 5% annotation budget, our method yielded marked improvements, with average Dice score gains of 9.78% and 7.58% in the two target domains over state-of-the-art methods, on par with fully supervised learning. Code is available at: https://github.com/HiLab-git/UGTST.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1781_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Luo_An_MICCAI2024,
        author = { Luo, Zihao and Luo, Xiangde and Gao, Zijun and Wang, Guotai},
        title = { { An Uncertainty-guided Tiered Self-training Framework for Active Source-free Domain Adaptation in Prostate Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposed a method to select images to label during domain adaptation based on uncertainty. The uncertainty definition is adapted to medical images where the background is dominant.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method largely surpassed the compared baselines and reached performance similar to fully supervised methods.

    The ablation studies proved the benefits of including the proposed components.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The dataset used is too small, with only 60 or 12 MRI slices. There exist larger datasets for the prostate and other organs. This largely limits the value of the paper.

    What is M_s in section 2?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Using a larger dataset would increase the impact of this paper.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main concern is the small datasets used, which potentially reduce the value of the proposed method. Without a code release, reproduction is impossible.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Despite the small dataset, the paper remains strong in terms of results. The authors promised to release the code, which could help with reproducibility.



Review #2

  • Please describe the contribution of the paper

    (1) This paper proposed a novel and efficient Active Source-free Domain Adaptation (ASFDA) framework called Uncertainty-guided Tiered Self-training (UGTST) for prostate segmentation tasks. (2) This paper proposed the first global uncertainty estimation method for active sample selection in medical image segmentation. (3) This paper proposed a practical DA strategy, Tiered Self-training (TST).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) This paper proposed a novel uncertainty estimation technique for active sample selection in medical image segmentation tasks. It utilizes entropy-based minimum local peak filtering to aggregate global uncertainty, along with diversity-aware redundancy filters, thus constituting the active sample selection approach. (2) The authors designed the tiered self-training DA strategy, stabilizing the active learning while progressively leveraging pseudo labels. (3) Comparative and ablation experiments are sufficient to verify the effectiveness of the method and its various components.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The authors mentioned in section 2.1 that “In the output of prostate segmentation, the majority of pixels are confidently classified as background”; the relationship between this observation and the design of the minimum local peak value Ti is not clearly described. Please give a more detailed explanation. (2) How the other uncertainty estimation methods in the ablation experiments are applied to the proposed method is not explained in detail.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    (1) Please give a more specific description of the relationship between the motivation and the method design. (2) In the upper part of Fig. 1, “Adapting Stage-1 Model with 𝑫𝒕𝒔 ∩ 𝑫𝒕𝒂” should be revised to “Adapting Stage-1 Model with 𝑫𝒕𝒔 ∪ 𝑫𝒕𝒂”.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method works but the description needs to be more specific.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper
    1. The authors have proposed a novel uncertainty-based active source-free domain adaptation method for medical image segmentation.
    2. They leveraged a tiered self-training framework to further enhance performance.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed active learning approach yields a large performance improvement over the baselines.
    2. Comprehensive experiments (ablation studies) highlight the contribution of each component of the proposed method.
    3. Global aleatoric uncertainty aggregation combined with diversity-aware redundancy filtering provides a novel way of selecting subsamples from the target domain for manual annotation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The overall method section needs significant improvement in terms of clarity:

      a. The notation in the method section should be clearer and easier to understand. For example, in section 2 the notation Ms is used without any prior explanation.

      b. Section 2.2 does not give a clear picture of the full tiered self-training strategy.

      c. Figure 1 could be clearer in terms of the final input and output of the target domain.

    2. It would have been good to know where the proposed model performs worst (e.g., whether it picks up any bias from the data).

    3. The motivation for fixing the labeling budget at 5% is not clear from the manuscript. It would have been interesting to see whether increasing the labeling budget results in a significant performance difference.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be nice to include pseudo-code for the tiered self-training framework.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The prostate has a relatively simple shape; since the active learning approach showed promising results here, it would be interesting to see whether it also works for more complicated structures. Testing the model on a different organ with a shape similar to the prostate could also be explored as part of the proposed methodology.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors proposed a novel approach for source-free domain adaptation through an active learning strategy. The contribution is novel enough to be accepted, but there is still room for improvement as discussed above.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all reviewers for their valuable and insightful reviews. They described our method as “novel and efficient” (R1&R4), with comprehensive experiments (R1&R4), and noted that it largely surpassed the compared methods (R3). Here we address their main concerns:

  • Motivation of Ti (R1) Due to the imbalance between foreground and background, taking an average of pixel-level uncertainty across the whole image would be biased toward the background. To obtain an unbiased global uncertainty estimate, we use an entropy threshold (Ti) to identify the uncertain region and aggregate the uncertainty in this region as the slice-level uncertainty, which is more relevant to the prostate. To avoid manually tuning the value of Ti, we propose a self-adaptive threshold based on the histogram of pixel-level uncertainty. An illustrative sketch of this aggregation is given below.
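
    A minimal sketch of this slice-level uncertainty aggregation, in plain Python/NumPy, is shown here. It assumes the self-adaptive threshold Ti is taken at the first local peak of the pixel-entropy histogram; the exact peak-filtering rule in the paper may differ, and all names are illustrative rather than the released implementation.

        import numpy as np
        from scipy.signal import find_peaks

        def slice_uncertainty(prob_fg, n_bins=100):
            """Slice-level uncertainty from a foreground probability map (sketch)."""
            eps = 1e-8
            # Pixel-wise binary entropy: high values mark uncertain pixels.
            ent = -(prob_fg * np.log(prob_fg + eps)
                    + (1 - prob_fg) * np.log(1 - prob_fg + eps))
            # Histogram of pixel-level uncertainty; background dominates the lowest bins.
            hist, edges = np.histogram(ent, bins=n_bins, range=(0.0, np.log(2.0)))
            # Self-adaptive threshold Ti: illustrative choice of the first local peak
            # after the dominant low-entropy (background) bin.
            peaks, _ = find_peaks(hist[1:])
            t_i = edges[peaks[0] + 1] if len(peaks) else float(np.median(ent))
            # Aggregate uncertainty only over the uncertain region to avoid background bias.
            mask = ent >= t_i
            return float(ent[mask].mean()) if mask.any() else 0.0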

  • Dataset (R3&4) Collecting a large dataset is often challenging, and domain adaptation methods aim at transferring knowledge from a larger source dataset to smaller target datasets. We followed the existing work of Yang et al. [27] in using the prostate dataset for SFDA, where a larger source domain and two smaller target domains are involved. Due to the relatively small target domains, 4-fold cross-validation was performed to ensure robust evaluation. Though the prostate has a relatively simple shape, the different domains of the prostate dataset differ substantially in image brightness, contrast, quality, and resolution, posing challenges in cross-center adaptation settings. As shown in Table 1, the Dice of “source-only” was 42% to 45%, showing the large domain gaps. As this dataset has been widely used as a benchmark for SFDA, we used it to better demonstrate the improvement over existing methods. We agree with the reviewer that other datasets with more complicated shapes could be used, which will be investigated in an extension of this work.

  • Clarity and notation of the method (R1,3&4) M_s in section 2 means the source model used for target-domain initialization. The TST strategy is straightforward: 1) a model initialized with the source model is trained on active samples with manual annotations and on the remaining samples with pseudo labels, and then predicts updated pseudo labels for the unannotated samples; 2) the process is repeated once with the updated pseudo labels (see the sketch below). Section 2.2 can be slightly revised to clarify this. As described in the first paragraph of the method section, the input of our method is the source model and the unannotated samples in the target domain, and the output is the selected active samples and the adapted model. Fig. 1 can be edited slightly to clarify this. The mistake in the notation in the upper part of Fig. 1 will also be corrected with a minor modification.
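
    For concreteness, a minimal sketch of the two-tier loop described above follows. All callables (select_fn, annotate_fn, train_fn) are placeholders standing in for the selection, manual annotation, and training steps, and model.predict is assumed to return a pseudo label for one image; none of these names come from the released code.

        def tiered_self_training(source_model, images, select_fn, annotate_fn,
                                 train_fn, budget=0.05, tiers=2):
            """Two-tier self-training with active samples and pseudo labels (sketch)."""
            model = source_model                                   # initialized from the source model
            active = select_fn(model, images, budget)              # indices of ~5% most informative slices
            labels = {i: annotate_fn(images[i]) for i in active}   # manual annotation of active samples
            pseudo = {i: model.predict(images[i])                  # initial pseudo labels for the rest
                      for i in range(len(images)) if i not in labels}
            for _ in range(tiers):                                 # tier 1 and tier 2
                model = train_fn(model, images, labels, pseudo)    # train on manual + pseudo labels
                pseudo = {i: model.predict(images[i])              # refresh pseudo labels with the
                          for i in range(len(images))              # adapted model, then repeat once
                          if i not in labels}
            return model, active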

  • Annotation budget (R4) A low annotation budget is desirable to keep annotation cost minimal, and Table 1 showed that only our method is comparable to fully supervised learning or fine-tuning in the target domain with only 5% of images labeled. So, we did not further increase the annotation budget. In the future, it would be of interest to further reduce the annotation budget.

  • Details for Fig. 3(b) (R1) For the compared MC-Dropout, LC, and Entropy methods for uncertainty estimation, we followed the typical practice of averaging the uncertainty across all pixels to obtain an image-level uncertainty, which was then used for active sample selection (see the sketch below).
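
    As a reference point, these baseline image-level scores can be computed roughly as follows (a sketch, not the exact evaluation code); for MC-Dropout, prob_fg would first be averaged over several stochastic forward passes.

        import numpy as np

        def baseline_image_uncertainty(prob_fg, method="entropy"):
            """Image-level uncertainty as a plain average over all pixels (sketch)."""
            eps = 1e-8
            if method == "entropy":
                u = -(prob_fg * np.log(prob_fg + eps)
                      + (1 - prob_fg) * np.log(1 - prob_fg + eps))
            elif method == "lc":                     # least confidence
                u = 1.0 - np.maximum(prob_fg, 1.0 - prob_fg)
            else:
                raise ValueError(f"unknown method: {method}")
            return float(u.mean())                   # averaging is biased toward the easy background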

  • Pseudo-code and code (R3&4) Due to the space limit, we did not include pseudo-code in the manuscript, as is the case in most MICCAI papers. The code was not provided due to the anonymous reviewing process, but will be released in the final version, ensuring the reproducibility of the work.

  • Showing the worst case (R4)
    We agree that more analysis of where our method performs worst would help in better understanding its limitations. Fig. 2 showed some average cases for comparison, and it can be slightly edited to show some failure cases for a more informative presentation in the final version.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Three reviewers consistently scored ‘weak accept’ after the rebuttal. This paper has some merits by combining uncertainty estimation for active learning and self-training. It is a bit overclaimed by saying ‘the first global uncertainty…’ because there are some related published works, e.g. [1] (also cited by the authors as [24]).

    Considering the comments from three reviewers, I vote for accept for this submission, but it is weak accept from my perspective.

    [1] https://arxiv.org/pdf/2309.10244

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


