Abstract

Deep learning-based diagnostic models often suffer performance drops due to distribution shifts between training (source) and test (target) domains. Collecting and labeling sufficient target domain data for model retraining represents an optimal solution, yet is limited by time and scarce resources. Active learning (AL) offers an efficient approach to reduce annotation costs while maintaining performance, but struggles to handle the challenge posed by distribution variations across different datasets. In this study, we propose a novel unsupervised Active learning framework for Domain Adaptation, named ADAptation, designed to select only a few informative samples from multi-domain data pools. As a fundamental step, our method first utilizes the distribution homogenization capabilities of diffusion models to bridge cross-dataset gaps by translating target images into source-domain style. We then introduce two key innovations: (a) a hypersphere-constrained contrastive learning network for compact feature clustering, and (b) a dual-scoring mechanism that quantifies and balances sample uncertainty and representativeness. Extensive experiments on four breast ultrasound datasets (three public and one in-house/multi-center) across five common deep classifiers demonstrate that our method surpasses existing strong AL-based competitors, validating its effectiveness and generalization for clinical domain adaptation. The code is available at https://github.com/miccai25-966/ADAptation.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0966_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/miccai25-966/ADAptation

Link to the Dataset(s)

BUSI dataset: https://www.kaggle.com/datasets/sabahesaraki/breast-ultrasound-images-dataset
BUS-BRA dataset: https://zenodo.org/records/8231412
UDIAT dataset: https://www.nature.com/articles/s41597-025-04562-3

BibTex

@InProceedings{DuaYao_ADAptation_MICCAI2025,
        author = { Duan, Yaofei and Huang, Yuhao and Yang, Xin and Han, Luyi and Xie, Xinyu and Zhu, Zhiyuan and He, Ping and Chan, Ka-Hou and Cui, Ligang and Im, Sio-Kei and Ni, Dong and Tan, Tao},
        title = { { ADAptation: Reconstruction-based Unsupervised Active Learning for Breast Ultrasound Diagnosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15975},
        month = {September},
        pages = {33--43}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a novel unsupervised active learning framework for domain adaptation, named ADAptation, which efficiently selects informative samples from multi-domain data pools under a limited annotation budget. It uses a diffusion model to translate "target" images into the "source" domain. Two claimed innovations: (a) contrastive learning with a hypersphere constraint for compact feature clustering, and (b) a dual-scoring mechanism that quantifies and balances sample uncertainty and representativeness.

    Use case: four breast ultrasound datasets (three public and one in-house/multi-center). The most informative cases are selected from the unlabeled data to annotate for active learning of a classifier.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Methods: A complex architecture with an interesting combination of generative image modeling for domain transfer, encoding of image features, and a measure of image "interest" for labeling.

    The informativeness score seems original, but I could not understand why argmax |\theta_p - \theta_q| is used rather than argmin if, as stated, a "smaller value indicates higher uncertainty", and it seems that in Figure 4 the proposed approach leads to "emphasizing boundary samples in the target distribution (Uncertainty)". A detailed comparison to benchmark active learning sample selection methods (Max-entropy, BALD, LfSOSA, CoreSet, VAAL) is provided for various annotation budgets.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    Writing: The similarity scores computed in Figure 1 are not explained; what is the formula? Equations and mathematical notations are not rigorously introduced and manipulated. Equation 1: the epsilon variable is not defined, and t is used to index two different things (target domain and time step). Equation 2: the notations x^(u) and x^(r) are not introduced, and there is a typo in the l2 norm for f().

    Methods: No justification is given for the need to use a prompt (CLIP-like encoder). No examples of a variety of prompts are provided; I suspect only one prompt is used.

    For the contrastive learning with student/teacher networks, the loss in Eq. 3 directly uses an angular value, which is unusual and must raise issues with angle sign and periodicity. Why this choice? Tuning of the adaptive scaling factor m is not discussed. Adaptive with respect to what? In what sense does this approach provide a "geometry-aware representation"?

    Results: Several hyperparameters (e.g., m, w) are not documented with values in the results. Reasons and details for using architectures other than ResNet-50 as the backbone (cf. Table 2) are not provided in the text. Only accuracy is reported, which gives a limited view of classification performance. More problematic, no indication of class balance and stratification is provided for the experiments, while the datasets are highly imbalanced.

    Additional comments: "Canny edge" => "Canny edge map"; "255-dimensional hypersphere" => "256-dimensional", rather?

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Too much rewriting is required to make the paper acceptable in terms of rigorous methodological contributions and precise experimental details. The experiments also lacked essential setup details needed for evaluation.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The rebuttal has clarified some points but confirms the following concerns on my side: (1) Value of the BiomedCLIP prompt: not demonstrated and not used in this work, as clarified in the rebuttal. I agree, though, with its high potential value for future work extending to multiple use cases. (2) The numerical value of the arccos metric is now clarified to be limited to the range [0, pi]. But quickly checking the cited Ref1 and Ref2, I could not find a similar use of an arccos metric, but rather cosine metrics. (3) The scaling factor m is now clarified for initialization, but the initial and final values seem to be the same (m=4) based on the rebuttal. What if it were initialized with a different value? How critical is this parameter? (4) The authors clarified the data imbalance as "the class distribution across all datasets is roughly 2:1 or 1:2 (benign vs. malignant)" and "not considered as highly imbalanced". I do not understand the 2:1 versus 1:2 distinction and disagree with their opinion, as this seems highly imbalanced to me. This is a major concern for me.



Review #2

  • Please describe the contribution of the paper

    The authors propose a framework for automatically selecting data samples for annotation from a multi-domain data pool. The framework consists of three stages: fine-tuning a diffusion model, generating reconstructed target data, and using contrastive learning for sample selection. The proposed method is evaluated on four breast ultrasound datasets and compared against existing approaches.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The figures in the paper are clear and informative.
    • The objective of this work holds practical significance in clinical applications, where dealing with new, large-scale, labour-intensive annotation is a common problem.
    • This work shows extensive evaluations of the proposed method, and statistical tests were performed to show statistical significance of the results.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Page 3, Fig. 1 and page 4, Sect. 2.1: CLIP was replaced with BiomedCLIP, and the prompt "Ultrasound of breast" was used; however, the effectiveness of this prompt was neither clearly explained nor supported with further evidence.
    • The authors claimed one of the innovations was hypersphere-constrained contrastive learning; however, they failed to explain how it differs from the similar existing work "Wang, Tongzhou, and Phillip Isola. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. International conference on machine learning. PMLR, 2020."
    • Page 3, Sect. 2: please give further clarification regarding the statement "This design addresses clinical needs where diagnostic models require rapid updates with new data." Supposing a trained, approved diagnostic model produces satisfactory diagnostic results, why would it need to be constantly updated with new data?
    • Page 4, Sect. 2.2, “m represents the adaptive scaling factor.” How is m calculated?
    • The usage of \alpha is somewhat confusing. On page 3, Sect. 2: "ADAptation aims to select the top \alpha% most informative samples…", "a sphere-based rule quantifies informativeness to select Top-\alpha samples for annotation." On page 5, Sect. 2.3: "\alpha ∈ (0, 1) represents the selection ratio,…" However, on page 6, the suggested ratios are "with varying annotation budgets (20%, 30%, 50%, 80%)…". Please be consistent when using \alpha or \alpha%, which are not interchangeable.
    • Page 6, Tab. 2: some of the results using ADAptation with 80% annotation even exceed those using 100% annotation (ResNet-50 and MobileNet). Do these results hold statistical significance, and why do models trained with partial annotation outperform those with full annotation?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method aims to address a practical problem, and the experiments were extensive, however its acceptance may depend on further clarifications of the aforementioned points.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed my concerns.



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel method for unsupervised active learning in domain adaptation, named ADAptation. The core contribution is a framework that efficiently selects informative samples from multi-domain data pools under a limited annotation budget.

    The method begins by using diffusion models to translate target domain images into the style of the source domain. This is followed by the introduction of a hypersphere-constrained contrastive learning network, which encourages more compact and discriminative feature clustering. Additionally, the authors propose a dual-scoring mechanism that balances both uncertainty and representativeness when selecting samples for annotation, improving the efficiency of the active learning process.

    The method is thoroughly evaluated on two public target datasets and one in-house target dataset, demonstrating its effectiveness and robustness across different settings.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This is a well-presented and thoughtfully designed study, and I genuinely enjoyed reading it.

    While the use of diffusion models for domain translation is not entirely new, the authors introduce a clever extension by incorporating a Canny edge detector into the reconstruction process. This helps ensure that essential structural information in the medical images is preserved during translation. The integration of text prompts further enriches the translation process, potentially enabling more controlled and semantically meaningful transformations across domains.

    Another notable strength is the use of a hypersphere-constrained contrastive learning framework for feature clustering. This approach encourages more compact and well-separated representations.

    The experimental evaluation is especially strong, covering a wide range of scenarios. The authors vary the annotation budget from 20% to 80%, evaluate across five different classification architectures, and compare with a comprehensive set of state-of-the-art baselines. The inclusion of a thorough ablation study further supports the method’s robustness and helps isolate the contribution of each component.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are no major weaknesses from my perspective. However, one aspect that could benefit from further clarification is the use of prompt encoding in the image translation step. While it adds flexibility, it is not entirely clear why text prompts are needed in this context, especially since the task does not appear to be inherently text-driven. A brief explanation of the motivation and role of prompt-based conditioning in this specific application would strengthen the paper and help readers better understand its value.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (6) Strong Accept — must be accepted due to excellence

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well-written and well-organized, making it easy to follow the motivation, methodology, and results. The method is thoughtfully designed, with several innovative components that are well-integrated and clearly explained. Additionally, the experimental evaluation is thorough and convincing, covering a wide range of scenarios, model architectures, annotation budgets, and comparisons with strong baselines. These factors collectively support the paper’s main claims and demonstrate its potential impact.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank all reviewers (R) for their feedback. We have clarified the concerns and will improve the writing accordingly.

Q1. Prompt and CLIP. (R1,R3,R4) We thank the reviewers for their concerns. While our task is not inherently text-driven, we leverage textual semantic guidance to enhance generation quality. We chose BiomedCLIP (trained on 15 million medical image-text pairs) over the original CLIP of the diffusion model for its strong medical semantic understanding (6.7%-12.4% PSNR improvement). The domain-specific prompt ('Ultrasound of breast') serves as a prior semantic anchor that aligns image features from the Canny edge map with relevant medical concepts. This design also enables future extensions to multiple organs/modalities without changing the model structure.
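
For concreteness, a minimal sketch of how the domain prompt could be encoded with BiomedCLIP through the open_clip library. The hub id, context length, and conditioning hookup below are assumptions for illustration; the released code is the reference implementation:

```python
import torch
from open_clip import create_model_from_pretrained, get_tokenizer

# Assumed BiomedCLIP checkpoint id on the Hugging Face hub.
HUB_ID = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(HUB_ID)
tokenizer = get_tokenizer(HUB_ID)

with torch.no_grad():
    tokens = tokenizer(["Ultrasound of breast"], context_length=256)
    text_emb = model.encode_text(tokens)                       # semantic anchor
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # unit-normalize

# text_emb would then replace the original CLIP text conditioning of the
# diffusion model; the exact conditioning interface is not shown here.
```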

Q2. Adaptive scaling factor. (R1,R4) The scaling factor m is trainable: it is initialized to 4 as an nn.Parameter and passed through F.softplus(), ensuring positivity and smooth gradients.
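
A minimal PyTorch sketch of such a trainable, positivity-constrained scaling factor (the exact parameterization in the released code may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveScale(nn.Module):
    """Trainable scaling factor m, kept positive via softplus (a sketch)."""

    def __init__(self, init_value: float = 4.0):
        super().__init__()
        self.raw_m = nn.Parameter(torch.tensor(init_value))

    def forward(self) -> torch.Tensor:
        # softplus keeps m > 0 with smooth gradients everywhere;
        # softplus(4.0) ~= 4.02, so the effective initial value is ~4.
        return F.softplus(self.raw_m)
```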

Q3. Angle sign and periodicity. (R1) We clarify the misunderstanding in Eq. 3. Since arccos() outputs values strictly within [0, pi], the result is explicit and monotonic. Unlike low-dimensional spaces (2D/3D), where angles have sign or periodicity (e.g., 0 = 360), our method operates in a high-dimensional normalized space (a 255-D hypersphere), where feature vectors are projected onto the corresponding direction vectors (shape [1, 256]). This avoids angular ambiguity, improves clustering (evidenced by Table 3), and is supported by recent works [Ref1, Ref2].
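
For illustration, a minimal sketch of an arccos-based angular distance on the unit hypersphere. The feature shape [B, 256] is an assumption, and Eq. 3 in the paper may combine this angle with additional terms:

```python
import torch
import torch.nn.functional as F

def angular_distance(f_p: torch.Tensor, f_q: torch.Tensor) -> torch.Tensor:
    """Angle in [0, pi] between feature vectors on the unit hypersphere."""
    f_p = F.normalize(f_p, dim=-1)               # unit norm: points on S^255
    f_q = F.normalize(f_q, dim=-1)
    cos = (f_p * f_q).sum(dim=-1)                # cosine similarity in [-1, 1]
    cos = cos.clamp(-1.0 + 1e-7, 1.0 - 1e-7)     # avoid NaN gradients at +/-1
    return torch.acos(cos)                       # monotonic, no sign/periodicity
```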

Q4. Table 2: Other architectures as backbone. (R1) We clarify that the architectures in Table 2 are downstream models, not backbones, as stated on page 6, line 4. This detail may have been overlooked.

Q5. Accuracy metrics. (R1) In Table 1, the class distribution across all datasets is roughly 2:1 or 1:2 (benign vs. malignant), which we do not consider highly imbalanced. Therefore, accuracy remains reasonable and meaningful. Precision, recall, F1, and AUC were also evaluated; however, only accuracy is reported due to space limits. All metrics will be included in the journal version.

Q6. Differences between ours and Wang’s. (R4) Our work differs from Wang’s in two key aspects:

  1. Our scenario poses a unique challenge: the model must learn representations solely from positive pairs, whereas Wang's work leverages both positive and negative pairs (a minimal sketch of such a positive-pair-only setup follows this list).
  2. We propose a geometric constraint with an arccos-based consistency loss for feature alignment, as demonstrated in Table 3, unlike Wang's InfoNCE-based contrastive loss.
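
As referenced in item 1, a minimal sketch of a positive-pair-only student/teacher setup with a momentum (EMA) teacher, reusing the angular distance sketched under Q3 as the consistency loss. The BYOL-style momentum update is an assumption for illustration, not necessarily the exact rule used in ADAptation:

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               tau: float = 0.99) -> None:
    # Teacher weights track an exponential moving average of the student's.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(tau).add_(s_p, alpha=1.0 - tau)

def consistency_loss(student, teacher, view_a, view_b):
    # Positive pairs only: two augmented views of the same image. The loss
    # is the angular distance (see the Q3 sketch) between their embeddings.
    z_s = student(view_a)
    with torch.no_grad():
        z_t = teacher(view_b)
    return angular_distance(z_s, z_t).mean()
```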

Q7. Clarify ‘models need update with new data’. (R4) In clinical deployment, domain shifts (e.g., across hospitals, imaging protocols or populations) often degrade model performance. Updates are essential to maintain diagnostic accuracy across centers.

Q8. ADAptation with 80% labels exceeds 100%. (R4) We attribute this to our active domain adaptation strategy, which prioritizes informativeness. In contrast, full supervision treats all data equally (including noisy samples), which can degrade performance under domain shift. Similar findings have been reported in a prior study [Ref3]. Our results are statistically significant (a sketch of one standard way to obtain such intervals follows the list below):

  • ResNet-50: full supervision 0.9229 (CI: [0.8974, 0.9487]) vs. ours (80%) 0.9304 (CI: [0.9048, 0.9560]), p = 0.0001
  • MobileNet: full supervision 0.9488 (CI: [0.9177, 0.9774]) vs. ours (80%) 0.9490 (CI: [0.9267, 0.9707]), p < 0.0001
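
As referenced above, a minimal sketch of bootstrap confidence intervals and a paired permutation test over per-sample correctness arrays. The exact statistical procedure is not detailed in this rebuttal, so treat this as an illustrative assumption:

```python
import numpy as np

def bootstrap_ci(correct: np.ndarray, n_boot: int = 10000, seed: int = 0):
    """95% bootstrap CI for accuracy from a 0/1 per-sample correctness array."""
    rng = np.random.default_rng(seed)
    n = len(correct)
    stats = np.array([correct[rng.integers(0, n, n)].mean()
                      for _ in range(n_boot)])
    return np.percentile(stats, [2.5, 97.5])

def paired_permutation_p(a: np.ndarray, b: np.ndarray,
                         n_perm: int = 10000, seed: int = 0) -> float:
    """Two-sided paired permutation test on per-sample correctness arrays."""
    rng = np.random.default_rng(seed)
    diff = a.astype(float) - b.astype(float)
    observed = abs(diff.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(diff)))
    null = np.abs((signs * diff).mean(axis=1))
    return float((null >= observed).mean())
```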

Q9. Writing details. (R1,R3,R4) We will revise all notations and hyperparameter statements (m=4, w=0.5). We will add the cosine similarity formula to Fig. 1, but we will retain '255-D hypersphere', as it is a technical term. (R1)

[Ref1] H. Zou, et al. "Multi-angle Consistent Generative NeRF with Additive Angular Margin Momentum Contrastive Learning." CVPR 2024.
[Ref2] H. Cevikalp, et al. "Reaching Nirvana: Maximizing the Margin in Both Euclidean and Angular Spaces for Deep Neural Network Classification." TNNLS 2025.
[Ref3] J. Chen, et al. "Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts." CVPR 2024.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I recommend 'accept' considering all reviews.

    The authors addressed most of R1's initial concerns. Some of R1's new concerns in the rebuttal could be addressed in the camera-ready version. Regarding the class imbalance issue raised by R1, I think 2:1 is not considered highly imbalanced in medical imaging.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


