Abstract

Deep learning (DL) methods have achieved great success in medical image segmentation, but they struggle to deliver robust performance across different datasets due to domain and modality gaps. Source-Free Domain Adaptation (SFDA) techniques adapt DL models to generalize across domains without access to source data, and active learning can be used to query informative target samples for fine-tuning, thus improving generalization. However, only a few Active SFDA methods have been proposed. Additionally, existing methods focus on same-modality adaptation and lack mechanisms to address modality gaps, limiting their applicability. To address these limitations, we propose a novel Active Source-Free Cross-Domain and Cross-Modality Adaptation method for medical image segmentation. The method adapts models across different domains and modalities by employing a novel Active Test Time Sample Query strategy that jointly implements Image Sensitivity Query (ISQ) and Organ Heterogeneity Query (OHQ). ISQ evaluates samples' image-level, modality-agnostic informativeness, thus querying informative samples from different domains and modalities. OHQ queries samples with large foreground diversity by measuring uncertainty-weighted organ boundary discontinuity and uncertainty-weighted organ interior abnormality, thus avoiding the influence of modality-specific background noise. A Dynamic Image-to-Organ Scaling mechanism dynamically fuses the results of ISQ and OHQ for sample querying. We evaluated our method on cross-domain and cross-modality volumetric pancreas segmentation tasks, where it outperformed other state-of-the-art methods when adapting from a CT domain to a larger CT domain, as well as to T1-weighted and T2-weighted MR domains.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3160_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{YanJin_Active_MICCAI2025,
        author = { Yang, Jin and Yu, Xiaobing and Qiu, Peijie and Marcus, Daniel and Sotiras, Aristeidis},
        title = { { Active Source-Free Cross-Domain and Cross-Modality Adaptation for Volumetric Medical Image Segmentation by Image Sensitivity and Organ Heterogeneity Sampling } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15965},
        month = {September},
        pages = {2 -- 12}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes an Active Source-Free framework that tackles cross-domain adaptation in 3D medical image segmentation. It introduces a test-time querying scheme that unifies two complementary criteria: Image Sensitivity Query (ISQ), which scores each unlabeled target scan by how strongly its predictions diverge under modality-agnostic perturbations, thereby identifying the most informative images regardless of CT or MR contrast.

    Organ Heterogeneity Query (OHQ), which selects samples whose pancreas exhibits high uncertainty-weighted boundary discontinuity and interior abnormality, guaranteeing foreground diversity while suppressing modality-specific background noise.

    Finally, a Dynamic Image-to-Organ Scaling mechanism balances ISQ and OHQ across active-learning rounds, letting the model emphasize image-level cues early and organ-level subtleties later. Without any source data, this strategy enables a pre-trained network to be incrementally fine-tuned on just a fraction of annotated target volumes, yet it outperforms state-of-the-art active SFDA baselines when adapting from a CT source to a larger CT set as well as to T1- and T2-weighted MR datasets, thus closing both domain and modality gaps in volumetric pancreas segmentation.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Active source-free framework that bridges both domain and modality gaps: Previous Active SFDA work handled only same-modality shifts; this paper is the first to demonstrate a single strategy that adapts a CT-trained model to larger CT cohorts and to T1- and T2-weighted MRI volumes without any source data.
    2. Novel dual-criteria query formulation: Image Sensitivity Query (ISQ) scores each unlabeled scan by the KL divergence between its prediction and the predictions on four modality-agnostic perturbations, identifying images that expose modality-invariant weaknesses (see the sketch after this list). Organ Heterogeneity Query (OHQ) exploits foreground diversity, computing uncertainty-weighted boundary discontinuity and interior abnormality so that queried samples enrich organ shape and texture instead of background clutter.
    3. Extensive evaluations across modalities and metrics: Results are reported for 3-D U-Net and Attention U-Net, a same-modality domain shift (NIH→MSD), and two cross-modality shifts (CT→T1w, CT→T2w), using Dice and 95th-percentile Hausdorff Distance. The breadth of experiments gives a solid picture of the proposed method's performance.
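
    For concreteness, here is a minimal sketch of the ISQ scoring idea described in point 2: a volume is scored by the mean KL divergence between the model's prediction on the original input and its predictions on perturbed copies. The perturbation transforms, their strengths, and the isq_score helper are illustrative assumptions, not the authors' implementation.

        import torch
        import torch.nn.functional as F

        # Illustrative modality-agnostic perturbations; the exact transforms
        # and strengths are assumptions, not the paper's settings.
        def gaussian_noise(x): return x + 0.05 * torch.randn_like(x)
        def brightness(x):     return x + 0.1
        def contrast(x):       return x * 1.1
        def blur(x):           # simple average blur as a stand-in for Gaussian blur
            return F.avg_pool3d(x, kernel_size=3, stride=1, padding=1)

        PERTURBATIONS = [gaussian_noise, blur, brightness, contrast]

        @torch.no_grad()
        def isq_score(model, volume):
            """Mean KL divergence between the prediction on the original
            volume (N x C x D x H x W) and predictions on perturbed copies;
            higher scores flag samples the model is unstable on."""
            p = F.softmax(model(volume), dim=1)                    # original prediction
            kl = 0.0
            for perturb in PERTURBATIONS:
                q = F.softmax(model(perturb(volume)), dim=1)
                kl += F.kl_div(q.log(), p, reduction="batchmean")  # KL(p || q)
            return (kl / len(PERTURBATIONS)).item()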
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Limited novelty relative to recent Source‑Free and Active‑Learning work: Image Sensitivity Query (ISQ) relies on prediction divergence under standard perturbations (Gaussian noise, blur, brightness/contrast) much like Tent’s entropy‑minimisation + augmentation [16] and Fourier‑style test‑time adaptation [19]; OHQ extends edge‑entropy ideas already explored for prostate adaptation [8].
    2. A high annotation budget is still required, with no cost analysis: The method reaches near-upper-bound Dice only after labelling 20-25% of target volumes, which can mean hundreds of 3-D scans in large cohorts, but the paper does not quantify annotation hours or compare them to semi-supervised or weak-label alternatives. This weakens the argument that the approach is truly "data-efficient." Also, they do not describe how the target samples are annotated once selected at each round; does the process simply wait for annotators to finish before starting the next round?
    3. High computational overhead: Query scoring requires multiple forward passes per volume (four perturbations for ISQ + uncertainty maps for OHQ) and repeated fine-tuning rounds. Training took around 1000 + 5 × 500 epochs on an A100 GPU, but runtime and memory costs, and the practical labour cost of fine-tuning, are not reported.
    4. I am wondering why the authors did not compare with existing active source-free domain adaptation methods such as those they mention [8,17,18]. If these methods perform worse on cross-modality tasks, that would only demonstrate the proposed method's advantage in this scenario. Also, some source-free cross-modality frameworks, such as Fourier style mining [19] or feature-augmentation SFDA for pancreas MR [21], are absent from the table, making it hard to distinguish the benefit of active querying from other adaptation strategies.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed motivation seems reasonable, but similar works already exist. I am also concerned that if a case is totally mis-segmented, it may still receive a good score under the designed metric.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The responses answer my concerns.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a source-free domain adaptation framework for medical image segmentation that integrates an active learning mechanism to minimize annotation cost and maximize test-time adaptability. The method leverages two query strategies: Image Sensitivity Query (ISQ) based on perturbation-driven uncertainty, and Organ Heterogeneity Query (OHQ) based on uncertainty-weighted boundary and interior variation. A dynamic weighting strategy is introduced to balance ISQ and OHQ across rounds. The proposed method is evaluated on pancreas segmentation with multiple source-to-target settings, and demonstrates competitive results with limited annotations.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Perturbation-based informativeness estimation (ISQ): The proposed ISQ strategy introduces modality-agnostic image perturbations to quantify uncertainty, which is a task-specific, intuitive, and practical alternative to classical entropy-based approaches.

    Organ Heterogeneity Query (OHQ): This is a thoughtful design tailored for medical segmentation. By considering both boundary discontinuity and internal intensity variance with uncertainty weighting, it addresses the anatomical complexity that often leads to mis-segmentation.
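
    To make this concrete, below is a minimal sketch of what an uncertainty-weighted OHQ score could look like. The binary-entropy weighting, morphological boundary extraction, and additive combination are assumptions for illustration, not the authors' exact formulation.

        import numpy as np
        from scipy import ndimage

        def ohq_score(prob_fg, image):
            """Illustrative organ-heterogeneity score from a foreground
            probability map and the intensity volume (same 3D shape)."""
            mask = prob_fg > 0.5
            if not mask.any():
                return 0.0
            eps = 1e-8
            # voxel-wise uncertainty: binary entropy of the foreground probability
            unc = -(prob_fg * np.log(prob_fg + eps)
                    + (1.0 - prob_fg) * np.log(1.0 - prob_fg + eps))
            # inner boundary = mask minus its erosion; the rest is the interior
            boundary = mask & ~ndimage.binary_erosion(mask)
            interior = mask & ~boundary
            # boundary discontinuity: uncertainty-weighted gradient magnitude
            grad = ndimage.gaussian_gradient_magnitude(image.astype(float), sigma=1.0)
            b = (unc[boundary] * grad[boundary]).mean() if boundary.any() else 0.0
            # interior abnormality: uncertainty-weighted deviation from mean intensity
            if interior.any():
                dev = np.abs(image[interior] - image[interior].mean())
                i = (unc[interior] * dev).mean()
            else:
                i = 0.0
            return float(b + i)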

    Dynamic weighting strategy between ISQ and OHQ: This adds adaptivity over AL rounds and reflects a practical intuition: focusing first on image-level transferability, then refining organ-level structures.
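
    As a tiny illustration of such a schedule, the linear blend below shifts weight from ISQ to OHQ across rounds; the schedule and the assumption that both scores are normalized to a comparable range across the pool are stand-ins, since the paper's exact Dynamic Image-to-Organ Scaling rule is not reproduced here.

        def combined_query_score(isq, ohq, round_idx, total_rounds):
            """Blend image-level (ISQ) and organ-level (OHQ) scores, shifting
            weight from ISQ in early rounds toward OHQ in later rounds."""
            alpha = 1.0 - round_idx / max(total_rounds - 1, 1)  # 1 -> 0 over rounds
            return alpha * isq + (1.0 - alpha) * ohq

        # At each round: score every unlabeled volume, query the top-k for
        # annotation, fine-tune, and repeat with the updated model.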

    The experiments are conducted on four publicly available datasets, with multiple adaptation settings (CT→CT and CT→MR) and against a range of classical AL baselines.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The authors assume that “if a sample is more informative to the network, its perturbations will be more likely to be mispredicted.” While this hypothesis forms the foundation of the proposed Image Sensitivity Query (ISQ), the manuscript lacks theoretical justification or empirical evidence to support it.

    • Why does higher informativeness necessarily imply prediction instability under modality-agnostic perturbations?
    • Are there prior works or supporting experiments that demonstrate this correlation? I suggest the authors include either a theoretical analysis (e.g., from uncertainty estimation or mutual information perspectives) or an ablation/visualization showing that high ISQ scores correlate with actual model improvement when selected.

    The paper emphasizes cross-modality adaptation, implying that the method is specifically designed to handle modality gaps (e.g., CT→MRI). However, all components (ISQ, OHQ, and the fusion strategy) are modality-agnostic and general-purpose. There is no explicit mechanism to address modality shifts, such as feature-level alignment, disentanglement, or modality translation. Therefore, I question whether the method truly addresses cross-modality adaptation, or merely evaluates in such settings without dedicated design.

    Table presentation issues and inaccurate highlighting. In Table 1, the 0% and 100% columns are not part of the active query process and should be separated from the core comparison. Additionally, in the r=4 (16%) column of the 95HD metric, the result for “Ours” is incorrectly bolded (7.15), despite being worse than both BADGE (6.48) and SENT (6.86). The correct best and second-best entries should be BADGE and SENT, respectively. These presentation errors may mislead readers and should be carefully revised.

    The paper compares against classical active learning baselines (e.g., LC, Core-set, BADGE), most of which were originally developed for natural image classification. It does not include recent or domain-specific methods tailored for medical image segmentation or cross-modality adaptation. This limits the strength of the empirical claim regarding state-of-the-art performance. I recommend including the latest methods in the comparison.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main factor affecting my overall score is the lack of comparison with recent state-of-the-art methods specifically designed for medical image segmentation or cross-modality adaptation. While the proposed method combines active learning and source-free domain adaptation in a novel way, it is primarily benchmarked against classical baselines that are not tailored for the task. This weakens the empirical strength of the claims. In addition, some key assumptions in the method, such as the correlation between perturbation sensitivity and sample informativeness, are not theoretically justified or empirically verified. Minor issues such as table formatting and incorrect result highlighting further impact the clarity of presentation. Overall, while the paper has potential, I believe it requires additional comparisons and analysis to meet the standard for a strong acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The remaining issues are relatively easy to address, and the current version is suitable for publication.



Review #3

  • Please describe the contribution of the paper

    The paper proposes an Active Source-Free Cross-Domain and Cross-Modality Adaptation method for medical image segmentation. The method employs an Active Test Time Sample Query strategy that combines Image Sensitivity Query (ISQ) and Organ Heterogeneity Query (OHQ) to adapt models across different domains and modalities. A Dynamic Image-to-Organ Scaling mechanism is introduced to dynamically fuse the results of ISQ and OHQ. Experiments on cross-domain and cross-modality volumetric pancreas segmentation tasks demonstrate superior performance compared to other state-of-the-art methods.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The method achieves superior performance on multiple datasets and outperforms several state-of-the-art active query methods.
    2. The Dynamic Image-to-Organ Scaling mechanism allows the model to adaptively focus on different levels of informativeness during the adaptation process, enhancing the model’s generalization.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. A more comprehensive review of existing source-free domain adaptation methods, such as [1] and [2], would provide better context for the contributions of this work.
       [1] A curriculum-style self-training approach for source-free semantic segmentation
       [2] Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer Framework

    2. The experiments only use U-Net and Attention U-Net as segmentation networks. The paper does not evaluate the performance of other popular backbones, such as ResNet- or Transformer-based methods.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel and effective, as demonstrated by extensive experiments.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    All my concerns are well addressed.




Author Feedback

AL-Active Learning; U/ADA-Unsupervised/Active Domain Adaptation

We thank the reviewers for the feedback and address all concerns below:

-Comparisons with recent methods (R#1,3): Source-free ADA methods [8,17,18] query slices or patches for 2D mono-modal medical image segmentation, but we query volumes for 3D segmentation across modalities. We performed comparisons with [8,18], but not with [17], as it uses source data to create reference points. Our method achieved 65.07 DSC in CT-to-T1w with 5% queried samples for U-Net, outperforming [8] (63.31) and [18] (61.82). On CT-to-T2w, querying 5% samples yielded 58.41 DSC, compared to 51.32 by [8] and 49.60 by [18]. On NIH-to-MSD, our ADA queried 4% and 8% samples for U-Net to achieve 76.33 and 82.01 DSC, outperforming non-source-free UDA SOTA baselines: Pseudo Label (70.68), Discriminator (71.76), SIFA (66.05), and VAE (75.74; Yuan Yao, MIDL 2022). Our results demonstrate that active querying improves DA performance even when relying solely on target data.

-ISQ justification (R#3): Epistemic uncertainty arises from a model's lack of knowledge, specifically, limitations in learning from data and generalizing to new situations. When a model is less confident in its predictions for a sample, that sample is likely to lie away from the distribution of learned knowledge, indicating it contains more unlearned information. Such samples are thus more informative and valuable for model training. ISQ quantifies this uncertainty by measuring the difference between the model's original and perturbed probabilistic predictions. Querying high-ISQ samples targets those that contribute most to the model's epistemic uncertainty and limit its generalization to the target domain. To show the benefit of selecting high-ISQ samples, we queried the top 5% ISQ samples for training U-Net and achieved 63.57 DSC in CT-to-T1w, higher than the 55.42 obtained by using the lowest 5% ISQ samples. Using the top 10% ISQ samples yielded 73.60 DSC, higher than the 63.84 by the lowest 10% ISQ samples.

-Novelty (R#1): Our work is the first to develop source-free ADA for 3D medical image segmentation across domains and modalities, while prior works focus on natural images or 2D mono-modal medical images. It is the first to evaluate image-level epistemic uncertainty and organ-level intensity diversity and integrate them using a dynamic mechanism. Different from [16,19], ISQ is designed for AL query. OHQ measures diversity in organ boundary and interior and uncertainty in organ regions instead of solely measuring image uncertainty [8].

-Annotation cost (R#1): In practical AL processes, annotation involves correcting predicted masks on queried cases rather than manually contouring from scratch, so the goal is to design efficient ways to find as few cases as possible to achieve the best accuracy. Thus, we focus on improving querying rather than quantifying annotation hours.

-Computations (R#1): Initializing models on source data for the first 1000 epochs is not part of AL and is thus excluded from the ADA computation. During ADA, we query a small percent of samples, leading to a small additional training cost. Our query is also memory efficient, as it computes and stores quantitative scores scan by scan, while others (Coreset and BADGE) generate embeddings of all scans and load them into memory.

-Comprehensive review (R#2): We will add the suggested references and comprehensively review prior art.

-Other networks (R#2): ADA focuses on query strategies instead of network architectures. To ensure generalizability, we used typically adopted networks.

-Clarification on cross-modality (R#3): Our ADA is designed to identify target samples that pose challenges to model generalization over modality gaps. Thus, it is model-driven and agnostic to specific modalities, yet effective in cross-modality settings. Employing explicit cross-modality mechanisms could enhance performance, and we plan to explore this in future work.

-Table issues (R#3): We will fix table formatting issues and typos.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    The reviewers appreciate the novelty of the active source-free domain adaptation approach and the use of the dynamic image-to-organ scaling mechanism. However, the reviewers also brought up several important concerns, including the lack of justification for the assumption that informative samples are likely to be mispredicted, and the limited discussion of and comparison to recent source-free UDA methods. Please address all the reviewers' concerns carefully in the rebuttal.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces an active source-free domain adaptation (ADA) framework for 3D medical image segmentation across domains and modalities.

    While reviewers raised concerns about computational overhead, novelty relative to prior work, and lack of comparison with recent SFDA baselines, these were addressed convincingly in the rebuttal with new empirical justification and clearer positioning.

    The overall contribution is methodologically sound, practically motivated, and demonstrates strong empirical performance in challenging adaptation settings. I recommend acceptance.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


