Abstract

Medical imaging datasets often vary due to differences in acquisition protocols, patient demographics, and imaging devices. These variations in data distribution, known as domain shift, present a significant challenge in adapting imaging analysis models for practical healthcare applications. Most current domain adaptation (DA) approaches aim either to align the distributions between the source and target domains or to learn an invariant feature space that generalizes well across all domains. However, both strategies require access to a sufficient number of examples, though not necessarily annotated, from the test domain during training. This limitation hinders the widespread deployment of models in clinical settings, where target domain data may only be accessible in real time.

In this work, we introduce HyDA, a novel hypernetwork framework that leverages domain-specific characteristics rather than suppressing them, enabling dynamic adaptation at inference time. Specifically, HyDA learns implicit domain representations and uses them to adjust model parameters on-the-fly, allowing effective interpolation to unseen domains. We validate HyDA on two clinically relevant applications—MRI-based brain age prediction and chest X-ray pathology classification—demonstrating its ability to generalize across tasks and imaging modalities. Our code is available at: https://github.com/doronser/hyda

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0392_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/doronser/hyda

Link to the Dataset(s)

Brain MRI datasets: list too long to fit here; see Table A4 in this manuscript for the full list of datasets and citations: https://arxiv.org/pdf/2403.13319

NIH: https://nihcc.app.box.com/v/ChestXray-NIHCC

VinDr: https://vindr.ai/datasets/cxr

CheXpert: https://stanfordmlgroup.github.io/competitions/chexpert/

BibTex

@InProceedings{SerDor_HyDA_MICCAI2025,
        author = { Serebro, Doron and Riklin-Raviv, Tammy},
        title = { { HyDA: Hypernetworks for Test Time Domain Adaptation in Medical Imaging Analysis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15964},
        month = {September},
        pages = {250 -- 260}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper uses a hypernetwork paradigm to generate novel network weights for novel domains, i.e., a secondary network dynamically generates weights and biases for a primary network. A domain encoder encodes an image into a domain representation (trained by learning to predict the domain of the input image), which is then fed through a mapping function that translates the embedding into a set of parameters for the primary network (DenseNet121).

    Datasets: NIH, CheXpert, VinDr (the 5 classes common to all three). Model: DenseNet121.

    As in most domain adaptation/generalization works, the improvement is often quite minor. In some domains there is a significant improvement, but less so in others.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The methodology is novel and theoretically well supported. It introduces hypernetworks for domain adaptation; in the medical domain, hypernetworks were previously utilized for analysis of tabular data. The paper also introduces a multi-similarity loss for the domain classifier, which helps separate domain features more efficiently.

    Experiments are extensive and mostly well reported, and generally lead to the conclusions made in the paper, minus some missing statistical significance testing.

    • The authors conducted a good ablation study demonstrating the effectiveness of different loss terms and different layer configurations for the primary network head.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The experimental setup is not clear in some places. For example, the authors have not clearly described what the leave-one-out setting is (maybe this is common sense for computer science researchers, but I still believe that explaining it at least once is useful: when there are 3 datasets A, B, and C, leave-one-out means training the model on datasets B and C and adapting for A). This problem exists for both the chest X-ray and brain MRI experiments.
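The A/B/C protocol the reviewer describes can be sketched as a simple split generator. This is a minimal illustration of the leave-one-out idea only, with placeholder domain names, not the paper's training code:

```python
# Minimal sketch of the leave-one-out protocol: with N domains,
# train on N-1 and evaluate on the held-out one, cycling through all.
# Domain names are illustrative placeholders.

def leave_one_out_splits(domains):
    splits = []
    for held_out in domains:
        train = [d for d in domains if d != held_out]
        splits.append((train, held_out))
    return splits

for train_domains, test_domain in leave_one_out_splits(["A", "B", "C"]):
    print("train on", train_domains, "-> adapt/test on", test_domain)
```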

    Even though the other loss terms, such as cross-entropy and weight decay, are understandable, the authors could describe how the multi-similarity loss is calculated with respect to medical images.

    For the chest X-ray classification, the proposed method only marginally outperforms the baseline. While the difference is not necessarily insignificant, some statistical evidence is needed to demonstrate that the difference in accuracy between the proposed method and the baseline is statistically significant and not due to random chance.

    Table 2 shows the standard deviation, but Table 1 does not. Table 3 shows the standard deviation of the average performance; since the difference between the baseline and HyDA is very small and significantly less than one standard deviation, further statistical evidence is required to demonstrate that the results are significant.

    The analysis of the results is relatively limited, though this is constrained by the page limit. The purposes of Figures 2 and 3 are somewhat underexplained, or at least it is unclear to me how they relate to the performance and theoretical backing of the model. They show that the feature embeddings can accurately classify the domain of the input.

    The paper compares to very few baseline methods, and the methods compared against are relatively old.

    This is a tiny mistake: though the authors mention 19 datasets for brain MRI, they only plot data for 18 datasets in Figure 3.

    This is another tiny weakness: the authors have not cited any work in their introduction. It is very rare to find a paper without any cited work in its introduction.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Applying a novel methodology based on hypernetworks and expanding results across multiple datasets and modalities motivated me to accept this paper.

    However, it is not a strong accept because there are some issues with the clarity of the experiments (described under weaknesses), the methodology, such as the loss functions, could have been described more clearly, and some of the results are statistically underpowered.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors have addressed my main concern with the original paper: the lack of statistical tests showing the significance of the results.



Review #2

  • Please describe the contribution of the paper

    The authors present a proof-of-concept demonstrating the application of hyper-networks for tasks related to medical image analysis. The central hypothesis proposed is that any target domain can be effectively represented through interpolation among source domains.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The experimental results clearly demonstrate performance improvements over the compared baselines, underscoring the effectiveness of the proposed method.

    2. The experiments support the authors’ hypothesis that interpolation between multiple source domains can effectively represent the target domain, at least in the context of the current tasks (classification and brain age prediction).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The use of hyper-networks for test-time adaptation presents limited novelty. The proposed hyper-network method, which relies on conditioning based on domain features, closely resembles the approach presented previously in “Hypernetwork Design for Self-Supervised Domain Adaptation in Compressive Sensing MRI Reconstruction.”

    2. The baselines selected for comparison in the paper are relatively outdated (published in 2018 and 2021). It would significantly enhance the paper if the authors included comparisons against more recent state-of-the-art methods relevant to their chosen tasks.

    3. Several editorial errors are noticeable in the manuscript (e.g., the results section on Brain Age Prediction).

    4. The authors should explicitly state the values of all hyper-parameters used during training to facilitate reproducibility and clarity.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Have the authors explored the applicability of their proposed method to other reconstruction-oriented tasks, such as accelerated MRI reconstruction, medical image translation, or segmentation tasks? Such exploration would enhance the scope and robustness of their claims.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Experimental results support the authors' hypothesis, and the paper demonstrates the application of hypernetworks to UDA for medical images.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors addressed the concerns raised in the review and clarified their novelty.



Review #3

  • Please describe the contribution of the paper

    The paper introduces HyDA, a framework leveraging hypernetworks to achieve domain adaptation. HyDA is composed of three components: first, the primary network performing the task of interest; second, the domain classifier, which predicts the image domain from the image. The domain classifier can be split into two sub-networks: an encoder outputting a domain embedding vector from the image, and a classification head predicting the domain from the embedding vector. Finally, the hypernetwork predicts a subset of the weights for the primary network from the domain embedding vector. Hence, the hypernetwork adjusts the behavior of the primary network based on the image domain.

    The classifier is pre-trained with a combination of the CE loss, a multi-similarity loss, and weight decay regularization. The “inner” part of the primary network (i.e., the part that is not conditioned on the domain) and the hypernetwork are jointly trained to minimize the task loss, a weight decay regularization on both the hypernetwork and the inner primary network parameters, and an L2 regularization on the weights/biases predicted by the hypernetwork.

    The model is validated on two tasks: chest X-ray classification and MRI brain age prediction. On the chest X-ray classification task, HyDA is compared to a baseline (no DA), an unsupervised DA method, and a test-time DA method; HyDA outperforms all other methods. On the MRI brain age regression task, HyDA is only compared to the baseline (no DA) and outperforms it. Finally, t-SNE plots illustrate that the domain classifier encoder produces reasonable embeddings on unseen domains, which is why the hypernetwork is able to appropriately adjust the weights of the primary network on unseen domains.
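The three-component flow described above (domain encoder, hypernetwork, primary network) can be sketched as follows. This is a hypothetical pure-Python stand-in with toy shapes and made-up functions; the actual components are a CNN encoder, a learned hypernetwork, and a DenseNet121 backbone, none of which are reproduced here:

```python
# Minimal sketch of a HyDA-style forward pass (hypothetical shapes and
# functions; pure Python stands in for a deep-learning framework).

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def domain_encoder(image):
    # Stand-in for the trained domain encoder: maps an image to a
    # low-dimensional domain embedding (here just summary statistics).
    mean = sum(image) / len(image)
    var = sum((p - mean) ** 2 for p in image) / len(image)
    return [mean, var]

def hypernetwork(embedding, n_features):
    # Stand-in mapping from the domain embedding to the parameters
    # (weights and bias) of the primary network's head.
    weights = [(i + 1) * (embedding[0] + embedding[1]) * 0.01
               for i in range(n_features)]
    bias = 0.1 * embedding[0]
    return weights, bias

def primary_head(features, weights, bias):
    # Primary-network head whose parameters are generated on-the-fly,
    # so its behavior depends on the inferred domain of the input.
    return dot(weights, features) + bias

image = [0.2, 0.5, 0.9, 0.4]      # toy "image"
features = [0.3, 0.7, 0.1]        # toy features from the fixed backbone
emb = domain_encoder(image)       # domain embedding
w, b = hypernetwork(emb, len(features))
prediction = primary_head(features, w, b)
```

At inference time on an unseen domain, only the forward pass changes: the encoder places the new image somewhere between the source-domain embeddings, and the hypernetwork interpolates the head parameters accordingly.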

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and easy to follow. The idea of using hypernetworks to condition the primary network on the data domain is very relevant and sound for medical imaging. The authors provide sufficient implementation details for a reasonable attempt to reproduce their results (the code should also be released on acceptance). The experiment section is relatively convincing.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    In my opinion, the main weakness of the paper lies in the experiment section. Indeed, I am convinced that HyDA is a good idea. Yet, the choice of experiments does not manage to be fully convincing.

    The domain gaps considered are relatively small (cf. the overall good performance of the no-DA baseline). HyDA is effectively a larger network than the other baselines, as the whole domain classifier encoder and hypernetwork contribute to the processing of every single image. Unlike in other hypernetwork settings, such as “HyperSpace: Hypernetworks for spacing-adaptive image segmentation”, S. Joutard et al. 2024, where the conditioning variable can be fixed for a series of images, in HyDA the conditioning path is always computed for every new image, contributing to the processing power of the overall model. Hence, given the small performance gap between the baseline(s) and HyDA and this difference in the computing power of the models, I think the demonstration of HyDA’s value could be improved. Why not try segmentation? Or larger domain gaps?

    As a follow-up to the experiment section, I think the paper would benefit from a few more comments on the limitations of HyDA. I would personally be interested in the authors’ intuition regarding whether it is suitable for adaptation across larger domain gaps. Also, Table 4 suggests that the conditioning needs to be carefully incorporated into the primary network; this is definitely something that future users of HyDA should be warned about and guided on.

    Why is there only one baseline considered for brain MRI age regression?

    The general presentation of the field and the choice of relatively old baselines are questionable.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea presented is definitely interesting and original. The experiment section is not fully convincing because of the couple of points raised in the weakness section of the paper.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors clarified most of the points raised in my review.




Author Feedback

We thank the reviewers and AC for their constructive comments. Below, we address all main concerns in a point-by-point manner. Upon acceptance, we will incorporate clarifications and citations, revise editorial errors and typos, provide supporting statistics, and release the code.

1) Novelty (R2): HyDA introduces a novel approach that leverages domain-aware features for on-the-fly domain adaptation in medical imaging, without being tied to a specific task or architecture. We appreciate the reference paper and we will cite it. However, the paper was published three weeks after the MICCAI submission deadline and was unavailable to us. Moreover, their method is limited to a single task (MRI reconstruction) and depends on auxiliary inputs that may not be available at test time. In contrast, HyDA is applicable across modalities, architectures, and tasks, relying only on the input image, making it more practical for real-world use.

2) Comparison to baselines and other DA methods (R1, R2, R3): We agree that comparisons should include recent and diverse methods (R1, R2). As most existing approaches do not support multiple source domains or tasks, we selected the most relevant and up-to-date methods with available code. While additional baselines for the brain age experiment (R3) could certainly add context, our primary aim was to illustrate HyDA’s ability to generalize across domains in a regression setting, rather than to benchmark the best-performing regressor.

3) Statistical significance (R1): We will revise the tables to include standard deviations and p-values. Paired t-tests comparing HyDA to each baseline showed statistically significant improvements. In the CXR experiment, the p-values were: HyDA vs. Baseline = 0.0023, MDAN = 0.0010, and TENT = 0.00007, all below 0.0025.

4) Missing explanations (R1): Thank you for pointing this out; we will clarify in the revision. In the leave-one-out setting, given N datasets, training is done on N−1 and testing on the held-out one, cycling through all domains. The multi-similarity loss optimizes against hard positives and negatives. As domain labels are known during training, such samples can be selected directly.

5) Application to other tasks (R2): HyDA is modular and applicable across tasks, including reconstruction and segmentation. We focused on classification and regression as they represent distinct prediction types (discrete and continuous) and are widely used in medical imaging. This highlights HyDA’s versatility. Once the code is available, it can be adapted to other tasks by replacing the primary network and task-specific loss.

6) t-SNE plots (R1): These plots provide qualitative insight into our key assumption: target domain features can be interpolated from source domain features. While t-SNE offers only a 2D projection, the observed structure suggests that the hypernetwork, conditioned on these embeddings, can produce domain-aware weights and biases for the primary network, enabling on-the-fly generalization to unseen target domains.

7) Large domain gaps (R3): HyDA is expected to perform well when the target domain lies within the span of the source domain embeddings (interpolation). When the gap is too large (extrapolation), performance may degrade. To improve embedding quality, we incorporate a multi-similarity loss into the domain classifier to promote better separation of the training domains, improving the chance of effective interpolation. For extreme shifts, adding diverse source domains or adapting the embedding mechanism may further help.

8) Hypernetwork size (R3): HyDA adds parameters compared to the baseline, but the increase is minimal. In the CXR experiment, it adds only 0.2M parameters, just 2.86% more than the 7M in the baseline DenseNet121. This will be clarified in the revision.

9) Hyperparameters (R2): The main hyperparameters are listed in the implementation details.
Full configuration files will be included in the code repository upon acceptance.
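The hard positive/negative selection mentioned in point 4 above can be illustrated with a minimal sketch. The embeddings, distance function, and mining rule here are hypothetical stand-ins (farthest same-domain sample as hardest positive, closest other-domain sample as hardest negative); the authors' actual multi-similarity loss and feature extractor are not reproduced:

```python
# Hypothetical sketch of hard positive/negative mining for a
# multi-similarity-style loss, using known domain labels during training.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hard_pairs(anchor_idx, embeddings, labels):
    """For one anchor, return the index of the hardest positive (farthest
    same-domain sample) and hardest negative (closest other-domain sample)."""
    anchor = embeddings[anchor_idx]
    hardest_pos, hardest_neg = None, None
    for i, (e, lab) in enumerate(zip(embeddings, labels)):
        if i == anchor_idx:
            continue
        d = dist(anchor, e)
        if lab == labels[anchor_idx]:
            if hardest_pos is None or d > dist(anchor, embeddings[hardest_pos]):
                hardest_pos = i
        else:
            if hardest_neg is None or d < dist(anchor, embeddings[hardest_neg]):
                hardest_neg = i
    return hardest_pos, hardest_neg

# Toy 2D embeddings from two domains, "A" and "B".
embeddings = [[0.0, 0.0], [0.1, 0.0], [0.9, 0.0], [1.0, 0.0]]
labels = ["A", "A", "B", "B"]
pos, neg = hard_pairs(0, embeddings, labels)
```

Because domain labels are available for every training image, this selection requires no extra annotation, which is the point the rebuttal makes.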




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This paper received three Weak Accept recommendations. However, the comments from the three reviewers are very severe from my point of view. I think its current status cannot meet the acceptance bar compared to other papers. Major concerns from the three reviewers: an unclear experimental setup; a limited comparison against only one domain adaptation and one test-time domain adaptation method, both from quite old papers; a marginal performance improvement over these older methods; no citations in the introduction; and limited novelty of the proposed method.

    The introduction does not clearly state the motivation or the challenges of existing test-time domain adaptation. It is not clear whether DA or TTA is the focus of this paper, which indicates a serious problem with the writing.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents a clear, well-motivated framework with good generalization across tasks. The rebuttal addressed key concerns.


