Abstract

Unsupervised Domain Adaptation (UDA) aims to align the labeled source distribution and the unlabeled target distribution by mining domain-agnostic feature representations. However, adapting a source-trained model to new target domains after it has been deployed to users poses a significant challenge. To address this, we propose a generative latent search paradigm that reconstructs the closest clone of every target image from the source latent space. This involves a test-time adaptation (TTA) strategy, wherein a latent optimization step finds the closest clone of each target image in the source representation space using variational sampling of the source latent distribution. Thus, our method facilitates domain adaptation without requiring target-domain supervision during training. Moreover, we demonstrate that our approach can be further fine-tuned using a few labeled target samples, without the need for unlabeled target data, by leveraging global and local label guidance from the available target annotations to enhance the downstream segmentation task. We empirically validate the efficacy of our proposed method, which surpasses existing UDA, TTA, and semi-supervised domain adaptation (SSDA) methods on two domain-adaptive image segmentation tasks. Code is available at https://github.com/hritam-98/Quest4Clone.
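The phrase "variational sampling of the source latent distribution" is not spelled out on this page. The sketch below shows what it usually means for a standard VAE, assuming a diagonal-Gaussian posterior and a standard-normal prior. The function names are hypothetical, and this is not the authors' code.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick).

    mu and log_var describe a diagonal-Gaussian posterior q(z | x), e.g. as
    predicted by a VAE encoder for one source image (hypothetical inputs here).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def sample_prior(dim, n, rng):
    """Draw n latent codes from the standard-normal prior N(0, I)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)): the usual VAE regularizer
    that pulls each encoder posterior toward the prior."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

# Usage: a few prior samples that could serve as starting points for a latent search.
rng = random.Random(0)
starts = sample_prior(dim=8, n=4, rng=rng)
```

Sampling through `reparameterize` keeps the draw differentiable with respect to `mu` and `log_var` during VAE training. At test time, drawing from the prior with `sample_prior` is a natural way to obtain random starting codes in the source latent space.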

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0297_paper.pdf

SharedIt Link: https://rdcu.be/dZxei

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_52

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0297_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Bas_Quest_MICCAI2024,
        author = { Basak, Hritam and Yin, Zhaozheng},
        title = { { Quest for Clone: Test-time Domain Adaptation for Medical Image Segmentation by Searching the Closest Clone in Latent Space } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {555--566}
}

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a test-time domain adaptation (TTA) method for medical image segmentation that does not require target domain data during training. The method involves creating a ‘source-like clone’ from the source latent space for each target image, allowing the model to adapt to new domains without retraining. Additionally, the proposed method extends to semi-supervised settings by integrating label guidance at both global and local scales. The method has been tested and shown to outperform existing UDA, TTA, and SSDA methods on two domain-adaptive image segmentation tasks.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) This paper proposes generating a ‘source-like clone’ from the source latent space for each target image, which circumvents the conventional reliance on extensive target domain data. 2) The proposed method is flexible enough to adapt to semi-supervised settings, further improving the precision and utility of the model in real-world applications.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. As mentioned in the paper, TTA for DA has been explored in previous research, so it should not be highlighted as a major novelty of this paper. The authors need to clearly outline how the proposed method differs from existing TTA approaches specifically designed for DA, especially in terms of methodological advancements.
2. This paper argues that previous TTA methods require high-quality pre-trained models and precise data alignment, and that they do not effectively handle label and correlation shifts across domains. However, the paper does not provide clear evidence that the proposed method can address these critical shifts. Relevant experimental validation, theoretical explanations, and comparisons to existing methods should be given.
3. While the paper introduces an interesting strategy of generating a ‘source-like clone’ from the source latent space for each target image, it lacks a comprehensive explanation of why this approach is particularly well-suited for the application of medical image segmentation. Medical imaging often involves complex and varied patterns that might not be fully captured by simply cloning target domain features. A more thorough discussion on the efficacy of this strategy in capturing diverse pathological features across different medical conditions and imaging modalities would significantly strengthen the argument.
4. This paper presents a theoretical proof to support the existence of a ‘source-like closest clone’ within the source latent space. However, this proof seems to overlook critical factors that affect its validity, specifically the influence of the radius scale in the probability calculation. In Equation 6, the proof suggests that the probability of finding a clone within a specified radius can approach one as the number of samples increases. However, if the radius is too small, the probability of a non-empty sphere in Equation 3 can be zero, indicating that no matter how many samples exist in the source domain, the method may not find a sufficiently close clone. Conversely, if the radius is set too large, the condition becomes trivially satisfied by nearly any point within the source domain, thus diminishing the meaningfulness of the proof.
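The radius concern can be made concrete with a generic calculation. This is an illustrative sketch, not the paper's Equation 3 or 6, which are not reproduced on this page. Assume N latent codes z_1, ..., z_N are drawn i.i.d. from a distribution p over the source latent space. Let q(ρ) be the mass that p assigns to the ball of radius ρ around an (unknown) ideal clone z*:

```latex
q(\rho) = \Pr_{z \sim p}\left(\lVert z - z^{*} \rVert \le \rho\right),
\qquad
P_N(\rho) = \Pr\left(\min_{1 \le i \le N} \lVert z_i - z^{*} \rVert \le \rho\right) = 1 - \left(1 - q(\rho)\right)^{N}

\lim_{N \to \infty} P_N(\rho) =
\begin{cases}
1, & q(\rho) > 0,\\
0, & q(\rho) = 0,
\end{cases}
\qquad
P_N(\rho) \ge 1 - \delta \iff N \ge \frac{\ln \delta}{\ln\left(1 - q(\rho)\right)} \approx \frac{\ln(1/\delta)}{q(\rho)} \quad (0 < q(\rho) < 1)
```

For a continuous density in a d-dimensional latent space, q(ρ) ≈ p(z*) V_d ρ^d for small ρ, where V_d is the volume of the unit d-ball. Both readings of the argument therefore hold at once. A non-degenerate Gaussian prior gives every ball of positive radius positive mass, so the limit is 1 for any ρ > 0. However, the number of samples required grows roughly like ρ^(-d) as the radius shrinks.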
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

To further strengthen the submission, a more detailed analysis that clearly delineates how your method improves upon existing TTA approaches would be beneficial. Specifically, it would be valuable to include a detailed discussion on how your model addresses label and correlation shifts across various domains to demonstrate its robustness and applicability. Additionally, the theoretical basis concerning the existence of the closest clone within the source latent space requires deeper examination, particularly regarding the radius p used in the probability calculations. If p is too small, finding a sufficiently close clone may not be feasible, potentially undermining your method’s reliability. A more rigorous theoretical discussion or methodological adjustments to consider the impact of radius size would enhance your approach’s scientific rigor. Furthermore, although generating a ‘source-like clone’ is relatively innovative, the paper lacks a comprehensive explanation of why this strategy is especially suited for medical image segmentation. Medical imaging involves complex and varied patterns that may not be fully captured by simply cloning features from the source domain. Expanding on the efficacy of this strategy in capturing diverse pathological features across different medical conditions and imaging modalities would significantly strengthen your argument and justify the method’s suitability for medical imaging tasks.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Reject — could be rejected, dependent on rebuttal (3)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The “Weak Reject” recommendation stems from several critical areas of the manuscript that require further development. First, the paper lacks a detailed analysis demonstrating how the proposed TTA method advances beyond existing approaches, particularly in handling label and correlation shifts in domain adaptation. Second, the theoretical foundation supporting the existence of the closest clone within the source latent space is insufficiently rigorous, especially concerning the impact of the radius p in the probability calculations. This oversight calls into question the feasibility and reliability of the proposed method. Additionally, the manuscript does not convincingly justify why generating a ‘source-like clone’ is particularly suitable for medical image segmentation, a field known for its complex and varied patterns. These issues suggest that the paper falls below the expected standard for acceptance.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper introduces an innovative test-time domain adaptation method for medical image segmentation by employing a latent space search to find the closest source-like clone for each target image. This approach eliminates the need for labeled target data during model training, leveraging unsupervised learning and variational autoencoder (VAE) techniques to adapt the segmentation model to new domains. The paper claims improved performance over existing domain adaptation methods, particularly in settings with limited labeled data, and demonstrates the effectiveness of their approach through extensive experiments on challenging medical imaging datasets. Additionally, it enhances the segmentation model using semi-supervised fine-tuning with both global and local label guidance to achieve high accuracy.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper presents a unique formulation for domain adaptation by using latent space optimization to find a source-like clone of target domain images during inference. This is novel because it circumvents the need for extensive labeled data in the target domain, which is a significant bottleneck in medical image analysis.
2. The paper innovates by introducing semi-supervised fine-tuning that utilizes a small amount of labeled data from the target domain to enhance the model, providing a realistic and practical approach for many real-world scenarios where limited labeled data is available.
3. The evaluation is rigorous, involving widely recognized and challenging datasets. The use of standard metrics like Dice Coefficient, Hausdorff Distance, and Average Surface Distance, and the comparison with state-of-the-art methods, demonstrate the robustness and superiority of the proposed method.
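For reference, the sketch below shows how the three metrics named above are commonly computed. It is a plain-Python illustration on tiny 2-D masks given as sets of (row, col) pixels, and it assumes the surface points have already been extracted. Real evaluations usually work on 3-D boundary voxels, account for voxel spacing, and sometimes report the 95th-percentile Hausdorff distance instead. ASD conventions also vary; this version pools nearest-neighbour distances from both directions. None of this is taken from the paper's evaluation code.

```python
import math

def dice(pred, gt):
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks given as sets of pixel coordinates."""
    if not pred and not gt:
        return 1.0  # convention chosen here: two empty masks count as perfect agreement
    return 2.0 * len(pred & gt) / (len(pred) + len(gt))

def _nearest_distances(a, b):
    """For every point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty surface point sets."""
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))

def average_surface_distance(a, b):
    """Mean of all nearest-surface distances, pooled over both directions."""
    d = _nearest_distances(a, b) + _nearest_distances(b, a)
    return sum(d) / len(d)

# Tiny example: every pixel of these 2x2 and 3x2 blocks lies on the boundary.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
gt = pred | {(2, 0), (2, 1)}
```

Here Dice is 0.8, the Hausdorff distance is 1.0 (the extra row of `gt` is one pixel away from `pred`), and the pooled ASD is 0.2.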
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The approach assumes that a closest clone in the source domain’s latent space can effectively represent target domain images, which might not hold in cases where the source and target domains have substantial differences. The paper would benefit from a more detailed discussion on the limitations of this assumption.
2. The paper utilizes a VAE to reconstruct images, that is, source-like target images. However, images transformed via the VAE still exhibit significant stylistic differences from the real source domain images, as can be seen in the supplementary Figure 2 provided by the authors. Given that these stylistic differences are substantial and not negligible, the authors need to further explain how an independently pre-trained source domain segmentation model can still perform well when there is a significant discrepancy in style between the images and the source domain.
3. The VAE training process in this method is independent of the pre-trained segmentation model, necessitating the VAE to still be trained on source domain data. Compared to existing TTA methods that rely solely on a pre-trained segmentation model, this approach still requires access to source domain images to some extent, thus exhibiting higher demands and computational complexity.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
1. To address the challenge of representing target domain images with a closest clone from the source domain’s latent space, the authors could explore the integration of domain-invariant feature extraction techniques. Techniques such as adversarial training or disentanglement strategies can help in learning representations that are more robust to domain shifts. Furthermore, incorporating domain-adversarial neural networks could allow the model to focus on features that are invariant between the source and target domains, potentially minimizing the impact of substantial differences between domains.
2. The authors need to further clarify the extent of the discrepancy in data distribution between the source-like images generated by the VAE and the actual source domain images, particularly the style difference between the pseudo-source and true source images.
3. Furthermore, the authors need to elucidate why the pre-trained source model is still able to infer segmentation results well despite these differences.
4. Independently training a VAE, as compared to the latest TTA methods, introduces an additional step in the training process, and the requirement for the VAE to train on source domain data limits the method’s applicability. The authors could explore the feasibility of conducting TTA solely on the source model.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Accept — could be accepted, dependent on rebuttal (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Based on the method and the generated images provided by the authors, there are noticeable differences between the reconstructed pseudo-source domain images and the actual source domain images. Furthermore, the training of the VAE is independent of the source segmentation model. It is crucial to clarify why an independently trained source segmentation model can effectively handle images that differ from the source domain images. This point is essential and requires further elucidation.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Weak Reject — could be rejected, dependent on rebuttal (3)
[Post rebuttal] Please justify your decision

I don’t think the translated images are stylistically close enough to the source domain images. The quality of the generated images determines the final segmentation results. However, the method does not produce good image generation results, especially in terms of image style.

Review #3

Please describe the contribution of the paper

The authors propose a DA approach that leverages “clone” images from the source domain to improve segmentation performance on samples from a target domain in UDA/SSDA settings. This is applied to volumetric radiological image segmentation (CT scans and MRI images). Experiments are conducted on public datasets, with results consistently surpassing multiple DA baselines across multiple imaging modalities.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is overall very well written.

The methodology is sound with some novelties that differentiate it from the traditional translation-based UDA/SSDA literature.

The ablation analysis shows that the multiple modules of the proposed pipeline add crucial incremental gains that culminate in the good performance of the proposed method.

Results are promising and show consistent gains with relation to multiple strong baselines in both the UDA and SSDA settings.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

While the objective results are compelling, the manuscript should make space in the main text for a small qualitative analysis with real images, masks, and segmentation predictions for the proposed method and the baselines. The authors might have to considerably shrink Sections 1 and 2 for that. Another way to free space would be to move the ablation study, rather than the qualitative analysis, to the supplementary material.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The authors claimed to release the source code and/or dataset upon acceptance of the submission.
Do you have any additional comments regarding the paper’s reproducibility?

In addition to releasing the code, the authors could also publish some pretrained models with a demo on Google Colab, for instance. This is especially important given the claims of domain generalization. With pretrained models and an online demo, anyone could verify the generalization capabilities on their own data.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
##########################

General Comments

##########################

The reviewer urges the authors to simplify the mathematical notation in the text; it seems unnecessarily complex at some points.

##########################

Specific Comments

##########################
- Authors should make an effort to simplify Figure 2. It seems unnecessarily complex, while the flow of information throughout the modules could be more clearly shown.
- Section 3 really lacks some visual segmentation results in the target datasets.
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The reviewer recommends the acceptance of this manuscript based on the degree of novelty, good writing and very promising results in relation to strong UDA/SSDA baselines.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

We thank the reviewers (R) for their comments (C). They find our work novel, sound & interesting (R1, R3, R4), clear & informative (R3, R4), extensively evaluated (R3, R4), and outperforming SoTA (R1, R3, R4). We address their comments:

R1C1: Difference from other TTA methods. Source-driven TTA uses spatial alignment or feature-level consistency, which is ineffective for objects with varied shape/position (e.g., tumors) or under large distribution shifts. These methods fail to maintain structured predictions because they lack an explicit constraint on the target shape representation. We address this with a novel latent optimization strategy that finds a source-like clone for each target image, eliminating the need for target feature mining. This ensures good performance across varied shapes & domains (Tab. 1, 2 of the paper & Fig. 1, 2 of the supp.).

R1C2: Correlation & label shift. Spurious correlation is an unexpected association between irrelevant input features & predictions; it shifts with domain change & causes poor performance on the target [Geirhos et al., ’20]. We avoid this by not using target features at all. Starting from a random z in the source latent space, we limit latent optimization to this space, computing the SSIM loss between the reconstructed source clone & the actual target image (Fig. 1B). This confines our workflow to the source domain, avoiding correlation shift. Label shift is planned for future work.
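The latent search described in R1C2 can be illustrated with a toy sketch that rests on several assumptions. A hypothetical linear map `W` stands in for the trained VAE decoder. SSIM is computed over a single global window rather than the usual sliding Gaussian window, with pixels assumed to lie in [0, 1]. Central finite differences with a backtracking step replace automatic differentiation. `decode`, `clone_loss`, and `search_clone` are illustrative names, not the authors' API. Only z is optimized while the decoder stays frozen, so every candidate clone remains in the decoder's (source) output space.

```python
import random

def global_ssim(x, y, data_range=1.0):
    """SSIM over one global window (a simplification of windowed SSIM)."""
    n = len(x)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

def decode(W, z):
    """Hypothetical frozen 'source decoder': a linear map from latent z to pixels."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def clone_loss(W, z, target):
    """1 - SSIM between the decoded clone and the target image."""
    return 1.0 - global_ssim(decode(W, z), target)

def search_clone(W, target, steps=200, lr=0.5, eps=1e-5, seed=0):
    """Optimise only z (decoder fixed), starting from a random z ~ N(0, I)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    loss = clone_loss(W, z, target)
    for _ in range(steps):
        # Central finite-difference gradient of the loss w.r.t. each latent dimension.
        grad = []
        for j in range(len(z)):
            zp, zm = z[:], z[:]
            zp[j] += eps
            zm[j] -= eps
            grad.append((clone_loss(W, zp, target) - clone_loss(W, zm, target)) / (2 * eps))
        # Backtracking: halve the step until the loss improves, or give up this round.
        step = lr
        while step > 1e-8:
            cand = [zj - step * g for zj, g in zip(z, grad)]
            cand_loss = clone_loss(W, cand, target)
            if cand_loss < loss:
                z, loss = cand, cand_loss
                break
            step /= 2
    return z, decode(W, z), loss

# Usage with a hypothetical 2-D latent space decoded to a 4-pixel "image".
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
target = [0.3, 0.7, 0.5, 0.62]
z, clone, loss = search_clone(W, target)
```

The backtracking rule accepts only improving steps, so the returned loss never exceeds the loss of the random starting code. In a real system, autograd through a trained convolutional decoder and a windowed SSIM would replace the finite differences and the global-window SSIM.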

R1C3: Suitability for the medical domain. (A) We mitigate domain shifts without target supervision, which is crucial when annotated medical data are difficult to acquire. (B) Variational sampling & latent optimization capture intricate patterns, reconstructing a proxy image in the source latent space that preserves critical anatomical & pathological features (Fig. 2 supp.). (C) Our approach outperforms existing DA methods on benchmark datasets (Tab. 1, 2), showing its efficacy in capturing diverse pathological features across modalities.

R1C4: Influence of radius p. We clarify the confusion: (A) While there is a tradeoff between sphere radius & point density, any sphere can theoretically contain infinitely many points. (B) Even if the radius is small, the chance that the probability of a non-empty sphere is zero is negligible, because infinite sampling ensures that a closest clone exists. Conversely, a large radius trivially satisfies the condition; this enlarges the search space but does not diminish the proof’s validity. (C) To address the challenge of a large search space, we propose using the SSIM loss as the distance metric. Minimizing L^SS gradually reduces the search space, making the optimization more efficient.
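The radius/density tradeoff conceded in (A) can be checked numerically with the following sketch. It assumes a standard-normal latent prior in a hypothetical even dimension d, and it places the ball at the prior mean, which is the most favourable location. It computes the ball's mass q(ρ) exactly through the chi-square/Poisson identity. It then reports how many i.i.d. prior samples are needed to hit the ball with 95% probability, using 1 - (1 - q)^N ≥ 1 - δ. The function names are illustrative.

```python
import math

def _poisson_log_pmf(i, lam):
    """log of exp(-lam) * lam**i / i!"""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)

def ball_mass(rho, d):
    """q(rho) = P(||z|| <= rho) for z ~ N(0, I_d), d even.

    Uses P(chi2_{2k} <= x) = P(Poisson(x / 2) >= k) with k = d / 2.
    """
    if d <= 0 or d % 2:
        raise ValueError("this closed form needs a positive even d")
    if rho <= 0:
        return 0.0
    k, lam = d // 2, rho * rho / 2.0
    if lam > k:
        # q is not small here, so the complement 1 - P(Poisson < k) is accurate.
        return 1.0 - sum(math.exp(_poisson_log_pmf(i, lam)) for i in range(k))
    # Small-ball regime: sum the upper Poisson tail directly (terms shrink geometrically).
    total, i = 0.0, k
    while True:
        term = math.exp(_poisson_log_pmf(i, lam))
        total += term
        if term <= 1e-17 * total:
            break
        i += 1
    return total

def samples_needed(q, delta=0.05):
    """Smallest N with 1 - (1 - q)**N >= 1 - delta for i.i.d. prior samples."""
    if q <= 0.0:
        return math.inf  # a zero-mass ball is never hit, however many samples are drawn
    if q >= 1.0:
        return 1
    return math.ceil(math.log(delta) / math.log1p(-q))

# Ball centred at the prior mean in a hypothetical 16-D latent space.
for rho in (4.0, 2.0, 1.0, 0.5):
    q = ball_mass(rho, 16)
    print(f"rho={rho}: q={q:.3e}, N for 95% = {samples_needed(q)}")
```

As ρ shrinks, q(ρ) falls roughly like ρ^d, so the required N grows very quickly even though it stays finite. Balls centred away from the prior mean carry even less mass.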

R2: Qualitative results; simplified text; demo. We will follow the suggestions in the revision. The demo and code will be released after acceptance.

R3C1: Source & target distributions. We experiment on challenging DA benchmark datasets with multiple modalities (MRI & CT in MMWHS; {T1, T2, T1CE, FLAIR} in BraTS) that exhibit large distribution variations, and our method still achieves SoTA performance. Though it is currently tested on modality shifts only, we will extend this paradigm to datasets with different organs & anatomical variation.

R3C2: Visual difference in the reconstructed image. We clarify the confusion: each column in Fig. 2 of the supp. shows the original target image in the 1st row and its translated source-like clone in the last row. The translated image retains the target’s structural attributes but adopts the source domain style. For example, the 1st column of Fig. 2 (CT (src) -> MRI (trg) setting) shows a target MRI image (1st row) translated into a CT-like clone (last row) using latent optimization. This CT-like clone resembles the style of the CT images (e.g., 1st row of col. 2) on which the source model was trained, making segmentation with the source model feasible.

R3C3: Training the VAE. Training the VAE (which takes 1 hr) is necessary to learn the source features used to translate a target image into its source clone. This step, though additional, is justified by our 5% DSC gain over TTA baselines in Tab. 1, 2. Inference is real-time & does not need the VAE (Fig. 1D).

R3C4: Adversarial & disentanglement learning. These require target data during training, which is infeasible in TTA settings that do not use target data for training.

Meta-Review

Meta-review #1

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper proposes a novel setting that removes the need for source domain data by using only the pre-trained source domain model in a source-free protocol. The problem itself is meaningful, and the results show that the approach is effective. Reviewers also raised concerns that the effectiveness of the methodology is not well justified. It is not clear why it can outperform methods that have access to source data. The closest clone within the source latent space is much less informative than the original data. In addition, the reviewers raised several concerns about the generated images, which do not sufficiently support the method’s efficacy.

The authors are encouraged to explain the details, which are worth discussing at MICCAI.

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

This paper received mixed reviews.

There was a concern from R3 regarding the image quality of the synthetic images. The authors clarified this in their rebuttal. I checked the images in Fig. 2; they actually do not look bad (they preserve edges, and their style moves toward the source domain). It is still an open question how close the reconstructed images should be to the source domain in UDA. Some approaches indeed rely on image-to-image translation.

It is generally an interesting solution for TTDA without many flaws. On that basis, I cannot entirely agree with R3’s decision to lower the score, especially since some of the concerns from R1 were clarified.


Quest for Clone: Test-time Domain Adaptation for Medical Image Segmentation by Searching the Closest Clone in Latent Space

Author(s):