Abstract

Source-Free Unsupervised Domain Adaptation (SFUDA) has recently become a focus in medical image domain adaptation, as it only utilizes the source model and does not require annotated target data. However, current SFUDA approaches cannot tackle complex segmentation tasks across different MRI sequences, such as vestibular schwannoma segmentation. To address this problem, we propose Reliable Source Approximation (RSA), which generates source-like and structure-preserved images from the target domain for updating model parameters and adapting to domain shifts. Specifically, RSA deploys a conditional diffusion model to generate multiple source-like images under the guidance of varying edges of one target image. An uncertainty estimation module is then introduced to predict and refine reliable pseudo labels of generated images, and prediction consistency is used to select the most reliable generations. Subsequently, all reliable generated images and their pseudo labels are utilized to update the model. RSA is validated on vestibular schwannoma segmentation across multi-modality MRI. The experimental results demonstrate that RSA consistently improves domain adaptation performance over other state-of-the-art SFUDA methods. We will release all code for reproduction after acceptance.



Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2349_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/zenghy96/Reliable-Source-Approximation

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zen_Reliable_MICCAI2024,
        author = { Zeng, Hongye and Zou, Ke and Chen, Zhihao and Zheng, Rui and Fu, Huazhu},
        title = { { Reliable Source Approximation: Source-Free Unsupervised Domain Adaptation for Vestibular Schwannoma MRI Segmentation } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper addresses source-free domain adaptation for vestibular schwannoma MRI segmentation, where only the source segmentation model is provided for target adaptation. A reliable source approximation method is designed to generate source-like and structure-preserved images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is technically sound. Applying a diffusion model for source-like generation is novel and reasonable for target adaptation.
    2. Comparison methods include various strategies, which makes the experiments more convincing.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The source training requires an additional diffusion model, which departs from the traditional SFDA setting and makes the comparison with traditional SFDA methods unfair.
    2. The source diffusion model can generate source-like images, but these images may also contain private information from the source domain, which is a potential hidden risk of this method.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    With the development of diffusion models, it is inevitable that they will be applied to SFDA, but the security issues that generative models may cause also need to be considered. Is it possible to design privacy-preserving diffusion models to generate source-like images?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel but has safety risks.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a source-free unsupervised domain adaptation (SFUDA) framework, in which the source data is inaccessible during adaptation and the model is fine-tuned only on target data. The framework (Reliable Source Approximation, RSA) includes a conditional diffusion model to achieve more realistic source-like image generation conditioned on edge maps from Canny filters. Across different thresholds, the best generated image and pseudo label are refined and selected via uncertainty measures and consistency, respectively. The framework is evaluated on vestibular schwannoma MRI segmentation, a challenging task with significant class imbalance. Compared with previous methods employing different strategies (i.e., entropy-based, pseudo-labeling, and source approximation), RSA achieves the best performance. RSA is also insensitive to several hyperparameters to some extent.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. High-level organization and presentation are great.
    2. Applying a conditional diffusion model to achieve source approximation is a good idea.
    3. Combining multiple measures to derive reliable approximation and pseudo-labels has good technical soundness.
    4. Sensitivity analyses of multiple hyperparameters are thorough, and the Dice reveals relatively good resilience to different hyperparameters.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While I enjoy reading this paper, some details need to be elaborated.
    2. I am a little confused by some settings and results, such as the batch-based vs. centralized fine-tuning.
    3. There is only one pseudo-labeling method included. Most of the compared methods are still entropy-based.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    I divide this section into two parts. The first part is the confusion/questions the reviewer currently has and the suggestions the reviewer hopes the authors could take in the final version. Answers/revisions to these topics may have a positive impact on the reviewers’ final recommendation (if applicable). The second part is some comments/suggestions for future improvement, in which the reviewer does NOT expect any reaction on those items as some may need extra experiments / more space, which is impractical for a conference submission.

    Part I:

    1. In Section 2.3, the last sentence of Prediction consistency. I believe there is a typo. It should be “The final approximation is found based on the smallest …”.

    2. How is the initial pseudo-label (before refinement) derived? Is uncertainty map u playing a role in there, or is it just the vanilla pseudo-label like argmax(softmax(f(x)))?

    3. Eq.9 is a bit confusing. I can roughly get the idea of the refinement, but what does “at the edges” mean? Will all pixels with high uncertainty (>T_un) be added to the pseudo-label if connected to the initial pseudo-label? Please explain.

    4. What is the purpose of comparing batch-based vs. centralized fine-tuning? The paper says that for the batch-based fine-tuning, “it utilizes the current batch to train the model and directly predicts the results for the current batch”. How would that be implemented? Based on Section 3.1 Dataset split, I would expect to train the model on 1493 target training images and then evaluate it on 423 target testing images, which contradicts the batch-based fine-tuning.

    5. Why do all other compared methods, except for FSM, exhibit very similar numerical results? And why does the choice between batch-based and centralized fine-tuning have no influence on the other methods? Could the authors provide some explanations? Providing some visualization of other models’ prediction results would be extra helpful. While Fig.2 is very impressive, showing only a successful case of the proposed method is insufficient.

    6. In Tables 2, 3, and 4, what exactly are the Approximation Quality (%) and Quantity (n)? I can’t find their explicit definitions.

    7. In the parameter sensitivity analysis, when varying one hyperparameter, what are the default values of the other parameters? Why do the best performances of the three experiments not match each other, and why do they not match the best performance in Table 1 (i.e., 77.83 Dice)?

    8. It would be good to include a limitation statement. With diffusion model generation, pseudo-label refinement, and filtering, one would expect the training time to be much longer.
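
    For reference, the "vanilla" pseudo-label named in question 2 can be sketched as below. This is a minimal illustration only: it does not describe how the paper actually derives its initial pseudo-label, and the normalized-entropy map is an assumed stand-in for the uncertainty map u, not the paper's definition. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def vanilla_pseudo_label(logit_map):
    """Per-pixel argmax(softmax(f(x))) for a 2D grid of class-logit lists."""
    labels = []
    for row in logit_map:
        out = []
        for logits in row:
            probs = softmax(logits)
            # Index of the most probable class (first one wins on ties).
            out.append(max(range(len(probs)), key=probs.__getitem__))
        labels.append(out)
    return labels


def entropy_uncertainty(logit_map):
    """Per-pixel entropy normalized to [0, 1] (assumed stand-in for u)."""
    maps = []
    for row in logit_map:
        out = []
        for logits in row:
            probs = softmax(logits)
            h = -sum(p * math.log(p) for p in probs if p > 0.0)
            out.append(h / math.log(len(probs)))
        maps.append(out)
    return maps
```

    In this sketch the pseudo-label ignores the uncertainty map entirely; any thresholding against T_un would be a separate refinement step.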

    Part II:

    For further improvement, it would be good to:

    1. Include more pseudo-labeling methods for comparison, such as [1,2];
    2. Explore more applications beyond vestibular schwannoma segmentation.

    Ref: [1]: Karani, Neerav, et al. “Test-time adaptable neural networks for robust medical image segmentation.” Medical Image Analysis 68 (2021): 101907. [2]: Chen, Cheng, et al. “Source-free domain adaptive fundus image segmentation with denoised pseudo-labeling.” Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V 24. Springer International Publishing, 2021.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While I’d be happy to acknowledge the soundness of the proposed work, I still have many questions awaiting the authors’ explanations. I am overall positive about this paper and would like to recommend a weak accept.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors present a novel method for performing source-free unsupervised domain adaptation called Reliable Source Approximation (RSA), combining the following components: 1) an edge-guided conditional diffusion model, to generate source images from target image edge maps; 2) an uncertainty segmentation model to guide pseudo-label selection; 3) a mechanism for screening pseudo-labels generated from the uncertainty segmentation model on generated source images to preserve only high-confidence pseudolabels to fine-tune the segmentation model.

    The framework enables adaptation of the segmentation network to a target domain under distribution shift without use of the source domain in the adaptation process.

    The authors apply their framework to the task of vestibular schwannoma segmentation from a source domain ceT1 and a target domain hrT2 MRI sequences. Quantitative results for this task demonstrate significant improvements of the proposed method over existing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a number of components combined in a novel way to provide an effective approach for source-free unsupervised domain adaptation. Each component plays an important role in the effective application of the framework.

    Firstly, the edge-guided diffusion model for image generation provides a nice mechanism for domain adaptation, under the assumption that a Canny edge detector will be able to pick up similar edge maps in the source and target domains. The edge-guided diffusion model is trained only on the source domain and applied to the edge maps from the target domain for generation of source domain images.

    Separately, an uncertainty segmentation model is trained on the source domain. This segmentation model is applied to the generated source images, where the segmentation uncertainty is computed across (i) multiple Canny edge thresholds for (ii) multiple generated source images per target image. The authors introduce a method to compute prediction consistency around the segmented structure of interest, which can be leveraged to filter only high-consistency pseudolabels.
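
    The consistency-based selection described above can be illustrated with a small sketch. It assumes consistency is measured as the mean pairwise Dice overlap between the predictions for the generated images of one target image; the paper's exact consistency measure is not reproduced on this page, so both the metric and the function names here are hypothetical.

```python
def dice(a, b):
    """Dice overlap between two binary masks given as flat lists of 0/1."""
    inter = sum(x & y for x, y in zip(a, b))
    size = sum(a) + sum(b)
    # Two empty masks are treated as perfectly overlapping.
    return 1.0 if size == 0 else 2.0 * inter / size


def most_consistent(masks):
    """Return (index, score) of the mask with the highest mean Dice to the rest."""
    if len(masks) < 2:
        raise ValueError("need at least two predictions to compare")
    best_idx, best_score = -1, -1.0
    for i, m in enumerate(masks):
        others = [dice(m, o) for j, o in enumerate(masks) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx, best_score
```

    Under this assumption, a generation would be kept only if its score clears a consistency threshold such as T_r.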

    Finally, these pseudolabels and their corresponding images can be used to fine-tune the segmentation model to make predictions on the target domain.

    The combination of these components yields an effective framework for SFUDA, as demonstrated by the quantitative comparison in Table 1, which shows superior performance to existing methods, including entropy minimization methods such as TENT and SAR, and the source approximation method FSM.

    Tables 2 and 3 also illustrate the importance of selecting appropriate thresholds for the prediction consistency between predictions of generated images, and for the uncertainty of pixels in the prediction maps. The explanation for the intuition behind the final choices is helpful too.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the authors demonstrate that this method performs well on a single domain adaptation task, it is unclear how well the combined components of the framework, including the use of edge maps and the selection of appropriate thresholds for T_un and T_r, would translate to other domain adaptation tasks.

    Additionally, it seems the appropriate selection of the thresholds T_un, T_r, as well as the selection of number of Canny filter thresholds to use, would be dependent on the domain adaptation task and require a set of labeled images in the target domain to select these parameters appropriately.

    For wider applicability it would have been helpful to see at least one other domain adaptation task compared here. Otherwise, the methods are novel and results very encouraging.

    Additionally, it would have been helpful to have a visual comparison of the proposed method versus baseline methods. In Fig. 2, only the proposed method is illustrated, so it is hard to tell qualitatively how different the pseudolabels and generated source images look compared to FSM, for example.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The main feedback I have is that while the methods are novel and the results are great for this specific domain adaptation task, demonstrating that this method could apply more widely to other domain adaptation tasks would have been helpful, and desirable for a possible follow up journal paper.

    It would be helpful for the community if the authors could comment on the particular domain adaptation tasks for which they anticipate the proposed framework to lead to significant improvements over the state-of-the-art, and to give their reasons for this.

    Minor comments:

    • Tables 2 and 3 show how different thresholds were selected - I’m not sure these should be titled “ablation study”, but perhaps “parameter search”
    • Some minor grammatical errors in the Prediction consistency section:
      • “…best edge index is find by…” -> “found by”
      • “the final approximation is find based the smallest…” -> “the final approximation is found based on the smallest”
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have proposed a novel framework combining several key novelties that in tandem enable a powerful source-free unsupervised domain adaptation for segmentation of vestibular schwannoma. The paper is well organized, and quantitative and qualitative results are shown demonstrating a significant improvement in segmentation performance of the proposed method over existing methods for a segmentation task in a target domain.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

  1. We appreciate the identification of some writing and grammar errors by the reviewers and have addressed them accordingly.

  2. The images we generate adopt the style from the source domain, while their spatial structure is provided by the edges of target data. Therefore, we believe this approach does not pose a risk of leaking private information from the source domain.

  3. “Batch-based fine-tuning” means that all test data are inputted in batches to update the model and simultaneously output final predictions. Each batch can only be used once. “Centralized fine-tuning” means that all data is used to update the model over many epochs before making the final prediction.

  4. There are significant differences between MR images from different sequences, which limit the effectiveness of entropy minimization and pseudo-label methods. We found that a significant performance improvement can only be achieved through source approximation.
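
  The two protocols defined in item 3 can be sketched as follows. This is a minimal illustration of the definitions given in the feedback, not the authors' implementation: the `update` and `predict` callables and the toy counter model are hypothetical, and predicting on the same batches after centralized training is an assumption.

```python
def batch_based_finetune(model, batches, update, predict):
    """Each batch updates the model exactly once and is predicted immediately."""
    predictions = []
    for batch in batches:
        update(model, batch)
        predictions.extend(predict(model, batch))
    return predictions


def centralized_finetune(model, batches, update, predict, epochs):
    """All batches update the model for several epochs; predictions come last."""
    for _ in range(epochs):
        for batch in batches:
            update(model, batch)
    return [p for batch in batches for p in predict(model, batch)]


# Toy stand-ins (hypothetical): the "model" only counts its update steps,
# and each "prediction" records how many updates had happened by then.
def toy_update(model, batch):
    model["steps"] += 1


def toy_predict(model, batch):
    return [model["steps"] for _ in batch]
```

  With the toy stand-ins, batch-based predictions for early batches come from a model that has seen fewer updates, whereas centralized predictions all come from the fully updated model.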




Meta-Review

Meta-review not available, early accepted paper.


