Abstract
The purpose of this study is to improve Unsupervised Domain Adaptation (UDA) by utilizing intermediate image distributions from the source domain to the target-like domain during the image generation process. However, image generators like Generative Adversarial Networks (GANs) can be regarded as black boxes due to their complex internal workings, and we can only access the final generated image. This limitation makes it impossible for UDA to use the available knowledge of the intermediate distribution produced in the generation process when executing domain alignment. To address this problem, we propose a novel UDA framework that utilizes diffusion models to capture and transfer an amount of inter-domain knowledge, thereby mitigating the domain shift problem. A coupled structure-preserved diffusion model is designed to synthesize intermediate images in multiple steps, making the intermediate image distributions accessible. A stochastic step alignment strategy is further developed to align feature distributions, resulting in improved adaptation ability. The effectiveness of the proposed method is demonstrated through experiments on abdominal multi-organ segmentation.
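As a rough illustration of the idea in the abstract, the sketch below samples one random step from a sequence of intermediate images and computes an adversarial-style loss on features of that image. It is not the authors' implementation: the linear blend standing in for the diffusion trajectory, the uniform step sampling, the two-number feature extractor, and the linear discriminator (`toy_intermediate_sequence`, `sample_step`, `extract`, `discriminator`) are all assumptions made for illustration.

```python
import numpy as np

def toy_intermediate_sequence(x_src, x_tgt_like, num_steps):
    """Hypothetical stand-in for a diffusion trajectory: linear blends from the
    source image to a target-like image. A real diffusion model would produce
    the intermediate images by iterative denoising instead."""
    alphas = np.linspace(0.0, 1.0, num_steps + 1)
    return [(1.0 - a) * x_src + a * x_tgt_like for a in alphas]

def sample_step(rng, num_steps):
    """Pick one step index in [0, num_steps] uniformly at random (assumed rule)."""
    return int(rng.integers(0, num_steps + 1))

def discriminator(features, w):
    """Toy linear domain discriminator with a sigmoid output."""
    return 1.0 / (1.0 + np.exp(-float(features @ w)))

def bce(prob, label):
    """Binary cross-entropy for a single predicted probability."""
    prob = np.clip(prob, 1e-7, 1.0 - 1e-7)
    return float(-(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)))

def alignment_loss(rng, sequence, extract, w):
    """Loss that pushes features of a randomly chosen intermediate image
    towards being classified as the reference domain (label 1)."""
    t = sample_step(rng, len(sequence) - 1)
    prob = discriminator(extract(sequence[t]), w)
    return t, bce(prob, 1.0)

rng = np.random.default_rng(0)
x_src = rng.normal(size=(8, 8))                 # toy "source" image
x_tgt_like = rng.normal(loc=1.0, size=(8, 8))   # toy "target-like" image
seq = toy_intermediate_sequence(x_src, x_tgt_like, num_steps=10)

extract = lambda img: np.array([img.mean(), img.std()])  # toy feature extractor
w = np.array([0.5, -0.5])                                # toy discriminator weights
t, loss = alignment_loss(rng, seq, extract, w)
```

Because a different step is drawn on each call, repeated calls expose the feature extractor to the whole range of intermediate images rather than only the final generated one.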
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0195_paper.pdf
SharedIt Link: https://rdcu.be/dZxdl
SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_18
Supplementary Material: N/A
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{Ji_Diffusionbased_MICCAI2024,
author = { Ji, Wen and Chung, Albert C. S.},
title = { { Diffusion-based Domain Adaptation for Medical Image Segmentation using Stochastic Step Alignment } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
year = {2024},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15008},
month = {October},
pages = {188 -- 198}
}
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes a UDA framework based on diffusion models that can capture and transfer more inter-domain knowledge to alleviate the domain shift issue.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-organized and easy to follow.
- This article utilizes intermediate images from bidirectional step-by-step image projection sequences to align domain discrepancies, and enhances the model’s capability to handle domain shifts through multi-level generative adversarial learning, which is novel.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The datasets validated by the experiments are still somewhat limited. It would be beneficial to add experiments using datasets like MMWHS, as a single experiment is not sufficient to verify the method.
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
- I am particularly concerned about the overall process, especially the complexity and efficiency of training the diffusion model, such as training time. Could you provide some experimental data to support this?
- Please add more detail to the explanation of Fig. 1.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The overall idea of the article is clear and innovative.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
This work proposes a novel UDA framework that utilizes diffusion models to capture and transfer an amount of inter-domain knowledge, thereby mitigating the domain shift problem.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A coupled structure-preserved diffusion model is designed to synthesize intermediate images in multiple steps, making the intermediate image distributions accessible.
- A stochastic step alignment strategy is further developed to align feature distributions, resulting in improved adaptation ability.
- The effectiveness of the proposed method is demonstrated through experiments on abdominal multi-organ segmentation.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The visualization results are somewhat insufficient.
- There are some formatting and spelling mistakes. The authors should check them carefully.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Do you have any additional comments regarding the paper’s reproducibility?
See weaknesses.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
See weaknesses.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Weak Accept — could be accepted, dependent on rebuttal (4)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed UDA method, which uses diffusion models to capture and transfer inter-domain knowledge, is sufficiently novel.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #3
- Please describe the contribution of the paper
In this paper, the authors tackle the problem of Unsupervised Domain Adaptation (UDA) by exploiting diffusion models instead of GANs, since diffusion models give access to the intermediate generated data. Starting from the original unpaired images from different domains, the authors introduce a first feature alignment step that uses the features extracted by a segmentation network and a discriminator to align the features of the unlabeled target domain to those of the source domain. Then, they also enforce consistency between predictions through an additional adversarial approach. Finally, they extract features from the denoised intermediate generated images with the segmentation network to perform the last adversarial alignment. They validate the proposed solution on open-source datasets in the MRI to CT and CT to MRI configurations.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Starting from unpaired images, the proposed solution allows for bidirectional generation through two diffusion models, which opens up the use of the generated intermediate images that contain semantic information related to the final task.
- The proposed solution is well described in its details and validated against relevant work of the state of the art in the UDA field based on diffusion techniques, demonstrating superior results.
- The presented ablation study further supports the authors’ choice of introducing the different levels of constraints implemented through the different discriminators in a contrastive fashion.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The authors tested the proposed solution on only one combination of imaging techniques, namely CT and T2-SPIR MRI, while the CHAOS dataset also offers T1-inphase and T1-outphase scans that could be considered to better support the achieved results and the use of the proposed solution in different application scenarios.
- The comparison with state-of-the-art works can be improved (e.g., why is SSM reported only for the MR to CT case?). The authors should state which models have been retrained (and how) and which results are taken from previous works. Adding the standard deviation would also help.
- No consideration is given to the capabilities of the diffusion models in the generation process. Although the overall proposed solution achieves better results than the SOTA ones, evaluating the diffusion models would provide a more comprehensive evaluation.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Do you have any additional comments regarding the paper’s reproducibility?
N/A
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
In general, the paper is well written and easy to follow, and apart from a few minor details it gives enough information for reproducibility. It is an interesting approach to the UDA problem that exploits unpaired data. The structure of the manuscript guides the reader and allows for a good understanding of the main blocks of the proposed solution. The experimental results support the validity of the proposed UDA technique, although some improvements can be made. It is useful that the authors report both the “Source” and the “Target” models to provide the reader with the lower and upper bounds of the achievable performance. However, the data splitting could be better described in terms of stratification (e.g., patient-wise split?), and it is not mentioned why the authors decided to employ more CTs than MRIs. Additionally, the comparison with SOTA works could be more detailed: for example, why is SSM never introduced or analyzed in the text, and why does it appear only in the MR to CT configuration? It would also help to report standard deviations for the reported results to better appreciate the models’ stability, although the reviewer assumes that the majority of the numbers are taken from [19], which does not report them. In addition, testing the proposed solution on a wider range of acquisition protocols would definitely strengthen it and support its applicability. For example, the CHAOS dataset also provides T1-inphase and T1-outphase scans, and there is also the CT-ORG dataset for CT abdominal multi-organ segmentation. Evaluating the performance of the diffusion models would also better support the implementation choices made in the proposed solution. Consider enlarging the fonts used in Fig. 1, since some labels are too small; additionally, the meaning of dashed vs. solid lines in Fig. 1 is not explained.
Sometimes the authors refer to a segmentation network and sometimes to segmentation networks; this point remains unclear to the reviewer.
Minors:
- In abstract, in the sentence starting with “This limitation makes it” it should be “This limitation makes them”.
- In Section 1, expand the UDA and GANs acronyms, since it is the first time that they are introduced in the body of the paper.
- Eq. 2 exceeds the template.
- Below Eq. 2 there is an extra closing bracket when introducing the simplified objective of the diffusion model.
- In Section 2.1, there is a “donating” that should be “denoting”.
- In Section 2.2, in the title capitalize the S of stochastic.
- In Section 2.2, the sentence starting with “Additionally, we observe” seems a bit off.
- In Section 2.2, there are some points where there is x hat with superscript tat instead of tgt.
- In Section 2.2, in the sentence starting with “For the final”, the reviewer believes it should be T(x_0^src) instead of x_0^src alone, if understood correctly.
- In Table 1, in the MR to CT configuration, the bold result for the liver DSC should be the one from [19] (89.1).
- In general, the capitalization of section titles is inconsistent.
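The request above for standard deviations alongside the reported DSC values can be illustrated with a short sketch. It assumes binary masks and one Dice score per test case; the function names `dice` and `mean_std`, the empty-mask convention, and the use of the sample standard deviation are choices made for this example and are not taken from the paper.

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention, not a rule).
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)

def mean_std(scores):
    """Mean and sample standard deviation (ddof=1); needs at least two scores."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=1))

# Toy example: one overlapping voxel, |pred| = 2, |gt| = 1, so DSC = 2*1/(2+1).
example_score = dice([1, 1, 0, 0], [1, 0, 0, 0])

# Toy per-case scores, purely illustrative.
example_mean, example_std = mean_std([0.8, 0.9])
```

Reporting the pair returned by `mean_std` per organ and per adaptation direction would give the stability information the reviewer asks for.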
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making
Accept — should be accepted, independent of rebuttal (5)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed solution is interesting and relevant to the community. It is based on diffusion and, thanks to adversarial techniques, achieves good results compared to SOTA work. Finally, it is properly detailed for reproducibility purposes.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Author Feedback
To all reviewers: We thank all reviewers for their positive assessment and valuable comments on our work. We have carefully revised the manuscript according to your suggestions. The point-by-point responses are listed below.
To Reviewer#1: 1). We agree with the reviewer that using data from more modalities would better support the generalization ability; this provides constructive insights for our future work. In this paper, we evaluated our proposed method on CT and T2-SPIR MRI for a fair comparison with existing UDA methods, as most existing methods report their performance on CT and T2-SPIR MRI rather than on T1-inphase or T1-outphase in the CHAOS dataset. For the same reason, the dataset, the amount of training and test data, and the training-test partitions are consistent with the existing SOTA methods. We partition the dataset by individual subject (patient-wise) to ensure that the training and testing subjects do not overlap.
2). We took most of the SOTA results from the original papers so that they reflect the best results confirmed by the original authors on the benchmark datasets. The original SSM paper reports performance only for MR to CT, and we did not find publicly available code for SSM. Since we do not know all the implementation details of this work, such as its hyperparameters, we did not reproduce and report the CT to MR result. We have added a description of SSM in Section 1 (Introduction).
3). We agree that the capabilities of the diffusion models are important. Many works, such as [1][2], have demonstrated the superior image generation capabilities of diffusion models over GANs. Due to space limitations, this paper mainly focuses on UDA performance. In the ablation experiments, we have evaluated the impact on UDA performance of using the intermediate data distributions generated by the diffusion models.
4). We have enlarged the fonts in Fig. 1, added explanations of the dashed and solid lines, and corrected the inconsistent singular and plural forms. We have revised the manuscript according to all the points listed under the minors.
[1] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. “Denoising diffusion probabilistic models.” Advances in neural information processing systems 33 (2020): 6840-6851. [2] Song, Jiaming, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models.” arXiv preprint arXiv:2010.02502 (2020).
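The patient-wise partition mentioned in point 1) of the response to Reviewer#1 can be sketched as follows. This is a generic illustration, not the authors' code: the function name `patient_wise_split`, the slice-to-patient mapping, the test fraction, and the seed are all hypothetical.

```python
import random

def patient_wise_split(slice_ids, patient_of, test_fraction=0.2, seed=0):
    """Split slices so that every slice of a given patient lands on one side only.

    slice_ids  : list of slice identifiers
    patient_of : dict mapping each slice identifier to its patient identifier
    """
    patients = sorted({patient_of[s] for s in slice_ids})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [s for s in slice_ids if patient_of[s] not in test_patients]
    test = [s for s in slice_ids if patient_of[s] in test_patients]
    return train, test

# Toy data: 5 hypothetical patients with 3 slices each.
patient_of = {f"p{p}_s{k}": f"p{p}" for p in range(5) for k in range(3)}
slices = sorted(patient_of)
train, test = patient_wise_split(slices, patient_of)

train_patients = {patient_of[s] for s in train}
test_patients = {patient_of[s] for s in test}
```

Splitting on patient identifiers rather than on slices is what prevents slices from the same subject appearing in both training and test sets.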
To Reviewer#3: 1). We trained our models in stages. First, we trained the diffusion models on the dataset, enabling them to generate MR/CT images. Indeed, the training time for diffusion models is relatively long: we trained the diffusion models with a batch size of 24 for 100,000 iterations, which takes around 33 hours using 6 Tesla V100 GPUs with 32 GB of memory. Then, we use the trained diffusion models to generate the corresponding MR/CT images from the given CT/MR reference images. With the diffusion models, we obtain intermediate image sequences and use these images to train the segmentation network by generative adversarial learning. 2). We have added more explanations for Fig. 1.
To Reviewer#4: 1). If space permits, we will include more visual results in the new version. 2). We have carefully checked and revised our manuscript to correct the formatting and spelling mistakes.
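The training figures quoted in the response to Reviewer#3 translate into the following back-of-the-envelope numbers. The batch size, iteration count, wall-clock time, and GPU count come from the rebuttal; treating "around 33 hours" as exactly 33 and assuming an even data-parallel split of the batch across GPUs are assumptions of this sketch.

```python
# Figures stated in the rebuttal.
batch_size = 24
iterations = 100_000
wall_clock_hours = 33   # "around 33 hours", taken as exact here
num_gpus = 6

# Derived quantities.
samples_seen = batch_size * iterations                         # images processed in total
gpu_hours = wall_clock_hours * num_gpus                        # aggregate GPU time
seconds_per_iteration = wall_clock_hours * 3600 / iterations   # average step time

# Assumption: the batch is split evenly across GPUs (data parallelism).
per_gpu_batch = batch_size / num_gpus
```

Under these assumptions the diffusion stage amounts to 2.4 million processed samples, 198 GPU-hours, and roughly 1.19 seconds per iteration.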
Meta-Review
Meta-review not available, early accepted paper.