Abstract

The field of medical image segmentation is challenged by domain generalization (DG) due to domain shifts in clinical datasets. The DG challenge is exacerbated by the scarcity of medical data and privacy concerns. Traditional single-source domain generalization (SSDG) methods primarily rely on stacking data augmentation techniques to minimize domain discrepancies. In this paper, we propose Random Amplitude Spectrum Synthesis (RASS) as a training augmentation for medical images. RASS enhances model generalization by simulating distribution changes from a frequency perspective. This strategy introduces variability by applying amplitude-dependent perturbations to ensure broad coverage of potential domain variations. Furthermore, we propose random mask shuffle and reconstruction components, which can enhance the ability of the backbone to process structural information and increase resilience to intra- and cross-domain changes. The proposed Random Amplitude Spectrum Synthesis for Single-Source Domain Generalization (RAS^4DG) is validated on 3D fetal brain images and 2D fundus photography, and achieves improved DG segmentation performance compared to other SSDG models. The source code is available at: https://github.com/qintianjian-lab/RAS4DG.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4012_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/4012_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Qia_Medical_MICCAI2024,
        author = { Qiao, Qiang and Wang, Wenyu and Qu, Meixia and Su, Kun and Jiang, Bin and Guo, Qiang},
        title = { { Medical Image Segmentation via Single-Source Domain Generalization with Random Amplitude Spectrum Synthesis } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15009},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work addresses the problem of domain shift via single-source domain generalization (DG). To achieve this, the authors propose a data augmentation strategy in the spectral domain. The method, denoted Random Amplitude Spectrum Synthesis (RASS), simulates distribution changes in the frequency domain by applying amplitude-dependent perturbations. Additionally, random mask shuffles are applied after reconstructing the augmented data.

    The method is evaluated on three datasets, FeTA2021, IOSTAR, and LES-AV (2D and 3D), showing results which are comparable to the state-of-the-art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The problem that this work addresses is very relevant
    • It relies on a single-source domain, which is an advantage w.r.t. competitors such as FedDG.
    • Reported results are state-of-the-art
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The idea of data augmentation in the spectral domain has been previously explored (for instance, in FedDG, which reports similar results in Table 1). Hence, the novelty is limited.
    • The paper is difficult to read in parts and some contributions are not clearly explained. For example, the reconstruction (RSD) seems to play an important role in the final performance, as reported in Table 2, but its principle is not clearly explained in the paper (Section 2.2).
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Figure 1 does not add much value and does not seem to be consistent with the text (LF varies across FeTA and atlases). Perhaps it can be omitted.
    • The text in Section 2.2 could be improved. It is recommended to revise and restructure it.
    • Foundation models are a novel alternative to DG. Please position your work with respect to them.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Reject — should be rejected, independent of rebuttal (2)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of spectral data augmentation is not new. Other contributions of this work, particularly reconstruction design, are not clearly explained. The modifications needed to make this manuscript publishable go beyond what is possible within a rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    The authors did a good job clarifying doubts. I still believe that addressing the different points in the rebuttal would require a second revision of the paper, which motivates the new score provided. Nonetheless, I would not oppose an acceptance recommendation.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a data synthesis method in the frequency domain, using FFT and IFFT, to achieve single-source domain generalization.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The method is very interesting and novel. The paper uses FFT to decompose the image, modifies the amplitude spectrum, and uses IFFT to transform it back to obtain synthetic data.

    2. The method achieves slightly better performance than SOTA methods on both 2D and 3D datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper lacks mathematical details on why perturbing the amplitude spectrum in this manner helps with domain generalization. More theoretical grounding would strengthen the motivation.

    2. The hyperparameter choices for RASS (e.g., α, β, γ) are not well-justified. It is unclear how sensitive the method is to these values and how they were chosen. Providing some intuition or empirical analysis would be helpful.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Reproducible with the details provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. To strengthen the theoretical foundation, consider providing more mathematical intuition on why perturbing the amplitude spectrum in the proposed manner helps with domain generalization. Connecting it to existing literature on frequency-based analysis of domain shifts could be beneficial.

    2. Discuss how the hyperparameters for RASS were chosen and analyze the method’s sensitivity to these values. This would provide readers with a better understanding of how to apply RAS4DG in practice.

    3. Evaluate RAS4DG on additional large-scale, multi-center datasets with significant domain shifts to further demonstrate its generalization capability. This would increase confidence in the method’s real-world applicability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper introduces a very interesting frequency-based approach for data synthesis, which enhances the downstream task. Despite the limited performance boost and limited discussion of theoretical grounding, I recommend borderline acceptance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The rebuttal partly addressed the weaknesses mentioned. I recommend acceptance.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a data augmentation strategy called Random Amplitude Spectrum Synthesis (RASS) in order to achieve single source domain generalization (SSDG) in the context of deep learning medical image segmentation. In contrast to standard DG learning schemes, SSDG aims to ensure out-of-domain generalization by using data from only one source dataset. The approach consists of applying perturbations/noise in the amplitude spectrum of medical images to simulate inter-domain variability. The authors also integrate random mask shuffling during training and spatial/channel reconstruction into their network bottleneck to further improve segmentation performance. The proposed method (RAS4DG) is validated on 3D fetal brain images (source: atlases, target: FeTA dataset) and 2D fundus photography (source: DRIVE, target: IOSTAR and LES-AV datasets), and achieves improved domain generalization (DG) segmentation performance compared to other SSDG models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this paper lies in its RASS data augmentation strategy to ensure out-of-domain generalizability, which is novel and highly relevant to the field of medical image segmentation given the burden of creating annotated training datasets. The authors demonstrate the applicability and generality of their method by performing experiments on two different anatomies (fetal brain and eye fundus). The proposed method is supported by a fair evaluation, including a comparison with existing SSDG pipelines, and an ablation study to show the impact of each component of the proposed framework. The paper is also well organized and written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of this paper is the incomplete description of the method, especially its random mask shuffle and reconstruction components. The paper also lacks a statistical analysis of the results which would further support the advantages of the proposed method over compared SSDG schemes. Minor details are also missing in the experiment section (see comments).

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The datasets used in this paper are publicly available and the authors claim to release the source code upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Methods:

    • Section 2.1 should include a reference to FedDG (Liu et al. [8]), which also proposes to modify amplitude spectrum information in the context of domain generalizability. How does the proposed method differ from the work of Liu et al.?
    • Algorithm 1 seems redundant with Figure 2, as it does not add any new information. I would suggest leaving the algorithm description in the supplementary material.
    • In Section 2.2, is the shuffling strategy also applied to the segmentation mask? Or are only the image pixels shuffled? And are the pixels present in the shuffled regions included in the loss function?
    • The description of the spatial and channel reconstruction is not very clear and the diagram in Figure 2 does not provide any helpful information. In particular, what is the “separate-transform-merge” operation mentioned in Section 2.2, a reference might be useful.

    Experiments and results:

    • In the implementation details, the authors should clarify the term ‘ploy’ learning strategy or provide a reference.
    • In Table 1, the authors should comment on the difference in segmentation performance on normal vs. abnormal fetal brain in the FeTA dataset. In general, the authors should provide more comments on Table 1, e.g. is it expected that the MSDG methods will outperform the SSDG approach?
    • A commentary/analysis of Figure 3 is currently missing from the paper.
    • Segmentation performance is only assessed using the Dice metric. I would suggest including additional metrics (e.g., Hausdorff distance, sensitivity, relative absolute volume error, etc.) to provide a more detailed description of the method performance.
    • It would be beneficial to include a statistical analysis of the results to further demonstrate the superiority of the proposed method.
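
The ‘ploy’ learning strategy queried in the first point above is given in the author feedback as lr = lr_init × (1 − epoch/epoch_total)^0.9, which matches the form of what is commonly called the polynomial ("poly") decay policy. A minimal sketch of that formula follows; the function name, parameter names, and the range check are illustrative assumptions, not the authors' code.

```python
def poly_lr(lr_init, epoch, epoch_total, power=0.9):
    """Polynomial decay: lr = lr_init * (1 - epoch / epoch_total) ** power.

    The exponent 0.9 is the value quoted in the author feedback; all names
    here are hypothetical and chosen only for illustration.
    """
    if not 0 <= epoch <= epoch_total:
        raise ValueError("epoch must lie in [0, epoch_total]")
    return lr_init * (1.0 - epoch / epoch_total) ** power


# Example: print the schedule at a few points of a 100-epoch run.
if __name__ == "__main__":
    for e in (0, 25, 50, 75, 100):
        print(e, poly_lr(0.01, e, 100))
```

The rate starts at lr_init, decreases monotonically, and reaches zero at the final epoch.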

    Future work:

    • I would suggest testing other combinations of source and target domains (i.e., source: IOSTAR, target: LES-AV and DRIVE) to assess whether the source domain has an impact on out-of-domain generalizability.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed SSDG method is novel and its application in out-of-domain generalizability is highly relevant to the field of medical image analysis. The work is supported by a fair evaluation and comparison with existing DG methods. However the methodology section lacks some key information and the work suffers from the absence of any statistical analysis of the results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    The authors have addressed most of my concerns, but the authors have not convincingly answered R3’s comment about the mathematical intuition for why perturbing the amplitude spectrum in the proposed way helps with domain generalisation.




Author Feedback

We appreciate all the reviewers for their valuable comments and suggestions. We are encouraged that they find our motivation practical (R1, R4), our presentation clear and well-organized (R3, R4), our method novel and effective (R1, R3, R4), and our experiments extensive (R4). We address the reviewers’ concerns below.

To R1: Thanks! (1) Limited novelty: FedDG [8] pre-banks multiple amplitudes and then interpolates between them. This bank requires collecting amplitudes from different clients, which makes the approach less flexible. In Fig. 1 of our work, we show the difference between the high and low frequencies of the amplitude spectrum in 2D and 3D datasets. This helps the reader clearly see the variance inconsistency between high and low frequencies and motivates our method. Unlike [8], our method is less data-dependent and more flexible. The comparison with [8] is unfair as far as performance is concerned: in Table 1, [8] generalizes from multiple domains, whereas our RAS4DG achieves superior performance using only one source domain. (2) About random mask shuffling (RMS): We introduce two key innovations in Section 2.2 (RMS and RSD). RMS improves robustness to noise and local changes in images. (3) RSD’s explanation: RSD is used to enhance the capture of stylistic information after RASS and the representation of features within the regions shuffled by RMS. The RSD consists of spatial and channel reconstruction. For spatial reconstruction, features are categorized into high- and low-information groups based on weights and then cross-fused. For channel reconstruction, we use a ‘separate-transform-merge’ strategy: channels are divided, with one part enriched via depth-wise and point-wise convolutions, and the other part supplemented with point-wise convolutions. The enriched and supplemented features are merged using SKNet (CVPR 2019). We promise to clarify the details in the final version. (4) Compared DG: Our RAS4DG not only proposes two strategies, RASS and RMS, but also adds a plug-and-play RSD at the bottleneck layer. Due to the guidelines, additional experimental results cannot be provided here.

To R3: Thank you very much for your recognition! (1) Motivation: The answer can be found in R1 (1). (2) Mathematical details and hyperparameters in RASS: For space reasons, we directly obtain the amplitude and phase spectra. Equations 1 and 2 are the key to RASS, where δ applies random perturbations to the different amplitude frequencies. δ[m,n,p] is sampled from a Gaussian distribution N(1, σ^2[m,n,p]), i.e., with mean 1 and variance σ^2[m,n,p]. We further discuss the selection of the hyperparameters α, β, and γ in the function σ. Section 2.1 notes that linear-scale perturbations may not be sufficient to distinguish different frequency components, so we use an exponential function to increase the level of perturbation. Experiment visualizations and hyperparameter results are provided in Fig. 1 and Table 2 of the Supplementary Material.

To R4: Thanks! (1) Method: We describe the differences from [8] in R1 (1) and elaborate on Section 2.2 in R1 (2) and (3). It is worth noting that RMS only shuffles image pixels. (2) Experiments and results: “Ploy” refers to the learning rate policy lr = lr_init × (1 − epoch/epoch_total)^0.9. It is used in many medical imaging tasks, such as CCSDG and ASC (MICCAI 2023). The difference in results between normal and abnormal samples in Table 1 may be due to unbalanced data distribution and inconsistent feature representation. We also report experimental results for DA, MSDG, and SSDG methods. DA and MSDG methods use more data during training and should theoretically provide better results than SSDG methods, but our RAS4DG is more effective than these methods. Figure 3 shows a qualitative analysis of the results. Our model outperforms other methods in terms of both local details and structural integrity. We have marked the more obvious differences with boxes. Thank you for your advice on statistical analysis.
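
The amplitude perturbation described in the response to R3 can be sketched roughly as follows, using NumPy. This is an illustration under stated assumptions, not the authors' implementation: the feedback only states that δ is drawn from a Gaussian with mean 1 and variance σ², and that σ involves an exponential function with hyperparameters α, β, and γ. The specific form of `sigma` below, the dependence on radial frequency, and the names `rass_sketch`, `alpha`, and `beta` are all hypothetical.

```python
import numpy as np


def rass_sketch(image, alpha=0.05, beta=1.0, rng=None):
    """Perturb the amplitude spectrum of a 2D or 3D image and keep the phase."""
    rng = np.random.default_rng() if rng is None else rng

    # Decompose the image into amplitude and phase spectra.
    spectrum = np.fft.fftn(image)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)

    # Normalized radial frequency in [0, 1] for every spectrum bin.
    axes = [np.fft.fftfreq(n) for n in image.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))
    radius = radius / radius.max()

    # ASSUMPTION: an exponential, frequency-dependent standard deviation.
    # The paper's actual sigma function (with alpha, beta, gamma) is not
    # given in this document and is not reproduced here.
    sigma = alpha * (np.exp(beta * radius) - 1.0)

    # delta ~ N(1, sigma^2), drawn independently per frequency bin.
    delta = rng.normal(loc=1.0, scale=sigma)

    # Recombine the perturbed amplitude with the original phase and invert.
    perturbed = (amplitude * delta) * np.exp(1j * phase)
    return np.real(np.fft.ifftn(perturbed))


if __name__ == "__main__":
    volume = np.random.default_rng(0).random((16, 16, 8))
    augmented = rass_sketch(volume, rng=np.random.default_rng(1))
    print(augmented.shape)
```

With this particular choice of `sigma`, the zero-frequency bin is left unperturbed, so the mean intensity of the image is preserved; whether the published method behaves the same way is not stated in the feedback.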




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper introduces random amplitude spectrum synthesis for single-source domain generalization.

    R1 raised that this strategy is similar to the one used in the FedDG work which is in a federated setting. I understand this point regarding novelty. R1 gave constructive feedback on improving the structure and clarity.

    Considering the joint opinions, I vote for accept given that the introduced techniques are relatively novel for domain generalization and sufficient experiments are presented.

    The authors should address as much as they can in the final version. ‘Synthesize’ should be ‘Synthesis’ in the title.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


