Abstract

Single domain generalization (single-DG) for medical image segmentation aims to learn a style-invariant representation that generalizes to a variety of unseen target domains using data from a single source. However, due to the limited sample diversity of the single source domain, the robustness of the generalized features yielded by existing single-DG methods is still unsatisfactory. In this paper, we propose a novel single-DG framework, namely Hallucinated Style Distillation (HSD), to generate robust style-invariant feature representations. In particular, our HSD first expands the style diversity of the single source domain by hallucinating samples with random styles. Then, a hallucinated cross-domain distillation paradigm is proposed to distill style-invariant knowledge between the original and style-hallucinated medical images. Since hallucinated styles close to the source domain may cause our distillation paradigm to over-fit, we further propose a learning objective that diversifies the style-invariant representation, which alleviates the over-fitting issue and smooths the learning of generalized features. Extensive experiments on two standard domain-generalized medical image segmentation datasets show the state-of-the-art performance of our HSD. Source code will be publicly available.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3893_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Yi_Hallucinated_MICCAI2024,
        author = { Yi, Jingjun and Bi, Qi and Zheng, Hao and Zhan, Haolan and Ji, Wei and Huang, Yawen and Li, Shaoxin and Li, Yuexiang and Zheng, Yefeng and Huang, Feiyue},
        title = { { Hallucinated Style Distillation for Single Domain Generalization in Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The author introduces a novel single-DG framework, namely Hallucinated Style Distillation (HSD), to generate style-invariant features with consistent contents under style variations within an expanded representation space.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper proposes a decorrelated representation expansion (DRE) method which pushes the redundant channels to explore new activation patterns.
    2. This paper proposes a hallucinated cross-style distillation (HCD) scheme that incorporates the knowledge distillation paradigm into the domain generalization perspective, which distills the commonly-shared information between the original and style hallucinated features.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. More information about how existing single-DG methods enrich style diversity or learn style-invariant features should be introduced, including their pros and cons, to clarify the difference from this paper.
    2. The authors claim that the presented method both enriches style diversity and learns style-invariant features. How is style diversity judged? Why does reference [11] not achieve this goal?
    3. In Section 2.1, add style-hallucination-related references and clarify the improvement in this paper.
    4. Please confirm the correctness of Equation (1). Where are h and w in Equation (1)?
    5. F_{i,n,j} is not introduced in Equation (4).
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. More information about how existing single-DG methods enrich style diversity or learn style-invariant features should be introduced, including their pros and cons, to clarify the difference from this paper.
    2. The authors claim that the presented method both enriches style diversity and learns style-invariant features. How is style diversity judged? Why does reference [11] not achieve this goal?
    3. In Section 2.1, add style-hallucination-related references and clarify the improvement in this paper.
    4. Please confirm the correctness of Equation (1). Where are h and w in Equation (1)?
    5. F_{i,n,j} is not introduced in Equation (4).
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Introductions of state-of-the-art papers are missing, and there are mistakes in the equations.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper describes a novel deep learning architecture used to train domain-robust models with only one source domain available. The architecture is evaluated on two publicly available datasets from two domains (fundus images and prostate MRI) and benchmarked against a comprehensive set of baseline architectures.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper describes a novel architecture for single-domain generalization.
    • The authors test their architecture on two different publicly available datasets from two different domains.
    • The authors report their results with mean and standard deviations.
    • Through an ablation study, the authors highlight the effectiveness of different parts of the proposed approach.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors use two different datasets, both containing data from multiple domains. For each dataset, the authors select one domain as the source domain, and performance is reported on the other domains. Since domain generalization is not a symmetrical problem, the generalization ability from domain A to domain B is not necessarily as high as from domain B to domain A. It would be insightful to see how the models perform when the other domains are selected as source domains.
    • The authors report means and standard deviations in their results. However, the paper lacks a description of how these values were derived. Are they the result of repeated training or N-fold cross-validation?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    Only a minor comment: the authors might want to use an anonymous GitHub repository for their next paper submission, as this makes it possible to include the source code in the review.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • The authors should perform an experiment in which they permute the source domains of the two datasets and show how this affects performance on the respective target domains. This would show that the approach works regardless of the source domain and give higher confidence that the method generalizes.
    • The authors should clarify how they derived the means and standard deviations for the respective experiments.
    • In addition, the authors should clarify whether the benchmarks were implemented and the results reproduced for this paper, or whether they were taken from some related work.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has a clear novelty, and the authors validated their method on two different publicly available datasets from two different domains. In addition, they compared their method with an extensive set of baselines. There are only points that would improve the paper, which the authors can address in the rebuttal.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    I am fine with how the authors have addressed my questions in the rebuttal.



Review #3

  • Please describe the contribution of the paper

    The key contribution of this paper is to generate style-invariant features that are robust to variations across unseen target domains, using data from only a single source domain. This is achieved through a combination of techniques: random style hallucination to enhance style diversity, decorrelated representation expansion to simulate out-of-distribution input impacts, and hallucinated cross-style distillation to extract consistent structural information despite style variations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The formulation of HSD is novel as it addresses the challenge of limited sample diversity in a single source domain by hallucinating samples with random channel statistics, thus expanding the style diversity without the need for multiple source domains.

    2. The HSD method creatively utilizes the single-source-domain data by generating style-hallucinated versions of the training samples. This original way of data augmentation allows the model to learn from a more diverse dataset, which would otherwise be impossible with a limited dataset from a single domain.

    3. By focusing on the practical scenario where only data from a single domain is available, the paper addresses a common challenge in clinical settings, where data sharing across institutions is often restricted due to privacy concerns. The proposed method’s ability to generalize well to unseen domains makes it clinically feasible and relevant for real-world applications.

    4. The paper provides a particularly strong evaluation of the proposed method through extensive experiments on two standard domain-generalized medical image segmentation datasets. The use of the Dice coefficient and Hausdorff Distance as evaluation metrics, along with the comparison with state-of-the-art methods, demonstrates the effectiveness of HSD in a rigorous and convincing manner.

    5. The application of HSD to medical image segmentation is novel, especially in the context of handling style variations across different imaging conditions and devices. This is significant, as it can potentially improve the accuracy and reliability of medical diagnoses and treatments by providing better segmentation results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The paper does not elaborate on the loss L_SRD mentioned in Equation (7). Could it be referring to L_DRE in Equation (3)?

    2. In an attempt to generalize within a single domain, the paper transforms each layer’s feature map by randomly sampling a mean and variance, in order to alter redundant activation values. However, this transformation could also potentially lead to a loss of features.

    3. Apart from the common segmentation loss, the paper introduces two additional losses: L_DRE and L_HCD. The former constrains cross-channel feature similarity, thereby reducing redundancy across feature channels; the latter uses KL divergence to measure the data distribution before and after transformation. While L_DRE tends to differentiate features across channels, L_HCD encourages the same feature maps at the end of the encoder. Could this lead to conflicting network updates during training?

    4. The paper claims that the HSD method reduces redundancy in activation locations, but no visual proof from intermediate-layer features is provided to demonstrate that the proposed method captures a broader range of features.
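For concreteness, the two additional losses discussed above might be sketched as follows. The paper's exact formulations of L_DRE and L_HCD are not reproduced in this review, so both function bodies are illustrative assumptions (a channel-decorrelation penalty and a KL-based distillation term), not the authors' definitions.

```python
import torch
import torch.nn.functional as F

def l_dre(feat):
    """Sketch of a decorrelation loss: penalize off-diagonal cosine
    similarity between channels of a (N, C, H, W) feature map.
    The exact form of L_DRE is an assumption here."""
    n, c = feat.shape[:2]
    flat = F.normalize(feat.reshape(n, c, -1), dim=2)   # unit-norm channels
    sim = flat @ flat.transpose(1, 2)                   # (N, C, C) cosine sims
    diag = torch.diag_embed(torch.diagonal(sim, dim1=1, dim2=2))
    return (sim - diag).abs().mean()                    # off-diagonal penalty

def l_hcd(logits_orig, logits_hall):
    """Sketch of the distillation term: KL divergence between the
    predictive distributions of original and hallucinated features."""
    p = F.log_softmax(logits_hall, dim=1)
    q = F.softmax(logits_orig, dim=1)
    return F.kl_div(p, q, reduction="batchmean")
```

Under this reading, the two terms act on different quantities (channel similarity vs. semantic consistency), which is the crux of the conflict question raised in point 3.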

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The visualization of intermediate feature layers could be added to further determine the effectiveness of the method in eliminating activation location redundancy and capturing a broader range of features.

    2. Since training is conducted on only one domain and most segmentation datasets have limited data that may not encompass a wide range of features, would it be possible to pre-train the proposed method on classification tasks for better domain generalization?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The generalization method described in the paper does not utilize prior information or other datasets.
    2. The idea of this paper is similar to contrastive learning, aiming to reduce redundancy in attention regions by transforming feature maps.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We express our gratitude that all the reviewers acknowledge the novelties and contributions. Below, we clarify details of the experiments and methodology.

R3Q1: Other source domain? R: 1) We would like to clarify that, in both experiments, using Domain A as the source domain follows the prior evaluation protocol [11]. 2) Following your suggestion, we further select Domain F as the source domain on the prostate dataset. Our method yields 81.6% Dice, much better than the [15] baseline with 77.5% Dice.
Q2: How were the values derived? R: Following the protocols of prior works, the results are derived from five repeated training runs. We will add these details in the revised version.
Q3: How were the results reported? R: The outcomes of [2], [6], [11], [12], [14], [17], [18] and [21] are directly cited from [11], while the outcomes of [15] and [20] are reproduced by us. We will add these details in the revised version.

R4Q1: Loss typo. R: We will correct it to L_DRE (defined in Eq. 3).
Q2: Possible to lose features? R: The proposed method consists of two branches (Fig. 1), which learn the original and transformed features respectively. In other words, the original features are maintained in the first branch without potential loss.
Q3: Possible loss conflict? R: We would like to kindly note that L_DRE penalizes channel-wise similarity, which helps to enhance the representational capacity of the features, while L_HCD focuses on semantic consistency before and after hallucination, with the features projected into a low-dimensional space. We believe the enhanced representational capacity facilitates learning domain-invariant features. Therefore, there is a synergistic effect between these two losses, and they do not conflict during training.
Q4: Visual evidence. R: Many thanks for your valuable suggestion. As the rebuttal system cannot accept visual results, we briefly describe the visualization process and its outcomes. We extract the feature map from any block of the image encoder and apply global average pooling. Afterwards, we compute the self-correlation matrix and visualize it. Comparing the baseline and the proposed HSD, the channel-wise correlation in the off-diagonal regions is significantly alleviated, which reduces feature redundancy. These visual results will be included in the revised version.
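The channel self-correlation visualization described in the rebuttal (global average pooling followed by a self-correlation matrix over the channels) can be sketched roughly as below. The rebuttal does not specify the normalization, so the standardization over the batch dimension is an assumption.

```python
import torch

def channel_correlation(feat, eps=1e-6):
    """Channel self-correlation matrix from a (N, C, H, W) feature map,
    per the rebuttal's description: global average pooling, then
    correlation between channels computed across the batch."""
    pooled = feat.mean(dim=(2, 3))                      # (N, C) global average pooling
    pooled = pooled - pooled.mean(dim=0, keepdim=True)  # center over the batch
    pooled = pooled / (pooled.std(dim=0, keepdim=True) + eps)
    return (pooled.t() @ pooled) / feat.shape[0]        # (C, C) correlation matrix
```

The resulting (C, C) matrix can then be rendered as a heatmap; strong off-diagonal entries would indicate the channel redundancy the rebuttal refers to.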

R5Q1: More discussion of existing SDG methods. R: We will enrich the discussion of these methods in the revised version. In general, prior works usually apply naïve data augmentation before training to enrich style diversity. Instead, our method introduces random styles integrated into the learning pipeline and constrains their similarity, which learns a more generalized medical representation despite domain variation.
Q2: Define style diversity. Why can't [11]? R: 1) According to the style hallucination reference (which we will add) in the machine learning community, style can be quantified by the mean and standard deviation of per-domain image features (computed as in Eq. 1); the more varied their distribution, the greater the style diversity. 2) [11] focuses on learning shape-invariant representations despite the domain shift, which neither assures enough style diversity for model learning nor constrains the cross-domain content representation.
Q3: Add style hallucination reference and clarify improvement. R: We will add the style hallucination reference accordingly. Existing works need to first extract the style of the target domain and then inject it into the source domain. In contrast, our work can inject arbitrary, random styles for hallucination, which significantly enriches the style diversity.
Q4: Equation definitions. R: h and w refer to the height and width of the feature map. F_{i,n,j} refers to the feature map of the j-th channel in F_{i,n}; i refers to the i-th block of the image encoder.
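As a rough illustration of the style definition in the rebuttal (style as the per-channel mean and standard deviation of a feature map, computed as in Eq. 1), random style hallucination might look like the following sketch. The distributions used to sample the new statistics are assumptions for illustration, not the paper's actual choices.

```python
import torch

def hallucinate_style(feat, eps=1e-6):
    """Replace each channel's mean/std with randomly sampled statistics,
    for a feature map of shape (N, C, H, W). A minimal sketch under the
    assumption that style = per-channel (mean, std); the sampling
    distributions for the new statistics are hypothetical."""
    n, c, _, _ = feat.shape
    mu = feat.mean(dim=(2, 3), keepdim=True)                  # per-channel mean
    sigma = feat.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    normalized = (feat - mu) / sigma                          # strip source style
    mu_new = torch.randn(n, c, 1, 1)                          # assumed sampling
    sigma_new = torch.rand(n, c, 1, 1) + 0.5                  # assumed sampling
    return normalized * sigma_new + mu_new                    # inject random style
```

In this reading, drawing the new statistics from broad distributions, rather than extracting them from a fixed target domain, is what distinguishes the hallucination step from prior style-transfer-based augmentation.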




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposed a data augmentation method based on feature-style hallucination, which falls under the style-augmentation-based methods for single domain generalization. The paper received mixed ratings (1 accept, 1 weak reject, 1 weak accept). The main issue is that relevant works are not cited and discussed (R5), and thus the contribution and novelty are not clear.

    After carefully reading the paper, the rebuttal, and the three reviews, the concern still remains. The AC further noticed that the majority of baseline methods presented in Tables 1 and 2 are test-time adaptation based. The authors did not compare their method to relevant state-of-the-art single domain generalization approaches in the same venue, such as RandConv (ICLR 2021), MixStyle (ICLR 2021), and MaxStyle (MICCAI 2022).

    Given the missing literature review of these relevant methods as well as insufficient comparative studies in the experiments, The AC believes the paper is not ready for publication.




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper proposes an innovative way to enrich the intensity style by hallucination, which improves the performance of single-domain generalisation. The method is supported by extensive experimental evaluation on two datasets and against ten competing methods, including both CNN- and ViT-based methods.




Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The reviewers have provided comprehensive feedback on the paper, highlighting its novel contributions, strengths, and areas for improvement. Reviewer #3 acknowledges the paper’s novelty and thorough evaluation while suggesting experiments to enhance the method’s robustness further. Reviewer #4 commends the paper’s innovative approach and clinical relevance but points out specific technical aspects that require clarification and potential enhancements. Reviewer #5 appreciates the paper’s contributions but raises concerns about clarity, organization, and the need for additional references and corrections in equations. Although the authors did not compare their method with all possible SOTA methods, they have performed comparison with a number of key methods, including those recently proposed (e.g., FeedFormer, 2023). Overall, the reviewers express confidence in the paper’s potential with some revisions, indicating its acceptability with the necessary improvements.



