Abstract

While federated learning is the state-of-the-art methodology for collaborative learning, its adoption for training segmentation models often relies on the assumption of uniform label distributions across participants, and it is generally sensitive to the large variability of multi-centric imaging data. To overcome these issues, we propose a novel federated image segmentation approach adapted to the complex non-iid settings typical of real-life conditions. We assume that labeled data are not available to all clients, and that client data exhibit differences in distribution due to three factors: different scanners, imaging modalities, and imaged organs. Our proposed framework collaboratively builds a multimodal data factory that embeds a shared, disentangled latent representation across participants. In a second, asynchronous stage, this setup enables local domain adaptation without exchanging raw data or annotations, facilitating target segmentation. We evaluate our method across three distinct scenarios: multi-scanner cardiac magnetic resonance segmentation, multi-modality skull stripping, and multi-organ vascular segmentation. The results demonstrate the quality and robustness of our approach compared to state-of-the-art methods.
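
To make the two-stage structure concrete, below is a minimal PyTorch-style sketch of the pipeline summarized above (module names, sizes, and the FedAvg aggregation are illustrative assumptions, not the released RobustMedSeg implementation).

```python
# Minimal sketch of the two-stage pipeline (assumptions throughout; see
# https://github.com/i-vesseg/RobustMedSeg for the actual code).
import copy
import torch
import torch.nn as nn

N_DOMAINS, W_DIM, IMG = 3, 64, 32  # toy sizes, not the paper's settings


class DataFactory(nn.Module):
    """Shared 'data factory': noise + a domain embedding -> latent w -> image."""

    def __init__(self):
        super().__init__()
        self.domain_emb = nn.Embedding(N_DOMAINS, W_DIM)
        self.mapping = nn.Sequential(nn.Linear(2 * W_DIM, W_DIM), nn.ReLU())
        self.synthesis = nn.Sequential(nn.Linear(W_DIM, IMG * IMG), nn.Tanh())

    def forward(self, z, domain_id):
        w = self.mapping(torch.cat([z, self.domain_emb(domain_id)], dim=1))
        return self.synthesis(w).view(-1, 1, IMG, IMG), w


def fedavg(global_factory, client_factories):
    """Stage 1 (federated): average the clients' factory weights."""
    state = copy.deepcopy(client_factories[0].state_dict())
    for key in state:
        state[key] = torch.stack(
            [m.state_dict()[key].float() for m in client_factories], dim=0
        ).mean(dim=0)
    global_factory.load_state_dict(state)


# Stage 2 (asynchronous, per client): each client keeps the shared factory
# frozen and trains local encoders / a segmentation branch against synthetic
# source-like samples drawn from it, so no raw images or labels are exchanged.
if __name__ == "__main__":
    clients = [DataFactory() for _ in range(N_DOMAINS)]
    shared = DataFactory()
    fedavg(shared, clients)
    x_fake, w = shared(torch.randn(4, W_DIM), torch.zeros(4, dtype=torch.long))
    print(x_fake.shape, w.shape)
```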

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0368_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/i-vesseg/RobustMedSeg

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Gal_Federated_MICCAI2024,
        author = { Galati, Francesco and Cortese, Rosa and Prados, Ferran and Lorenzi, Marco and Zuluaga, Maria A.},
        title = { { Federated Multi-Centric Image Segmentation with Uneven Label Distribution } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        pages = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper provides a novel federated learning method for image segmentation with an uneven label distribution across clients (i.e., not all clients are fully labeled). The authors enhance DatasetGAN with domain-specific embeddings, then utilize the learned W-space to guide the training of the local segmentation models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The problem setting is interesting and needs significant attention in the research community. In medical imaging there is usually an abundant amount of images, but with limited annotations. This is especially important in the federated learning scenario, since centralizing data is not feasible and using only local data will hinder the performance of the segmentation model.
    2. The experiments in the paper are well-rounded, covering different modalities, different organs/regions, and different types of targets (large/small organs, blood vessels).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Missing comparison with DSL. The comparison would be interesting because both DSL and the proposed method utilize DatasetGAN as a way to represent client data and thus handle the data imbalance problem. The proposed method tackles the domain shift problem in the embedding space, whereas DSL tackles it in image space by generating samples together with their labels. Chang, Q., Yan, Z., Zhou, M. et al. Mining multi-center heterogeneous medical data with distributed synthetic learning. Nat Commun 14, 5510 (2023).
    2. The dataset setup is unclear, especially the percentage of labeled data used on the target clients.
    3. The organization and clarity of the paper are limited. For example, I am not sure how the cycle-consistency loss is calculated, since the source image dataset (x^s_j) is not available on the target client. Are the $w_s$ shared across clients? Also, the authors claim that the learned $W$ space contains domain-invariant features without explaining this further.
    4. It is not clear to the reviewer how the other works are implemented in the scenario proposed by the authors.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Consider defining the notation clearly: Eqn. 1 contains $G$; however, $G$ is not defined anywhere in the paper. (My understanding is that $G = F$.)

    2. Indicate the number of annotated images utilized by each method in the target set.

    3. Clarify the whole federated learning pipeline, i.e., which data are shared across the clients (e.g., the $w$ embeddings). Also clarify the claims, including the one about domain-invariant features, either with references or with experimental results.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Lack of clarity in the experimental settings and the method, as indicated above.
    2. Missing comparison with similar methods that utilize DatasetGAN for solving domain adaptation (DSL).
  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel federated image segmentation approach adapted to complex non-iid settings typical of real-life conditions. Specifically, the approach builds a multimodal data factory with a shared latent representation across participants. The local encoders are then trained individually through a segmentation branch.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The description of the application background of the method in this paper is clear, and the motivation for the proposed method is straightforward.
    2. The results of the method presented in this paper are compared against other SOTA methods, and its effectiveness has been demonstrated.
    3. This paper’s dataset selection is reliable, and the three different datasets effectively represent three possible non-iid scenarios.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. This paper’s core purpose is to solve the multimodal and multi-organ segmentation task. However, an introduction to and experimental comparison with multimodal methods seem to be missing.
    2. The description in the methods section seems unclear. For example, the use of G in Formula 1 is not explained in the context.
    3. The description of the training process of the segmentation branch, a core module of the method, is insufficient, and its depiction in Fig. 1 is also insufficient.
    4. There seems to be some confusion in the drawing of Fig. 1. The complex arrows make it difficult to understand the structure and process of the method.
    5. Table 1 shows that FedDG’s experimental setting seems unfair: a DG method cannot take advantage of the target domain’s data distribution. Some DA comparison methods could be selected here instead.
    6. Some of the descriptions in this paper seem a little strange, such as “In a second asynchronous stage” in the abstract.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weakness part.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes an effective multi-modal, multi-organ segmentation method. Its effectiveness has been demonstrated on three different datasets, and the method can effectively address non-iid problems arising from various factors. However, some expressions in this paper’s description of the method are unclear. Hopefully, these issues will be checked and corrected.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    Many thanks for the extra information provided by the authors; however, it did not address all my concerns. I will keep my rating.



Review #3

  • Please describe the contribution of the paper

    The paper proposes a new framework for non-iid federated medical image segmentation that comprises 3 steps: (1) the clients create a shared and disentangled data latent representation by training a common multimodal data factory, (2) the clients with annotation masks train segmentation branches (supervised learning) that are then integrated into the factory, and (3) a local domain adaptation step is performed for each client using image-to-image translation training. The proposed approach is evaluated against state-of-the-art approaches, including foundation segmentation models (MedSAM, UniverSeg) and federated approaches (FedMed-GAN, FedDG), and achieves significant performance improvements over 3 shifts: multi-scanner, multi-modal, and multi-organ.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very clear to read and follow, and the steps are introduced adequately. The paper tackles a difficult and important problem across multiple non-iid federated settings while requiring few labelled samples for each target domain. The proposed approach significantly outperforms strong baselines for the studied problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The last mechanism, for local domain adaptation, is expensive and does not guarantee convergence of the training in a real setting (larger images and limited computational resources). There may be privacy concerns regarding the sharing of the segmentation branches across all the local nodes: the segmentation branches are only trained on some nodes’ data and their weights need to be shared with the other nodes, which may cause privacy leakage and allow target nodes to extract information from the nodes with available segmentation masks. One concern is about reproducibility. No replication package is provided, and some datasets are anonymized.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Could you elaborate on the privacy concern related to the segmentation branches S_s that need to be made accessible to all nodes? Could you elaborate on how the performance of the alternative approaches and yours is sensitive to the number of labelled examples in the target node? The study was conducted using 3 random slices from 3 random volumes. How does the performance change with a slight increase in the number of slices / volumes?

    Could you elaborate on the generalization of the approach to more than 2 nodes?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed approach is elegant, well motivated and well explained. The problem addressed is important and the performance of the proposed framework are significantly better than previous literature.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The answers of the authors confirm that the good results were not due to a cherry-picked scenario and can generalize to different settings. The authors provided general insights into what happens with more nodes or more labelled slices, but the accepted paper should contain factual elements (numbers) to support this claim.

    The authors did not, however, address my question about the convergence of the local domain adaptation mechanism, and it still seems that this flaw can make the approach not applicable in practice.

    The remarks raised by fellow reviewers can be addressed in the accepted manuscript and are mostly about more explanations (except the DSL comparison, but that approach was relatively new at the MICCAI submission deadline). I am therefore keeping my original score and supporting acceptance.




Author Feedback

Thanks to R1,3,4 for their feedback. We’re glad they all value the significance of our work, our method’s elegance and significant improvement over strong baselines (R1), our well-rounded experimental coverage (R3), and our algorithm’s effectiveness and clarity (R4).

Local DA is costly (R1), requiring training for each target (T), but this applies to 4/5 of the compared methods that access T. Only UniverSeg incorporates T data directly at inference, but it lacks adaptation mechanisms for domain gaps and requires centralizing huge datasets and costly GPUs for training.

Privacy (R1): Sharing both S_s and F may raise privacy concerns, e.g. due to membership inference. Future work involves integrating differential privacy and encryption.

T labels (R1,3) include “three midpoint slices, extracted from three random volumes from D_t” (Sec 4.2). While not in the text, we explored higher values. Fine-tuned methods (nnUNet, SAM, MedSAM) peak in accuracy with a few fully labeled volumes. Ours and UniverSeg reach high accuracy more quickly, showing good results already with 3 slices, a notable strength.

Nodes > 2 (R1): Our FL step has 3 nodes. In local DA, 1 node acts as source (S) and 1 as T, totaling 2 domains as in conventional DA. An extension to more than one S node may be an ensemble method adapting the T data to each S and averaging the softmax segmentations.
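
A minimal sketch of that ensemble extension (a hypothetical illustration of the suggestion above, not something implemented or evaluated in the paper; adapters and seg_branches are assumed callables):

```python
import torch


def ensemble_segment(x_t, adapters, seg_branches):
    """Soft ensemble over several source nodes (hypothetical extension).

    adapters[i]:     callable translating target images into source domain i
    seg_branches[i]: that node's segmentation branch (returns logits)
    """
    probs = [
        torch.softmax(seg_branches[i](adapters[i](x_t)), dim=1)
        for i in range(len(adapters))
    ]
    return torch.stack(probs, dim=0).mean(dim=0)  # averaged softmax maps
```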

Reproducibility & private data (R1,3,4): We omitted to state that the code will be released if accepted. 5/6 datasets are public, largely favoring reproducibility.

DSL (R3): 1) shares modalities across centers, aligned to a common space; 2) doesn’t use DatasetGAN but pix2pix for generation conditioned on segmentation masks; 3) needs all modalities paired with common labels for training; 4) is trained only for generation, then used for downstream segmentation. Instead, we combine unpaired image translation and segmentation in one model, requiring labels for only 1 S modality. We consider DSL similar to FedMed-GAN, so it’s not in our benchmark. We’ll add it in Sec 2.

We omit implementation details (R3) for nnUNet(+DAug) (supervised with S data), and FedMed-GAN (unsupervised with S, T data) as they use standard setups. We’ll add them.

Shared data (R3): Client t uses F to “generate synthetic… samples x^s_j that resemble… their native domain D_s” (Sec 3.2). Embeddings don’t need to be shared and t doesn’t access real S images, but uses fake x^s_j to compute cycle consistency.
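
To illustrate the mechanism (a sketch based on our reading of the rebuttal; gen_s, gen_t, and enc are hypothetical names, and the actual model may structure the translation differently), the target client can close both cycles using only the synthetic x^s_j drawn from the shared factory F:

```python
import torch
import torch.nn.functional as nnF


def target_cycle_loss(gen_s, gen_t, enc, x_s_fake, x_t):
    """Cycle consistency at the target client without real source images.

    gen_s / gen_t: decode a latent code into the source / target domain
    enc:           encoder into the shared latent space W
    x_s_fake:      synthetic source-like samples x^s_j from the shared factory
    """
    # synthetic source -> target -> source
    s2t = gen_t(enc(x_s_fake))
    s2t2s = gen_s(enc(s2t))
    # target -> source -> target
    t2s = gen_s(enc(x_t))
    t2s2t = gen_t(enc(t2s))
    return nnF.l1_loss(s2t2s, x_s_fake) + nnF.l1_loss(t2s2t, x_t)
```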

The claim about domain-invariant/specific features (R3) in W marks our use of a single latent space, contrasting with most disentangled representations in unsupervised translation that encode style and content separately. We’ll remove it for clarity.

Unclear terms (R4): While complimenting our method’s effectiveness and clarity, R4 flags unclear terms as major concerns, expressing hope they are checked and corrected. R4 points to “multimodal” and “asynchronous”. Multimodal refers to multiple imaging modalities, as in the literature: we explore methods’ ability to segment PET, CT, T2w using labeled PDw (Sec 4.1). Asynchronous describes how DA operates independently at each node, as opposed to FL, which involves all nodes.

The segmentation branch (R4) isn’t our method’s core. It’s used only for local training. Its architecture is from DatasetGAN [28]. The core is F, involved in conditional training via FL, disentangling W, and local DA when integrated with encoders E_s and E_t for image reconstruction and translation, and with branches S_s and S_t for segmentation. S_s and S_t are trained with segmentation losses (Sec 3.2).
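
For concreteness, a sketch of a local adaptation objective consistent with this description (the exact loss terms and weights, and the input fed to the segmentation branch, are assumptions; adversarial and cycle terms are omitted):

```python
import torch
import torch.nn.functional as nnF


def local_da_losses(factory_dec, E_s, E_t, S_s, x_s_fake, y_s_fake, x_t):
    """Reconstruction + segmentation terms at the target client (sketch).

    factory_dec(w, dom): frozen shared factory decoding latent w into domain dom
    E_s, E_t:            domain encoders into the shared latent space W
    S_s:                 segmentation branch supervised on the synthetic,
                         labeled source samples x_s_fake / y_s_fake
    """
    # image reconstruction within each domain
    rec = nnF.l1_loss(factory_dec(E_s(x_s_fake), "s"), x_s_fake) + \
          nnF.l1_loss(factory_dec(E_t(x_t), "t"), x_t)
    # supervised segmentation on the synthetic, labeled source samples
    seg = nnF.cross_entropy(S_s(E_s(x_s_fake)), y_s_fake)
    return rec, seg  # adversarial / translation terms omitted for brevity
```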

FedDG (R4) is a DG technique, so it doesn’t access T, but uses a multi-S setup: it’s the only method “trained using all datasets except the target” (Sec 4.2). We use FedMed-GAN with nnUNet as an alternative DA method. While we evaluate a wide range of methods, we highlight that there’s a lack of reproducible methods merging DA and FL in the medical segmentation literature, underscoring the novelty of our work.

The typo (G = F) and a better Fig. 1 will be addressed in a new version.




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper received mixed reviews. It proposes a new setting with a new method. In general, the idea is novel and technically sound. Though the expression of some details should be improved, this is addressable before camera-ready. Foundation models are interesting and increasingly important, and the topic is worth discussing at MICCAI.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



