Abstract

Deep Learning (DL) has revolutionized medical imaging, yet its adoption is constrained by data scarcity and privacy regulations, limiting access to diverse datasets. Federated Learning (FL) enables decentralized training but suffers from high communication costs and is often restricted to a single downstream task, reducing flexibility. We propose a data-sharing method via Differentially Private (DP) generative models. By adopting foundation models, we extract compact, informative embeddings, reducing redundancy and lowering computational overhead. Clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-aware data distribution, supporting diverse downstream tasks. Our approach, validated across multiple feature extractors, enhances privacy, scalability, and efficiency, outperforming traditional FL classifiers while ensuring differential privacy. Additionally, DP-CVAE produces higher-fidelity embeddings than DP-CGAN while requiring $5{\times}$ fewer parameters.
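The differential-privacy guarantee mentioned in the abstract is typically obtained through a DP-SGD-style mechanism: clip each per-sample gradient to bound sensitivity, then add calibrated Gaussian noise to the average. The sketch below illustrates that generic mechanism only — it is not the paper's implementation, and `clip_norm`, `noise_mult`, and `lr` are illustrative placeholders:

```python
import math
import random

def dp_sgd_update(per_sample_grads, clip_norm=1.0, noise_mult=1.1, lr=0.1):
    """One DP-SGD step: clip each per-sample gradient, average, add noise."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    clipped = []
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))  # bound per-sample sensitivity
        clipped.append([x * scale for x in g])
    avg = [sum(g[i] for g in clipped) / n for i in range(dim)]
    sigma = noise_mult * clip_norm / n  # noise std calibrated to the clip norm
    return [-lr * (a + random.gauss(0.0, sigma)) for a in avg]
```

Because the CVAE here operates on compact foundation-model embeddings rather than raw images, the gradients being clipped are low-dimensional, which is part of the paper's efficiency argument.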

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0970_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/myng15/federated-dp-cvae

Link to the Dataset(s)

Camelyon17-Wilds: https://wilds.stanford.edu/datasets/

Abdominal CT: https://medmnist.com/

BibTex

@InProceedings{DiFra_EmbeddingBased_MICCAI2025,
        author = { Di Salvo, Francesco and Nguyen, Hanh Huyen My and Ledig, Christian},
        title = { { Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
        pages = {141 -- 151}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work introduces a federated Conditional VAE (CVAE) model to (1) address data scarcity, (2) provide greater downstream flexibility by generating (private) synthetic global data, and (3) reduce communication overhead in the FL system.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The manuscript is well organized and the concepts are well explained.

    2. I am convinced that the differential privacy mechanism makes the synthetic global data generation safe for the FL system. The merits of such a system are clear as it provides for greater downstream flexibility, allowing clients to tune their local models for specific tasks while making use of the global data.

    3. DP-CVAE requires 5x fewer parameters than DP-CGAN, which is clearly desirable in a federated system.

    4. The authors elect to experiment on medical imaging data, rather than standard benchmark datasets. This effort is commendable as it more strongly demonstrates the model’s effectiveness in real-world scenarios.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. In the introduction, the authors claim that “data-sharing via privacy preserving synthetic data generation” reduces communication overhead, but do not elaborate on how this is possible. This claim is not immediately obvious since DP-CVAE still communicates decoder parameters to the global server, so it would appear to be as demanding as standard FedAvg.

    2. While the theoretical methodology is well grounded, the experimental procedure is lacking. (a) The authors claim that the CVAE enables clients to train task-specific models by balancing their local and the synthetic global dataset. But they only provide an example of image classification. How can the local and global dataset be balanced in other downstream tasks? (b) The authors only explore one heterogeneous setting. A thorough analysis of DP-CVAE’s performance across multiple non-IID configurations would provide a clearer picture of the model’s performance. (c) In terms of DP FL methods, the authors only compare against one model (DP-CGAN). In the introduction, they state that other works have applied VAE and CVAE to the federated setting with benchmark datasets such as MNIST. Why not adapt these methods to the medical imaging data or use MNIST with DP-CVAE to compare against more baselines?

    3. The results in Table 1 are not entirely convincing. While it is true that DP-CVAE trades blows with DP-CGAN despite having 5x fewer parameters, the authors dismiss the lower balanced accuracy (compared to FedAvg) because of its improved standard accuracy over the baselines. I would argue that we cannot simply dismiss the loss in performance from the balanced accuracy because it is designed to provide a more reliable measure of the model’s performance when the dataset is imbalanced.

    4. The authors do not provide an ablation study of the hyperparameter $\lambda_m$, which balances the contribution of the local data with the synthetic global data in classification predictions. While not necessarily required, an analysis of such a parameter would provide insights into the quality of the learned CVAE decoder.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the methodology is strong, I do not feel that the experiments and analysis in the manuscript are strong enough to accept this paper.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    While I would still be interested in a more detailed discussion of VAE and CVAE, the authors adequately addressed my major concerns with the paper. I am convinced that DP-CVAE substantially reduces communication overhead and that the BACC performance is satisfactory compared to the baseline approaches. Finally, I appreciate that the authors are willing to include a more detailed discussion of the effects of lambda in the final version. Overall, I am happy to accept this paper.



Review #2

  • Please describe the contribution of the paper

    The authors present a method where clients collaboratively train a Differentially Private Conditional Variational Autoencoder (DP-CVAE) to model a global, privacy-preserving data distribution. This shared generative model is then used to support a range of downstream tasks in a federated learning (FL) setting.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The use of a DP-CVAE to model a global data distribution without sharing raw data aligns well with privacy-preserving goals in FL.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The proposed approach squarely falls into a well-established line of work in synthetic data generation for FL, yet the authors omit critical comparisons to existing models:

    • Federated Learning with GAN-Based Data Synthesis for Non-IID Clients: This work proposes client-side GANs that generate differentially private synthetic data, which are then used to construct a global dataset. This is very similar in purpose and privacy guarantees to the proposed DP-CVAE model.

    • PerFED-GAN: Here, personalized GANs enable FL participation without architecture or parameter sharing. Like DP-CVAE, this allows synthetic-data-based collaboration and supports personalization. The authors should clarify what advantages DP-CVAE offers over the personalized, model-agnostic nature of PerFED-GAN.

    • FedSR: While not a generative model per se, FedSR also aims at learning representations that generalize well across domains in a privacy-preserving way. It uses regularization techniques rather than generative modeling, but the goal—building a global representation that abstracts essential data features—is highly aligned with DP-CVAE’s objectives.

    • The rise of LLMs for data augmentation and domain generalization brings in new baselines that are hard to ignore. GPT-FL, for instance, uses generative pre-trained models to synthesize diversified synthetic data and train downstream tasks in a federated setup. The authors should at least position their approach relative to such transformer-based models and justify the choice of using a CVAE over large-scale pre-trained generative models.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed DP-CVAE framework offers an interesting angle on privacy-aware federated data modeling, but its contribution is incomplete without positioning it within the broader context of synthetic-data-based FL approaches. The method is conceptually similar to a range of prior work—GAN-based federated synthesis, personalized federated generative models, representation learning frameworks, and LLM-based data generation—but fails to reference or compare to any of them.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    Regarding my initial concern about the lack of comparison with closely related methods, I acknowledge the authors’ feedback and their rationale that comparing with GAN-based synthetic approaches in FL might be less relevant. They state: “We reiterate that our goal is to generate comparably low-dimensional embeddings, not images, demonstrating that GANs are unnecessarily heavy for this task.” While I understand this position, I still believe it would be beneficial to include at least one comparison with a GAN-based method in terms of performance and resource utilization, to substantiate the claim that GANs are indeed “unnecessarily heavy” for the proposed task.

    However, my primary concern lies in the lack of comparison with other privacy-preserving generative modeling approaches that also operate in the latent space. The authors themselves acknowledge that their work is related to methods like FedSR (FedSR: A Simple and Effective Domain Generalization Method for Federated Learning) and state: “Therefore, our approach is complementary to FL methods (e.g., FedSR, FedProx), making this research direction, in our opinion, particularly relevant for the community.” Despite this, the manuscript does not include a comparison with FedSR or any similar method that explicitly targets latent-space generative modeling.

    So far, the comparisons are limited to FedAvg, FedProx (which is not specifically focused on generative modeling in the latent space), and FedLambda, which is the authors’ own adaptation of kNN-Per and not an established baseline in the community.

    Therefore, I believe the paper currently lacks crucial comparisons with closely related methods - particularly those performing generative modeling in the latent space - which are necessary to demonstrate the claimed performance gains. While improvements over FedAvg and FedProx appear reasonable, they do not convincingly support the novelty or superiority of the proposed approach in the relevant subdomain.



Review #3

  • Please describe the contribution of the paper

    The authors propose a pipeline that allows the information of multiple sites to be utilised for downstream training while providing theoretical guarantees on the privacy protection of each respective site.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Utilising a global foundation model as a feature extractor and combining this with differential privacy is, in my opinion, a very smart way to tackle the challenge of collaborative privacy-preserving training. Overall, I’m convinced that this paper is an important contribution and will further promote the acceptance of mathematical privacy guarantees in the MICCAI community.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The results are already convincing, but could be even more insightful if other parameters were considered too. Most importantly, I think investigating the effect of the privacy budget on the privacy-utility trade-off would be very interesting, as in your approach you have chosen a very conservative privacy budget of $(\varepsilon, \delta) = (1.0, 10^{-4})$. Typical budgets for $\varepsilon$ range from 1 to 10, but some works have shown that even much higher budgets still provide substantial protection against many attacks. Moreover, data fidelity metrics are known to be contradictory. I think not relying solely on the Wasserstein distance would strengthen the claims.

    What is a bit unclear to me, and maybe you could discuss this, is why the fidelity of DP-CVAE is much better compared to DP-CGAN, but in terms of downstream performance, they are on par.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors convincingly demonstrate that their method allows the information of distributed datasets to be leveraged for training downstream classification tasks while providing theoretical privacy guarantees. This is an important problem for the field and will further promote the acceptance of mathematical privacy protection within the MICCAI community.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    I think the authors have done a reasonable job in addressing the concerns of all reviewers, given the constraints that MICCAI imposes on papers and rebuttals. While I understand the third reviewer’s concerns about a more thorough empirical evaluation, I think the current evaluation already clearly indicates that there are settings where the proposed methodology offers an advantage in performance, communication overhead, or both.

    Overall, I still believe that this paper is a valuable addition to the MICCAI community.




Author Feedback

We thank all reviewers for their feedback and are pleased that our work was recognized as a smart approach to privacy-preserving federated learning (FL) (R1) and that the use of a global foundation model (FM) was highly appreciated (R1). We are grateful that the methodological clarity and organization were acknowledged (R3), along with the flexibility (R3) and safety (R2, R3) of our approach. We also appreciate the recognition of our model’s efficiency (R3) and the use of realistic datasets (R3) with convincing results (R1).

We believe that addressing FL in the latent space of FMs, while ensuring differential privacy (DP), is an important contribution with the potential to make secure data sharing significantly more efficient and practicable. We are encouraged by R1’s recognition of our method’s significance to the MICCAI community.

While we appreciate the predominantly strong positive remarks from R3, we were surprised by the seemingly inconsistent low score (by R3). We acknowledge that the main weakness identified by R2 relates to our positioning and are confident that major concerns will be addressed below.

(R3) A. Efficiency, B. Generalization, C. Non-IID, D. Lambda: A) While the reviewer is correct that the proposed approach still relies on parameter sharing, it is fundamental to appreciate that our method only requires sharing the decoder (3-layer MLP) instead of a classifier with millions of parameters. B) As correctly noted, our method enables clients to train task-specific models by balancing local and global data. While we evaluate disease classification, the generated data is reusable across any task (e.g., OOD, anomaly detection), unlike traditional FL approaches tied to a single task. C) We evaluated DP-CVAE under one non-IID (heterogeneous) setting, achieving substantially higher ACC than baselines. While those gains did not translate to BACC, we did not, as stated by R3, dismiss this fact, but discussed it as part of the results and limitations. Notably, those differences in BACC are not significant (within 1 std), and baseline methods, being non-DP, solve a simpler task. D) Lambda, weighting the local data (per client), yields the best ACC in the range 0.4-0.7, indicating a balanced local-global data mix and thus high utility of the synthetic data. We appreciate the opportunity to highlight this and will include it in the final version.
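The role of lambda described in point D — interpolating between a client's local model and a model trained on the synthetic global data, in the spirit of kNN-Per — can be sketched as follows. This is a hedged illustration only; `mix_predictions` and its signature are hypothetical, not taken from the paper's code:

```python
def mix_predictions(p_local, p_global, lam=0.5):
    """Interpolate class probabilities: lam weights the local model,
    (1 - lam) the model trained on synthetic global data."""
    mixed = [lam * pl + (1.0 - lam) * pg for pl, pg in zip(p_local, p_global)]
    total = sum(mixed)  # renormalize as a safeguard; a no-op if inputs are distributions
    return [m / total for m in mixed]
```

Under this reading, the reported sweet spot of lambda in 0.4-0.7 corresponds to a roughly balanced mix, which is what one would expect if the synthetic global data carries genuine utility.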

(R2) Positioning: We reiterate that our goal is to generate comparably low-dimensional embeddings, not images, demonstrating that GANs are unnecessarily heavy for this task. As such, an empirical comparison to widely used GAN- or LLM-based synthetic data approaches generating data in input space is not appropriate and beyond the scope of this work. Our key contribution lies in shifting the federated paradigm from downstream model sharing to privacy-preserving generative modeling in the latent space. By operating in the latent space, we retain the expressiveness of an FM with a lightweight DP-CVAE. Therefore, our approach is complementary to FL methods (e.g., FedSR, FedProx), making this research direction, in our opinion, particularly relevant for the community.
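The communication-cost argument — sharing only a small MLP decoder instead of a full classifier — can be made concrete with a parameter count. The layer widths below are illustrative assumptions, not the paper's actual architecture:

```python
def mlp_params(dims):
    """Parameters (weights + biases) of a fully connected stack
    with layer widths dims[0] -> dims[1] -> ... -> dims[-1]."""
    return sum(dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1))

# Hypothetical 3-layer decoder mapping a 64-d latent plus a 10-d class code
# back to a 768-d foundation-model embedding: roughly half a million parameters,
# orders of magnitude below a typical multi-million-parameter image classifier.
decoder_size = mlp_params([64 + 10, 256, 512, 768])
```

Whatever the exact widths, a decoder operating on compact embeddings stays far smaller than the image-space models usually exchanged in FedAvg-style training, which is the crux of point A above.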

(R1) Privacy budget (eps): Performance remained stable across a range of privacy budgets. We hypothesize this is due to the compressed information content of embeddings compared to raw images, making the model less sensitive to eps. While this observation underscores the potential of moving the problem into latent spaces (better performance-privacy trade-off), we chose not to include this intuition in the manuscript as it warrants further experiments.

(R1) Fidelity vs ACC: The comparable downstream performance of DP-CVAE and DP-CGAN despite the fidelity gap likely stems from the semantic structure of FM embeddings, which makes classifiers robust to small distortions. However, DP-CVAE’s higher fidelity suggests more faithful generation under DP, which is critical for more complex or sensitive downstream tasks.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the authors have adequately addressed many concerns, demonstrating that DP-CVAE effectively reduces communication overhead and achieves satisfactory BACC performance against standard federated learning baselines, the manuscript still lacks important comparative evaluations. In particular, comparisons against closely related latent-space generative methods (e.g., FedSR) remain notably absent, limiting the ability to conclusively assess the novelty and relative strengths of the proposed approach. Moreover, a brief comparison with a GAN-based approach would help substantiate the claim regarding GANs being overly resource-intensive. These remaining gaps slightly undermine an otherwise strong submission.


