Abstract

Medical image segmentation is crucial for clinical diagnosis. The Segment Anything Model (SAM) serves as a powerful foundation model for visual segmentation and can be adapted for medical image segmentation. However, medical imaging data typically contain privacy-sensitive information, making it challenging to train foundation models with centralized storage and sharing. To date, few foundation models tailored for medical image segmentation have been deployed within the federated learning framework, and their segmentation performance, as well as the efficiency of communication and training, remain unexplored. In response to these issues, we developed Federated Foundation models for Medical image Segmentation (FedFMS), which includes the Federated SAM (FedSAM) and a communication- and training-efficient Federated SAM with Medical SAM Adapter (FedMSA). Comprehensive experiments on diverse datasets are conducted to investigate the performance disparities between centralized training and federated learning across various configurations of FedFMS. The experiments revealed that FedFMS can achieve performance comparable to models trained via centralized methods while maintaining privacy. Furthermore, FedMSA demonstrated the potential to enhance communication and training efficiency. Our model implementation codes are available at https://github.com/LIU-YUXI/FedFMS.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0366_paper.pdf

SharedIt Link: https://rdcu.be/dZxdz

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72111-3_27

Supplementary Material: N/A

Link to the Code Repository

https://github.com/LIU-YUXI/FedFMS

https://github.com/LMIAPC/FednnU-Net

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Liu_FedFMS_MICCAI2024,
        author = { Liu, Yuxi and Luo, Guibo and Zhu, Yuesheng},
        title = { { FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15008},
        month = {October},
        pages = {283--293}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    • Applies the SAM foundation model to federated learning, adding adapters to the SAM decoder for fine-tuning client models.
    • Constructs a large dataset based on several public datasets.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Claims to be the first work to use a foundation model in federated learning.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited novelty: applying SAM to existing federated frameworks.
    • Only compared to FedU-Net; more comparisons to SOTA methods could be added.
    • All clients’ data come from the same data sources/distribution, so it is not a realistic federated learning setting.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?
    • Both code and datasets are publicly available.
    • The environment setting is also available.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Different clients should use different sources of data.
    • Add more background introduction to federated learning.
    • No statistical tests to support the result analysis.
    • Include other federated SOTA methods for comparison, not necessarily foundation models; this would further confirm the benefits of using foundation models.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Different clients should use different sources of data. Currently they share the same dataset and distribution. This is not a real federated learning testing environment.
    • FedU-Net is quite a basic baseline method to compare with.
    • No statistical tests to draw a thorough conclusion.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The primary contribution of this study is the development of the Federated Foundation Models for Medical image Segmentation (FedFMS), which includes two models: the Federated SAM (FedSAM) and a more communication and training-efficient version, the Federated SAM with Medical SAM Adapter (FedMSA). These models are designed to handle the challenges of medical image segmentation within a federated learning framework, allowing training on distributed datasets without centralizing data storage. This approach addresses critical issues such as the segmentation performance and the efficiency of communication and training in distributed environments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    FedSAM, with all parameters fine-tuned, outperforms FedMSA in prostate, nuclei, and fundus segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The entire paper is built upon a non-peer-reviewed paper that has been on arXiv since 2016. As the framework has not been validated in medical image segmentation, validation and more comparisons are very necessary.
    2. The benchmark for segmentation is no longer the one cited by the authors [19], published in 2015, but nnU-Net, published in Nature Methods in 2021. Version 2 of nnU-Net was released recently and performs very well in all medical image segmentation tasks, whether 2D or 3D; its training procedure is also efficient given a large dataset. It is highly recommended that the authors use nnU-Net as the benchmark.
    3. The beauty of SAM lies in its prompt-based visual segmentation and its adaptability and effectiveness across various segmentation applications. SAM operates under a centralized training regime, whereas the federated approach allows each client to train the model across multiple decentralized nodes, each holding its local dataset. The proposed framework did not demonstrate this. Was the server (NVIDIA A800) used for fine-tuning or for aggregating parameters? Does the model efficiency analysis shown in Table 2 include client learning, aggregating, and fine-tuning? Please also list the average prediction time per 2D image.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Spell out abbreviations such as MAE and FLOP.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is lacking in novelty, perhaps due to an unclear demonstration of the proposed model. The evaluation is insufficient.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    My most concerns were addressed in the rebuttal.



Review #3

  • Please describe the contribution of the paper

    The main contribution of the paper is to introduce federated learning into medical image segmentation, built upon the seminal Segment Anything Model (SAM). Given the sensitive nature of medical records scattered across a vast distribution of institutions, the motivation for such a framework is well justified. Additional optimization is achieved by applying the Medical SAM Adapter (MSA) within the federated learning. Experiments on 4 distinct datasets with different modalities across 4 to 7 institutions validate the advantage of a pre-trained foundation model over training from scratch, and show that the performance resulting from distributed learning is on par with centralized training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper convincingly shows that federated learning is suitable for medical image segmentation across pathologies and modalities by using the pre-trained SAM foundation model. The introduction of MSA adds further savings for distributed learning. The paper is well written and easy to follow. The experiments are thorough in answering the research questions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    SAM is built upon vast datasets with a variety of objects and their associated masks. However, the federated learning framework on top of SAM seems limited to similar organs in the same modality. There is no evidence in the paper that the impact of pre-trained SAM has been fully explored across organs, or even modalities.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It seems there is sufficient code published at the link.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please comment your code properly so that English readers can understand the comments as well.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I personally lack expertise in federated learning. My decision is based on my general knowledge of ML and image processing, and on the readability of the paper itself.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank all reviewers for their thoughtful comments.

R3: The federated SAM is limited to similar organs at the same modality. A: If each client has different organs and modalities, training in federated learning will be difficult to converge, resulting in poor performance. Current federated learning methods are therefore mainly applied to the same type of organ and modality.

R3: Comment properly in the code. A: We improved the comments in our code.

R4: Different clients should use different sources of data. A: In our experiments, the datasets from each client actually originate from different institutes. E.g., in Fundus, the datasets of clients A, B, C, and D are sourced from REFUGE, ORIGA, G1020, and Drishti-GS1, respectively. We’ll replace A, B, … with the names of the institutes.

R4: Add more background introduction of federated learning. A: Following your suggestions, we’ll add more introduction: Federated learning is widely applied to medical images since it allows collaborative training across healthcare institutions without sharing sensitive data, preserving privacy and complying with regulations. It leverages diverse, high-quality datasets from different sources, enhancing the model’s performance and generalizability while ensuring data security.

R4: No statistical tests. A: Based on the existing results, we calculated the p-values between FedSAM and SAM, as well as between FedMSA and MSA, which are both greater than 0.5. This indicates that the differences are not significant, i.e., the results of federated training and centralized training are similar.

R4&R5: Limited novelty. A: We aim to investigate whether SAM performs as well in federated learning as in centralized training, so we utilize the FedAvg framework. Other federated learning algorithms, such as FedProx and FedNova, are also applicable. Our innovations include: 1. collecting various real medical segmentation data from multiple institutions, whereas other federated learning studies typically use only two datasets; 2. implementing federated learning for foundation models and exploring its effectiveness; 3. using MSA to achieve a more efficient method.

R4&R5: Include other SOTA methods for comparison, such as nnU-Net. A: According to our attempts, due to the ineffectiveness of nnU-Net’s preprocessing under federated learning and its instability when training on real multi-institution (non-IID) datasets, the average Dice scores of FednnU-Net are lower than those of FedSAM by 27.96% for Prostate, 9.86% for Brain Tumour, 9.73% for Fundus, and 2.75% for Nuclei. FedSAM shows better generalizability and stability.

R5: The paper is built on a non-peer-reviewed paper published on arXiv. A: FedAvg’s paper was published on arXiv and at PMLR. We corrected the reference to: Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. PMLR (2017).

R5: The proposed framework did not demonstrate the beauty of SAM. A: Following your suggestions, we’ll add demonstrations of SAM’s adaptability and effectiveness under federated learning.

R5: Was the NVIDIA A800 used for fine-tuning or aggregation? What does the model efficiency analysis in Table 2 include? List the average prediction time per 2D image. A: In our framework, clients are used for fine-tuning and the server is used for aggregation; both run on the NVIDIA A800. In Table 2, GPU Memory Usage, Average Training Time, FLOPs, and Learnable Parameters refer to client learning (i.e., fine-tuning); Learnable Parameters is also used to calculate the parameters that the aggregation needs to communicate and sum. Calculated from existing results, FedSAM and FedMSA require 0.118 s and 0.127 s per image, respectively, when predicting.

R5: The paper does not provide sufficient information for reproducibility. A: The submission has provided an anonymized link to the source code, dataset, and environment setting.

R5: Spell out abbreviations. A: MAE is Masked Autoencoders; FLOPs is Floating Point Operations.
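The client/server split described in the rebuttal (clients fine-tune locally, the server averages the communicated learnable parameters) follows the standard FedAvg scheme. A minimal illustrative sketch of the server's aggregation step is given below; the function and parameter names, and the plain-list representation of parameters, are assumptions for illustration and not the authors' implementation (in FedMSA, only the adapter's learnable parameters would be communicated and averaged):

```python
def fedavg(client_params, client_sizes):
    """FedAvg server step: data-size-weighted average of client parameters.

    client_params: list of dicts mapping parameter name -> list of floats,
                   one dict per client (illustrative representation).
    client_sizes:  number of local training samples per client, used as
                   the aggregation weights.
    """
    total = sum(client_sizes)
    return {
        name: [
            sum(params[name][i] * size
                for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(client_params[0][name]))
        ]
        for name in client_params[0]
    }


# Toy example: two clients holding one learnable tensor each.
clients = [{"adapter.w": [0.0, 2.0]}, {"adapter.w": [4.0, 2.0]}]
sizes = [1, 3]
print(fedavg(clients, sizes))  # {'adapter.w': [3.0, 2.0]}
```

Each communication round would repeat this: clients fine-tune on local data, send their learnable parameters, and receive the aggregated model back.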




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    I think the rebuttal addresses most concerns the reviewers had and I believe this work touches on a useful topic of exploring working with FMs in a federated setting.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Rebuttal has addressed most of the concerns/questions from reviewers thus I recommend accept.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



