Abstract

Federated learning (FL) has emerged as a promising approach to medical image analysis that allows deep model training using decentralized data while ensuring data privacy. However, in FL, communication cost plays a critical role in evaluating model performance, so transferring vision foundation models can be particularly challenging due to the significant resource costs involved. In this paper, we introduce a federated adaptive Contrastive Language-Image Pretraining (CLIP) model designed for classification tasks. We employ a lightweight and efficient feature attention module for CLIP that selects suitable features for each client's data. Additionally, we propose a domain adaptation technique to reduce differences in data distribution between clients. Experimental results on four publicly available datasets demonstrate the superior performance of FACMIC in dealing with real-world and multisource medical imaging data. Our code is available at https://github.com/AIPMLab/FACMIC.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1577_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/1577_supp.pdf

Link to the Code Repository

https://github.com/AIPMLab/FACMIC

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Wu_FACMIC_MICCAI2024,
        author = { Wu, Yihang and Desrosiers, Christian and Chaddad, Ahmad},
        title = { { FACMIC: Federated Adaptative CLIP Model for Medical Image Classification } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a federated adaptive Contrastive Language Image Pretraining (CLIP) model designed for classification tasks to solve the data distribution shifts and communication costs problems in FL. Specifically, they employ a light-weight feature attention module for CLIP that selects suitable features for each client’s data. Additionally, they propose a domain adaptation technique to reduce differences in data distribution between clients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper adopts a lightweight method to fine-tune the pretrained models in FL, which can reduce the communication cost.
    2. The Local Maximum Mean Discrepancy (LMMD) used in this work can reduce the distribution shift problem.
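For readers unfamiliar with the discrepancy measure mentioned here, a minimal sketch of the (global) Maximum Mean Discrepancy with a Gaussian kernel, the building block that LMMD refines with class-conditional weighting, might look like the following. This is an illustrative helper, not the paper's actual implementation:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    # Squared Maximum Mean Discrepancy between two sets of features:
    # large when the two empirical distributions differ, ~0 when they match.
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

Minimizing such a term between a client's features and reference features is what pulls the per-client feature distributions together during training.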
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The novelty of this paper is limited, as it simply combines existing widely used technologies, namely feature attention and the adaptation technique LMMD. Additionally, there is no innovation in terms of application in this work.
    2. Equation (1) is not suitable for representing the goal of achieving good generalization performance on the test data.
    3. This algorithm requires an auxiliary dataset with a feature distribution similar to all the clients. However, satisfying this assumption in real-world healthcare applications can be challenging.
    4. The experimental results are not convincing. Why does the performance increase as data heterogeneity increases (i.e., \alpha from 0.9 to 0.3)? Furthermore, why is the performance of the other methods still not good in the iid setting?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The provided code cannot run due to a bug at utils/training.py, line 153: ‘IndentationError: expected an indented block after the ‘if’ statement on line 152’.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The novelty and writing of the paper still need improvement. Additionally, there are inconsistencies between some experimental observations and previous works (i.e. the performance changes with changes in data heterogeneity). It would be helpful to provide more analysis and explanations. Moreover, the provided code cannot run due to a bug at utils/training.py, line 153: ‘IndentationError: expected an indented block after the ‘if’ statement on line 152’.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See weaknesses.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • [Post rebuttal] Please justify your decision

    I have read the author’s response and the other reviews. However, some of my concerns are still unresolved, such as the novelty of this work, Equation (1), the limitation posed by the requirement of an auxiliary dataset, and the phenomena observed in Table 2. Therefore, I have decided to maintain my original score.



Review #2

  • Please describe the contribution of the paper

    This paper presented an adaptive federated CLIP model to address two problems: 1. the high communication cost of federated learning; 2. domain shift in data from different sites.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The motivations of this study are clearly laid out. Sufficient ablation studies support the authors' points. The study also provides a sufficient comparison with other SOTA methods (FedAVG, FedProx, MOON, FedFocal, FedCLIP).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The manuscript's results are hard to read. For example, I had to guess the meaning of 'Average' as a column, and in Fig. 2 the color legend mismatches the plots; I had to guess whether the red ('ours-w/o DA') corresponds to the orange line in the plot. In addition, some abbreviations are used without definition, and there are typos (Dillicree or Dillikere).
    2. The evidence that adding the DA loss function improves model performance is questionable. a. To my understanding, FACMIC and FedCLIP have the same or similar attention module design, so why does FACMIC achieve a significant improvement on BT under the iid and average conditions in Table 2? b. Why, in Table 2 under the Real condition, does FACMIC achieve a significant improvement compared to its performance under the other conditions of the SC dataset, while the other SOTA methods keep a similar level of performance across the different conditions of the SC dataset? c. In Table 3, without the DA loss, the model's performance is fairly consistent across different alpha values and the iid condition. Does this mean the attention module is robust to domain shift across clients? If so, why does the DA loss contribute to better performance? d. In Table 5, why can FedCLIP beat the other SOTA methods on the BT2 dataset, but not in Fig. 2 and Table 2? What is the difference between 'Ours-w/o DA' and FedCLIP?
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please provide more detailed information in the captions or in the text to explain the results.
    2. Define all the abbreviations and check for typos.
    3. A clear discussion should be added to explain the results (please refer to my comments under the main weaknesses).
    4. Can the authors explain the difference between adding an attention-based adapter and fine-tuning the last MLP layer of the image encoder? Why not directly update the last layers of the image encoder in a similar style?
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is hard to read and the results are questionable. So more explanation and discussion should be added.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposed federated learning for medical image classification to preserve data privacy. Within each client, the CLIP framework was used as the training model, with a loss combining a contrastive loss and a domain adaptation loss. Parameters within each client were aggregated and updated to the global server. The performance on brain tumor and skin cancer images demonstrates state-of-the-art results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of this paper lie in the algorithm design, including implementation details such as the data division. The design of the domain adaptation loss is novel and contributes to the performance boost over state-of-the-art algorithms.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weaknesses of the paper are also in the implementation details and are minor; the data division could be further elaborated for readers to understand.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper proposed federated learning for medical image classification to preserve data privacy. Within each client, the CLIP framework was used as the training model, with a loss combining a contrastive loss and a domain adaptation loss. Parameters within each client were aggregated and updated to the global server. The performance on brain tumor and skin cancer images demonstrates state-of-the-art results.

    The strengths of this paper lie in the algorithm design, including implementation details such as the data division. The design of the domain adaptation loss is novel and contributes to the performance boost over state-of-the-art algorithms.

    The weaknesses of the paper are also in the implementation details and are minor; the data division could be further elaborated for readers to understand.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of this paper mainly resides in the algorithm and loss function design, where a contrastive loss is combined with a domain adaptation loss. The advantage of modeling data with the domain adaptation loss was demonstrated to be effective through the accuracy gain.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The feedback has been addressed in an organized manner.




Author Feedback

We thank reviewers for their careful review and constructive feedback.

R1.1) Novelty of method.

Although past studies have applied federated learning (FL) in medical imaging, to our knowledge, we are the first to explore FL on VLMs like CLIP in this context. Existing FL solutions, which naively transfer all model parameters to the global server, are not applicable to VLMs due to their size (e.g., 10^8 parameters for CLIP). Inspired by FedCLIP, our method only shares the parameters of a local attention module (AM) learned independently for each client. However, our work improves FedCLIP in two important ways. 1) While it uses a shallow network (2 layers) for its AM, we propose a more complex architecture that better captures the data variability across clients. 2) Unlike FedCLIP, we address domain shifts directly by incorporating a domain adaptation (DA) loss in our model. As shown in our results, our method significantly outperforms FedCLIP in all test cases.

R1.2) Generalization performance.

Our formulation follows Eq. (1) of [12]. However, we agree that the model’s generalization performance should be evaluated on clients that did not participate in training. We will clarify this in the updated version.

R1.3) Need for auxiliary dataset.

This indeed adds a constraint, yet we point out that this auxiliary dataset does not require labeled samples. As there are many publicly available datasets in medical imaging (e.g., UK Biobank, HCP for brain images), we believe that this constraint is less important than the advantages brought by our method’s DA capabilities.

R1.4) Impact of heterogeneity.

When aggregating parameters in Eq. (7), we give a larger weight to clients with more training samples. In the non-iid setting, this enables the global model to learn from the most knowledgeable clients, preventing performance degradation. In the iid setting, the lower performance of other methods could be due to the need to fine-tune the entire image encoder or to the shallower AM.
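The sample-size-weighted aggregation described in this answer can be sketched as follows. This is our reading of the scheme, with illustrative names; it is not the paper's code:

```python
import numpy as np

def aggregate(client_params, client_sizes):
    # Weighted average of each client's (flattened) attention-module
    # parameters, where a client's weight is proportional to its number
    # of local training samples (FedAvg-style aggregation).
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_params)            # (n_clients, n_params)
    return (weights[:, None] * stacked).sum(0)   # global parameters
```

With this weighting, a client holding three times as much data pulls the global parameters three times as strongly as a small client, which is the mechanism the rebuttal credits for robustness under label-distribution skew.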

R1.5) Error running the code.

We fixed the mentioned bug (missing space) in the code and provided a detailed tutorial on how to run it.

R3.1) Implementation details.

We will add the suggested information in the final version.

R4.1) Readability of results.

“Average” is the mean for both non-iid and iid conditions. The red line (ours-w/o DA) matches the orange line in the plot. We will clarify these in the updated version.

R4.2) AM design.

FACMIC employs a different architecture for its AM (more layers, different activation, etc.), which boosts its performance in the iid setting. Since CLIP was mainly trained on natural images, this stronger module is needed to adapt the image features to medical imaging tasks.
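A feature-attention adapter of the kind discussed here gates the frozen encoder's features with learned per-dimension weights. The architecture below is purely illustrative (sizes, activations, and names are assumptions, not the paper's exact design):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

class FeatureAttentionAdapter:
    # Small trainable MLP producing attention weights over feature
    # dimensions, used to re-weight frozen CLIP image features.
    # Only these adapter parameters would be trained and shared in FL.
    def __init__(self, dim, hidden, rng):
        self.w1 = rng.normal(0.0, 0.02, (dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (hidden, dim))

    def __call__(self, features):
        h = np.maximum(features @ self.w1, 0.0)  # ReLU hidden layer
        attn = softmax(h @ self.w2)              # weights over feature dims
        return features * attn                   # gated features
```

Because only `w1` and `w2` are trained and communicated, the transferred payload stays orders of magnitude smaller than the full encoder, which is the communication-cost argument made throughout the rebuttal.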

R4.3) Improvements of FACMIC under Real condition.

For the Real dataset, each client's data is from a different source (SC, HAM10000, or ISIC2019), whereas for BT and SC, all clients have samples from the same source but the number of samples per class differs. As our DA strategy is applied to image features, it is more useful for the domain shifts encountered in the Real dataset.

R4.4) Performance for different alpha.

As mentioned above, the domain shift for BT and SC affects the distribution of labels, on which our DA technique has less impact. Echoing our answer to R1.4, our aggregation strategy, which gives a larger weight to clients with more samples, already addresses this type of shift.

R4.5) Difference between ‘Ours-w/o DA’ and FedCLIP.

Our model employs a different architecture for its AM, which is more suitable for adapting CLIP features to medical imaging (see our answer to R1.1).

R4.6) Adapting the last layer of image encoder.

This MLP layer has many more parameters than our AM (10^7 params vs. only 5x10^5), hence adapting/sharing it incurs greater computation/communication costs. We also tried updating the last layers of the image encoder for FedAVG on BT (alpha=0.9) and obtained a lower accuracy (72.34% vs. 81.73%).




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the approach is innovative and demonstrates performance improvements over state-of-the-art methods, the novelty is limited as it primarily combines existing technologies without significant new contributions. However, I like the work introduces a novel federated learning (FL) model that combines a lightweight feature attention module and domain adaptation technique, effectively reducing communication costs and addressing data distribution shifts.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).



