Abstract

Federated learning (FL) has shown great potential in medical image computing since it provides a decentralized learning paradigm that allows multiple clients to train a model collaboratively without privacy leakage. However, current studies have shown that heterogeneous data of clients causes biased classifiers of local models during training, leading to the performance degradation of a federation system. In experiments, we surprisingly found that continuously freezing local classifiers can significantly improve the performance of the baseline FL method (FedAvg) for heterogeneous data. This observation motivates us to pre-construct a high-quality initial classifier for local models and freeze it during local training to avoid classifier biases. With this insight, we propose a novel approach named Federated Classifier deBiasing (FedCB) to solve the classifier biases problem in heterogeneous federated learning. The core idea behind FedCB is to exploit linguistic knowledge from pre-trained language models (PLMs) to construct high-quality local classifiers. Specifically, FedCB first collects the class concepts from clients and then uses a set of prompts to contextualize them, yielding language descriptions of these concepts. These descriptions are fed into a pre-trained language model to obtain their text embeddings. The generated embeddings are sent to clients to estimate the distribution of each category in the semantic space. Regarding these distributions as the local classifiers, we perform the alignment between the image representations and the corresponding semantic distribution by minimizing an upper bound of the expected cross-entropy loss. Extensive experiments on public datasets demonstrate the superior performance of FedCB compared to state-of-the-art methods. The source code is available at https://github.com/CUHK-AIM-Group/FedCB.
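
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: the prompt embeddings are taken as a ready-made array (no real pre-trained language model is called), the per-class covariance is assumed diagonal, and the closed-form upper bound used in the paper is replaced by a plain Monte Carlo estimate of the expected cross-entropy. The names `class_gaussians` and `expected_ce_monte_carlo` are hypothetical and are not taken from the FedCB repository.

```python
import numpy as np


def class_gaussians(text_emb):
    """Estimate one Gaussian per class from prompt embeddings.

    text_emb: array of shape (C, M, D), i.e. M prompt embeddings of
    dimension D for each of C classes (assumed to come from a PLM).
    Returns the class means (C, D) and diagonal variances (C, D).
    """
    mu = text_emb.mean(axis=1)
    var = text_emb.var(axis=1) + 1e-6  # small floor keeps the std positive
    return mu, var


def expected_ce_monte_carlo(feats, labels, mu, var, n_samples=32, seed=0):
    """Monte Carlo estimate of the expected cross-entropy when the
    classifier weights are drawn from the per-class Gaussians.

    feats: (N, D) image features, labels: (N,) integer class ids.
    This is a sampling-based stand-in, not the paper's upper bound.
    """
    rng = np.random.default_rng(seed)
    n_classes, dim = mu.shape
    losses = []
    for _ in range(n_samples):
        # Draw one classifier: a weight vector per class.
        weights = mu + np.sqrt(var) * rng.standard_normal((n_classes, dim))
        logits = feats @ weights.T                      # (N, C)
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        losses.append(-log_probs[np.arange(len(labels)), labels].mean())
    return float(np.mean(losses))
```

In this sketch the Gaussians are never updated by the clients, which mirrors the idea of a frozen, pre-constructed classifier: only the image features passed in as `feats` would change during local training.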

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/3064_paper.pdf

SharedIt Link: https://rdcu.be/dV55u

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72117-5_64

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/3064_supp.pdf

Link to the Code Repository

https://github.com/CUHK-AIM-Group/FedCB

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Zhu_Stealing_MICCAI2024,
        author = { Zhu, Meilu and Yang, Qiushi and Gao, Zhifan and Liu, Jun and Yuan, Yixuan},
        title = { { Stealing Knowledge from Pre-trained Language Models for Federated Classifier Debiasing } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        pages = {685 -- 695}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a method to counter class imbalance in classification tasks in the federated learning setup, where data distributions can vary dramatically between clients. To that end, the authors propose to build local classifiers based on a contrastive learning loss between 1) features extracted from local images and 2) features sampled from Gaussian distributions that are shared across clients and built from text encodings.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper tackles an interesting problem
- Although not novel, the final loss is well motivated by the problem formulation
- Extensive evaluation with many baselines and ablation studies
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The paper is hard to follow, with many missing details or lack of contextualisation
- The contributions are not clear, since the state-of-the-art is not well explained
- The use of text embeddings is never really motivated
- Many inaccuracies, typos, altered template
- No plan to release the code
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not provide sufficient information for reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
Major:
- The contributions of this paper are very unclear, because no proper literature review is conducted. The proposed method is tested against many baselines, but none of them are introduced, so we don’t know how this paper compares to the state-of-the-art. Did other methods tackle class imbalance for federated classification tasks? Did other methods use text embeddings in federated learning? For example, the proposed loss is presented as a main contribution, while it is in fact already derived in [19].
- The use of text embeddings is never really motivated, and the proposed method seems very convoluted. For example, why not just use the image feature extractor of BiomedCLIP and train local classifiers on that?
- I don’t think the experiments are representative of an actual federated learning scenario, since here “clients” are simulated by subdividing the same dataset. I think realistic federated learning should simulate domain shifts between clients.
- The organisation of the paper is very confusing. For example, the authors conduct an experiment in the introduction, which could be an interesting approach, but here a lot of information is missing, such as the task, the datasets, the architecture of the models, etc. Another example is that the paper never explicitly says it’s tackling a classification task. A last example is that the last table mentions a “baseline”, but never says which one it is! Overall, the readers are left with cumbersome guesswork.
- It’s not clear how classes are predicted based on the outputs of the F operator in (5).
- There are way too many typos and inaccuracies (see minor).
Minor:
- The provided template has been extensively altered to make the paper fit in 8 pages. This is not fair to other submissions, and makes the paper hard to read. I believe there are many opportunities to shorten some sections, for example, the authors could remove Fig 1, which doesn’t add any value, as it simply shows that data distributions can vary between clients.
- There’s a mistake on the third line of the proof in Appendix A, the last exponential should vanish
- “We add a MLP layer”: this is wrong, please replace MLP by “fully connected layer”
- I don’t know if “stealing knowledge of pre-trained language models” is the best way to frame this paper, especially with the current legal/ethical concerns about large language models.
- “The rapid advance of deep learning in medical image analysis is mainly attributed to the availability of large-scale medical image datasets.” This is hugely debatable: most of the field is focusing on learning from small datasets, whose sizes are limited due to privacy issues, cost, or unavailability of supervision. Please rephrase.
- Acronyms should be introduced: “Non-IID” (p.3)
- I think there’s a problem with Fig 2, since the means and covariances are learnt on the server side, not the clients.
- Why do the M text embeddings keep appearing in (1) if they were averaged as mentioned in the previous paragraph?
- Typos: “best viewED” (caption of Fig 1), “how to construct a classifier becomeS a vital step” (p.2), “LET’S CONSIDER a typical federated learning scenario” (p.3), “to initialize the the parameters” (double the), replace “lcoal” by “local” in the conclusion
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Weak Reject — could be rejected, dependent on rebuttal (3)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is borderline between 2 and 3. I am ready to give it a chance to better explain the contributions and improve on the clarity of the explanations, but I’m afraid this will require too much re-writing to be acceptable in a rebuttal.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The authors propose to solve the classifier biases problem in heterogeneous federated learning. They specifically borrowed linguistic knowledge from pre-trained language model to facilitate the construction of good local classifiers.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well written and easy to read. The model has certain novelty. Better performance obtained as compared to some recent work.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

There is a lack of sufficient description to illustrate the distinctions in framework architecture when compared to FedETF [12].
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

There is a lack of sufficient description to illustrate the distinctions in framework architecture when compared to FedETF [12].
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work holds certain significance and novelty. Introducing language to vision in federated learning is interesting.
Reviewer confidence

Somewhat confident (2)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This manuscript introduces an innovative approach that leverages text embeddings, extracted via pre-trained language models, to train local classifiers within a federated learning framework. This method is specifically tailored for non-i.i.d. scenarios, which are commonly encountered in medical imaging datasets. The authors have tested their approach on two distinct datasets: retinal OCT and gastrointestinal tract endoscopic images, demonstrating notable improvements across various non-i.i.d. conditions.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Strengths:
- The introduction of pre-trained language models to address data heterogeneity is novel
- The motivation of the work is justified with a preliminary experiment that aptly demonstrates the potential of the concept of freezing local classifiers.
- The study includes a comprehensive comparison with multiple federated learning methods, reporting superior performance on two different medical imaging datasets.
- The derivation of the method is logically sound and articulated clearly. Additionally, the authors provide valuable ablations on performance variations influenced by the number of prompts and explore different methodological variations
- The article is well written and easy to follow
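
The preliminary experiment mentioned above (freezing local classifiers under FedAvg) can be sketched roughly as follows. This is a toy NumPy illustration, not the authors' setup: the feature extractor is assumed to be a single linear map, the classifier is a fixed matrix that is never updated or aggregated, and the names `local_update` and `fedavg_round` are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def local_update(W_feat, W_cls, X, y, lr=0.05, steps=5):
    """Gradient descent on the feature extractor only.

    W_feat: (F, D) linear feature extractor (toy assumption).
    W_cls:  (C, D) classifier, kept frozen throughout.
    X: (N, F) inputs, y: (N,) integer labels.
    """
    W = W_feat.copy()
    onehot = np.eye(W_cls.shape[0])[y]
    for _ in range(steps):
        feats = X @ W                               # (N, D)
        probs = softmax(feats @ W_cls.T)            # (N, C)
        grad_feats = (probs - onehot) @ W_cls / len(y)
        W -= lr * (X.T @ grad_feats)                # W_cls is never touched
    return W


def fedavg_round(W_feat, W_cls, clients):
    """One FedAvg round: average the locally trained feature extractors,
    weighted by client dataset size. The classifier is not aggregated."""
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    updates = [local_update(W_feat, W_cls, X, y) for X, y in clients]
    weights = sizes / sizes.sum()
    return sum(w * U for w, U in zip(weights, updates))
```

Because `W_cls` is shared and fixed, every client optimizes its features against the same decision directions, which is the intuition behind avoiding locally biased classifiers.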
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Comments and Recommendations:
- The manuscript would benefit from a discussion on how this approach compares with prototype learning in federated settings, as explored in studies like FedProto [1]. Although the integration of pre-trained language models distinguishes your approach, understanding how it differs from or complements similar strategies could clarify its unique contributions and further justify its novelty.
- The paper should consider including a discussion on other domain generalization methods such as FedDG [2], noted for their efficacy in medical imaging. Addressing why these were not included would help situate your work more comprehensively within the existing framework of federated learning research.
- It could be advisable to avoid hyperbolic terms like “overwhelming” and “remarkable.”
- Further exploration of the representations learned by each client, particularly focusing on the specificity and applicability of the text embeddings, would deepen the understanding of the method and its limitations. While a comprehensive analysis of potential limitations or failure cases might extend beyond the scope of this conference submission, highlighting these aspects could provide valuable insights and direct future research efforts effectively.
References:

[1] Tan Y, Long G, Liu L, Zhou T, Lu Q, Jiang J, Zhang C. FedProto: Federated prototype learning across heterogeneous clients. In Proceedings of the AAAI Conference on Artificial Intelligence 2022 Jun 28 (Vol. 36, No. 8, pp. 8432-8440).

[2] Liu Q, Chen C, Qin J, Dou Q, Heng PA. FedDG: Federated domain generalization on medical image segmentation via episodic learning in continuous frequency space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (pp. 1013-1023).
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Do you have any additional comments regarding the paper’s reproducibility?

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

See above
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

Accept — should be accepted, independent of rebuttal (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper introduces an innovative approach, utilizing pre-trained language models to address data heterogeneity in federated learning, specifically for non-i.i.d. medical imaging data. Additionally, the paper is well-articulated, offering a clear, comprehensive analysis and comparison to current methods.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Author Feedback

We sincerely thank all reviewers for their constructive reviews. We have carefully studied the comments and will revise the manuscript as suggested.

(R1, R3, R5) Common concerns, including typos, grammatical errors, and inaccurate expressions, will be revised in the final camera-ready files.

R1 Q1: The contributions of this paper are very unclear…, so we don’t know how this paper compares to the state-of-the-art… A1: One of the main contributions of this paper is addressing the classifier bias problem caused by data heterogeneity from a new perspective, i.e., borrowing linguistic knowledge from pre-trained language models (PLMs) to pre-construct a high-quality classifier. In experiments, we have compared against SOTA methods that also focus on classifier bias and data heterogeneity problems, such as FedPROX[10], FedROD[3], FedETF[12], and so on. We will highlight contributions in the final version.

Q2: …why not just use the image feature extractor of BiomedCLIP and train local classifiers on that? A2: The image feature extractor of BiomedCLIP has a large model size. Direct training will incur heavy communication costs and face classifier bias and data heterogeneity problems.

Q3: I don’t think the experiments are representative of an actual federated learning scenario, since here “clients” are simulated by subdividing the same dataset… A3: Data heterogeneity can be divided into distribution skew and label skew. This paper mainly focuses on label skew. The experiment settings follow the previous methods, FedPROX[10], FedROD[3], and FedETF[12].
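
For readers unfamiliar with label skew, one common way to simulate it from a single dataset is a Dirichlet split over class proportions. The source does not state which partitioning scheme the authors used, so the sketch below is an assumption for illustration only; the function name `dirichlet_label_skew` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def dirichlet_label_skew(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with skewed label distributions.

    For each class, client shares are drawn from Dirichlet(alpha); a
    smaller alpha gives more skewed (more heterogeneous) clients.
    Returns a list of index lists, one per client.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        shares = rng.dirichlet(alpha * np.ones(n_clients))
        # Convert cumulative shares into split points within this class.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_idx[k].extend(part.tolist())
    return client_idx
```

Every sample is assigned to exactly one client, so the union of the returned index lists covers the whole dataset while the per-client class histograms differ.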

Q4: …but here a lot of information is missing such as the task, the datasets, the architecture of the models, etc. … never explicitly says it’s tackling a classification task…the last table mentions a “baseline”, but never says which one it is… A4: Sorry for the confusion. The pilot experiment is conducted on the OCT-C8 dataset [18]. Experiment details are given in “Implementation Details”. This paper mainly focuses on medical classification tasks. ‘Baseline’ refers to FedAvg in this paper. We will address these problems in the final version.

R3 Q1: There is a lack of sufficient description to illustrate the distinctions in framework architecture when compared to FedETF [12]. A1: FedETF [12] utilizes orthogonal initialization to construct the classifier, which lacks semantic interpretability. The classifiers of different classes are not necessarily strictly orthogonal. In this paper, the classifier constructed by pre-trained language models contains rich semantics and distance relationships and is domain-agnostic.

R5 Q1: The manuscript would benefit from a discussion on how this approach compares with prototype learning in federated settings, as explored in studies like FedProto [1]… A1: FedProto[1] collects and aggregates prototypes from clients to deal with data heterogeneity. However, the prototypes depend on local feature extractors and thus may be heterogeneous, affecting the learning of local feature extractors in turn. Our method directly uses pre-trained language models to construct a high-quality classifier.

Meta-Review

Meta-review not available, early accepted paper.


Stealing Knowledge from Pre-trained Language Models for Federated Classifier Debiasing

Author(s):