Abstract

Federated learning has recently developed into a pivotal distributed learning paradigm, wherein a server aggregates numerous client-trained models into a global model without accessing any client data directly. It is widely acknowledged that statistical heterogeneity in client local data slows the convergence of the global model, but it is often underestimated that this heterogeneity also engenders a biased global model with notable variance in accuracy across clients. In this context, the prevalent solutions modify the optimization objective. However, these solutions often overlook implicit relationships, such as the pairwise distances between site data distributions, which give rise to pairwise exclusive or synergistic optimization among client models. Such optimization conflicts compromise the efficacy of earlier methods, leading to performance imbalance or even negative transfer. To tackle this issue, we propose a novel aggregation strategy called Collaboration Graph-based Reinforcement Learning (FedGraphRL). By deploying a reinforcement learning (RL) agent equipped with a multi-layer adaptive graph convolutional network (AGCN) on the server side, we learn a collaboration graph from client state vectors that reveals the collaborative relationships among clients during optimization. Guided by an introduced reward that balances fairness and performance, the agent allocates aggregation weights, thereby promoting automated decision-making and improving fairness. Experimental results on two real-world multi-center medical datasets demonstrate the effectiveness and superiority of the proposed FedGraphRL.
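The server-side pipeline the abstract describes (client state vectors → learned collaboration graph → aggregation weights → weighted model averaging) can be illustrated with a minimal sketch. This is an editor's illustration, not the paper's implementation: the dot-product similarity stands in for the AGCN's learned adjacency, and the pooling step stands in for the RL agent's policy; all shapes and names are hypothetical.

```python
import numpy as np

# Hypothetical setup: 4 clients, each reporting a small state vector
# (e.g. local loss, data size, update norm) and a flat parameter vector.
rng = np.random.default_rng(0)
num_clients, state_dim, param_dim = 4, 3, 10
states = rng.normal(size=(num_clients, state_dim))
client_params = rng.normal(size=(num_clients, param_dim))

# Collaboration graph: build an affinity matrix from the client states
# (a simple dot-product similarity stands in for the learned adjacency)
# and row-normalize it with a softmax so each row sums to 1.
affinity = states @ states.T
adjacency = np.exp(affinity) / np.exp(affinity).sum(axis=1, keepdims=True)

# Stand-in for the agent's action: pool the normalized graph into one
# aggregation weight per client and renormalize.
weights = adjacency.mean(axis=0)
weights /= weights.sum()

# Server-side aggregation: a weighted average of client parameters.
global_params = weights @ client_params
assert global_params.shape == (param_dim,)
assert np.isclose(weights.sum(), 1.0)
```

In the paper, the reward balancing fairness and performance would drive how these weights are chosen; here they are derived directly from the graph only to show the data flow.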

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0845_paper.pdf

SharedIt Link: https://rdcu.be/dV54l

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72117-5_25

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0845_supp.pdf

Link to the Code Repository

N/A

Link to the Dataset(s)

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T

https://www.kaggle.com/competitions/aptos2019-blindness-detection

https://github.com/deepdrdoc/DeepDRiD

https://csyizhou.github.io/FGADR/

https://www.adcis.net/en/third-party/e-ophtha/

https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid

https://www.adcis.net/en/third-party/messidor2/

BibTex

@InProceedings{Xia_Enhancing_MICCAI2024,
        author = { Xia, Yuexuan and Ma, Benteng and Dou, Qi and Xia, Yong},
        title = { { Enhancing Federated Learning Performance Fairness via Collaboration Graph-based Reinforcement Learning } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        pages = {263 -- 272}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This work proposes employing reinforcement learning to enhance the aggregation process in federated learning, leveraging the pairwise relationships among clients to determine the optimal weighting strategy on the server side. This approach prioritizes fairness enhancement, potentially encouraging greater collaboration among clients and further federated learning research.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Congratulations on your submission; below you can find the strengths of your paper:

    1. The paper is well-structured, and the methodology is well-detailed.

    2. Using reinforcement learning to automatically define the weights for aggregation is interesting since it can enhance fairness without relying on prior knowledge of data distribution, which is often unavailable in distributed learning environments.

    3. Figures and tables are very descriptive, and an extensive comparison of the proposed approach with SOTA methods was provided.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Below you can find areas for improvement in your submission:

    1. The benefits in terms of health equity need to be clarified. The authors could have discussed them more deeply.

    2. The authors could use other established fairness metrics, such as equalized odds and subgroup performance disparities.

    3. It would be interesting to compare the training time of the proposed approach and SOTA.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The methods are descriptive enough to reproduce this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This work is very interesting from a technical perspective. However, a deeper understanding of the impact on health equity is necessary. How can this approach engage collaborators? What are the implications for them? From a healthcare perspective, what does fairness mean? How does the standard deviation metric reflect this fairness? The dataset used in this work comprises sites with large local datasets. Can this approach be implemented in a more challenging environment where sites provide few datasets? What are the computational resources required for its implementation? These are just a couple of questions that are important to consider when designing solutions for health equity.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major reason for my decision is the novelty of the proposed approach for MIC. Still, this paper is not interesting for the health equity stream since it lacks an understanding and comprehensive discussion.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    A weight aggregation strategy is devised based on a reinforcement-learning-driven graph representation of the collaborations. A multi-layer adaptive graph convolutional network is built at the server. The key contribution lies in the ability to dynamically learn a collaboration graph from client state vectors, which identifies and leverages the collaborative relationships among clients during the optimization process. Performance is shown on two medical datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The aggregation strategy has good performance and the potential to adapt to differences in client data; validation performance is shown on two medical datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the paper demonstrates effectiveness on two medical datasets, the scalability of the FedGraphRL approach to larger networks with many more clients remains uncertain. The increased computational load and the complexity of managing a larger collaboration graph could pose challenges. Moreover, performance of reinforcement learning-based methods can be highly sensitive to the choice of hyperparameters hence generalizability can be difficult to achieve.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Clarify implementation details. Conduct scalability tests to evaluate how the FedGraphRL strategy performs as the number of clients increases. Explore and discuss the applicability of FedGraphRL to other domains beyond medical datasets. Provide a deeper analysis of how fairness is achieved and maintained in the FedGraphRL strategy. Discuss the practical deployment aspects of FedGraphRL, including any limitations and challenges that might be faced when implementing this strategy in real-world environments. Discuss the future directions of the study.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental results presented in the paper, based on two medical datasets, provide evidence of the effectiveness of the proposed method. However, the paper could be organized better to articulate complex concepts clearly and effectively. I am also interested to read more details on scalability and a discussion of the practical limitations of the approach.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a method for enhancing performance fairness in federated Learning using a graph-based reinforcement learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach is compared to a large number of baseline FL algorithms on two realistic distributed medical imaging datasets. It shows promising performance. The paper is well structured and an interesting contribution to the field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is a solid piece of work but some clarifications are needed as outlined in the constructive feedback.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The authors promise to release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The approach seems related to prior works that learn the aggregation weights in FL, e.g.,

    Xia, Yingda, et al. “Auto-FedAvg: learnable federated averaging for multi-institutional medical image segmentation.” arXiv preprint arXiv:2104.10195 (2021).

    Especially, the following work also utilizes reinforcement learning to arrive at the weights in a data-driven manner and should be cited, with the differences explained.

    Guo, Pengfei, et al. “Auto-fedrl: Federated hyperparameter optimization for multi-institutional medical image segmentation.” European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022.

    Algorithm 1: specify whether the inference loss is computed on the training or validation set.

    The introduction could mention that this work is focused on cross-silo FL setting.

    Table 1 should specify what metric is measured by (Avg) in the caption or table header. Also, include the reference paper for each baseline algorithm in the table.

    If feasible, compare other tradeoffs such as the computation time and communication frequency of each algorithm (Table 1)?

    Section 3.3 should add some high-level explanation of how [2] estimates the pairwise distances. The differences to [2] and [3] should be highlighted.

    Fig.2. Why is the similarity not 1 on the diagonal?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Solid piece of work with interesting motivations and methodologies backed by experimentation, baseline comparison, and suitable ablation studies.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the suggestions from the reviewers on our work, and we will consider their suggestions to further improve the quality of this work in either the final version or future work.

Reviewer #1

Q1: Clarify the distinctions between Auto-FedAvg and Auto-FedRL. A: Both methods input and update the hyperparameter distributions online, and the results sampled from these distributions are used as hyperparameters. In contrast, FedGraphRL uses the client state vectors of each communication round as input to update the aggregation weights offline, thus improving the efficiency of exploration and exploitation.

Q2: Writing of the article. A: The inference loss is measured on the validation set. We will make every effort to improve the paper and enhance its readability.

Q3: Compare the computation time and communication frequency of each algorithm. A: We plan to supplement our journal version with a comprehensive analysis of FedGraphRL, including its training time and communication frequency.

Q4: Detailed explanation of the pairwise distances. A: The visualization of the relation graph results from the normalization of the learnable adjacency matrix in Eq. 5, hence the diagonal values are not 1. We will provide a more detailed explanation of the pairwise distances in the final version.

Reviewer #3

Q1: Clarify the benefits in terms of health equity. A: This study ensures that clients do not lag behind in the federated system, thereby maintaining fairness in federated learning performance. Evaluation metrics like equalized odds for group fairness are not applicable for assessing our method.

Reviewer #4

Q1: Discuss the practical deployment limitations and the applicability to other domains. A: We plan to supplement our journal version with a more comprehensive analysis of FedGraphRL and extend its application to more clients and datasets to further establish its utility.




Meta-Review

Meta-review not available, early accepted paper.


