Abstract

Multimodal AI has demonstrated superior performance over unimodal approaches by leveraging diverse data sources for more comprehensive analysis. However, translating this effectiveness to healthcare is challenging due to the limited availability of public datasets. Federated learning presents a promising solution, allowing the use of extensive databases from hospitals and health centers without centralizing sensitive data, thus maintaining privacy and security. Yet, research in multimodal federated learning, particularly in scenarios with missing modalities—a common issue in healthcare datasets—remains scarce, highlighting a critical area for future exploration. Toward this, we propose a novel method for multimodal federated learning with missing modalities. Our contribution lies in a novel cross-modal data augmentation by retrieval, leveraging a small publicly available dataset to fill the missing modalities in the clients. Our method learns the parameters in a federated manner, ensuring privacy protection and improving performance on multiple challenging multimodal benchmarks in the medical domain, surpassing several competitive baselines.
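The cross-modal retrieval idea can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: it assumes precomputed feature vectors, cosine similarity, and single-label refinement, whereas the paper's actual feature extractor, similarity measure, and multi-label handling may differ.

```python
import numpy as np

def retrieve_missing_modality(query_feat, public_feats, public_labels,
                              query_label, k=1):
    """Find the top-k public samples to fill a client's missing modality.

    Label refinement: only public samples sharing the query's label are
    considered, which shrinks the search space before similarity ranking.
    """
    candidates = np.flatnonzero(public_labels == query_label)
    cand_feats = public_feats[candidates]
    # Cosine similarity between the available-modality feature of the
    # query sample and each candidate's feature.
    sims = cand_feats @ query_feat / (
        np.linalg.norm(cand_feats, axis=1) * np.linalg.norm(query_feat) + 1e-8
    )
    # Indices (into the public set) of the k most similar candidates.
    return candidates[np.argsort(-sims)[:k]]
```

The retrieved indices would then pair the client's available modality (e.g., an X-ray image) with the closest public sample of the missing modality (e.g., a report).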

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/0391_paper.pdf

SharedIt Link: https://rdcu.be/dV53O

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72117-5_10

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/0391_supp.pdf

Link to the Code Repository

https://github.com/bhattarailab/CAR-MFL

Link to the Dataset(s)

https://physionet.org/content/mimic-cxr-jpg/2.0.0/

https://stanfordmlgroup.github.io/competitions/chexpert/

https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university



BibTex

@InProceedings{Pou_CARMFL_MICCAI2024,
        author = { Poudel, Pranav and Shrestha, Prashant and Amgain, Sanskar and Shrestha, Yash Raj and Gyawali, Prashnna and Bhattarai, Binod},
        title = { { CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15010},
        month = {October},
        pages = {102--112}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. This paper proposes a retrieval-based approach to address the missing-modality problem in the multimodal federated learning setting, using a publicly available dataset.
    2. The proposed method achieves more promising results than prior works.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written, easy to understand and follow.
    2. Good performance
    3. Multimodal federated learning with missing modalities is a very practical and meaningful topic.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors claim that CAR-MFL leverages a small, publicly available multimodal dataset during training. However, in the experiments, the public dataset this paper uses comprises image-text pairs from 1,000 patients, which is not a small one.
    2. In each epoch, every sample needs to retrieve the top-k closest images from the public dataset. This process is very time-consuming and a serious burden for federated learning.
    3. Modalities of different patients are not aligned and have a huge discrepancy. It is unclear whether the retrieved samples really introduce useful information. The performance gain may come from the filling operation decreasing the distribution gap.
    4. I am curious about the performance of random filling using public datasets.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The source code and patient list of the public dataset are expected to be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please see the weaknesses of the paper

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    More ablation study is needed for verifying the effectiveness of the proposed method.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    Some previous issues are still not well addressed in the rebuttal, especially the computational overhead and reliability of retrieval-based data augmentation in multimodal federated learning with missing modalities. Nonetheless, this paper deserves consideration for acceptance as it provides a new perspective for solving the missing modality problem in multimodal federated learning.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a novel method for handling missing modalities in multi-modal federated learning for Chest-X-Rays. Currently, papers in this area are limited. The paper proposes to tackle the problem using a cross-modal data augmentation method using a publicly available dataset. The method retrieves the most likely absent modality sample from the publicly available dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The area of missing modalities in multi-modal federated learning is an interesting and promising scenario. Only a few approaches exist in this area.
    • The paper demonstrates a well-structured format, making it easily comprehensible.
    • The proposed cross-modal data augmentation presents a novel approach within multi-modal Federated Learning. It’s simple and allows for seamless integration into existing FL frameworks, without complex additional components or the introduction of additional network architectures.
    • The experimental evaluation encompasses various settings and comparisons to baselines. The results show that the proposed method surpasses existing state-of-the-art approaches.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • For me, the main weakness is the dependency on publicly available datasets: While acknowledging the existence of multiple annotated publicly available datasets, it’s essential to note that relying solely on the availability of public data brings limitations on the applicability of the approach. For instance, the requirement for public datasets to align with all clinic-specific classes, diseases, and annotations restricts the broader application of the proposed method.
    • Emphasis on homogeneous settings in experiments: Although the experiments primarily concentrate on homogeneous settings, it’s noteworthy that heterogeneous settings represent real-world scenarios more accurately. Extending the evaluation to focus on heterogeneous settings could have enhanced the experimental evaluation.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper addresses a highly relevant area, and the proposed method offers simplicity and easy integration into federated learning. The storage location of the publicly available dataset is not clear from the text. From what I understand, the public dataset is distributed to each client before federated learning begins. However, readers might initially think that the publicly available dataset is stored on the server only.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is well-written, and the proposed method for addressing missing modalities in multi-modal federated learning is novel and interesting. However, a significant weakness lies in the reliance on publicly available labeled datasets. This dependence could limit the broad applicability of the approach, as not all diseases may have publicly available datasets.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • [Post rebuttal] Please justify your decision

    I want to thank the authors for their rebuttal. After the rebuttal I still think that the proposed method for addressing missing modalities in multi-modal federated learning is novel and interesting. The proposed method’s simplicity and easy integration is very relevant. However, the weakness remains that publicly available data is needed. The paper is definitely one of the first ones in this relevant area but I wouldn’t claim it is the first of its kind in the rebuttal (e.g. Bao et al. - Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast; Saha et al. - Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection; and some others are cited in the related work section of the two papers above)



Review #3

  • Please describe the contribution of the paper

    The paper proposes a method to perform multi-modal federated learning with missing modalities.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method shows encouraging performance compared with baseline methods;
    2. The presentation is clear and easy to understand;
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It seems that the task itself may not be suitable for evaluation purposes. According to Table 1, text may dominate the performance and lead to better results.
    2. There is no explanation of the similarity between the public data and the training/test data. Moreover, the public data should be built in different ways to verify the proposed method. Lastly, I notice that the authors verify the performance on rare diseases. I wonder whether these two diseases exist in the public data?
    3. There are no studies on the performance of test samples which are largely different from the public data.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    See the weaknesses above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The task is interesting and practical. However, the method is rather simple and straightforward. There seem to be two designs, label refinement and weight re-adjustment, whose effectiveness is not validated in experiments. The experiments of the paper are not comprehensive and may not serve as a suitable benchmark for this task.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their time and constructive feedback. Reviewers acknowledge our paper as novel, interesting, promising, simple, seamless (R6), improved performance (R6, R7, R8), practical and meaningful (R8), and easy to understand (R6, R7, R8).

Clarification in Setup [R6, R7, R8]: We clarify that in our setup, the public data is open-source and can be downloaded by both clients and the server. We will also release the code and setup upon acceptance.

Heterogeneous setting[R6]: Please see our study on heterogeneous settings in Tab.1 and in Fig. 2 in Supp. involving public, uni- & multi-modal clients with data from three different institutions (MIMIC-CXR, CheXpert, Open-I), reflecting a real-world scenario.

Difference in test and public datasets and multimodal system [R7]: Although there is no study on test samples largely different from public data (e.g., domain adaptation), which is beyond the scope of this study, the standalone performance on public data (Tab. 1 in Supp) is relatively low. The 3.6% performance gain in the heterogeneous setting is due to client data differing from the public/test set. Additionally, while text is more dominant than image, multimodal data provides complementary information, and the contributions from each modality may vary. Thus, we assert that our evaluation criteria are comprehensive.

Reliance on Public Data and its size [R6, R8]: We agree that we rely on a small amount of public data. Our experiments varying the public set (from 1000 down to 250 patients, ~2.9% of the total patients) in both heterogeneous and homogeneous setups (Fig 3(a) and Fig 2 in Supp) show competitive performance. The trend does not decline sharply, and there is a wider gap in heterogeneous settings. Thus, we expect to achieve good performance even with fewer patients. As R6 rightly pointed out, our work is the first of its kind, and exploring alternatives to public data is our future work.

Rare diseases in Public data [R7]: Yes, they are available, but in limited quantities for augmentation. Achieving the same level of performance requires fewer multimodal data points than unimodal data, as multimodal data provide information from various sources. Our main goal was to investigate: since not all clients can collect multimodal data for rare diseases, how sharply does model performance degrade as the number of multimodal data points in a system decreases, and what role does cross-modal augmentation play in mitigating the decline?

Effectiveness of Designs [R7]: When label refinement was excluded from our method, we did not observe consistent performance gains, and retrieval introduced unwanted computational overhead. In contrast, label refinement confines the search space for finding augmentation candidates and demonstrated significant improvements without unnecessary computational overhead, leading us to adopt it as the default setting in all subsequent experiments. We also conducted a study on weight re-adjustment, with the performance results summarized in Tab. 4 in Supp.

Retrieved samples usefulness and random filling [R8]: Thank you for suggesting random filling using a public dataset. However, our experiments show that as the public data size increases, performance steadily improves. Our qualitative comparison (Figs 3 and 4 in Supp) also shows that augmented pairs are semantically more coherent and meaningful with the increase in the number of communication rounds. We do not expect random filling to ensure such coherency and thus improve performance.

Retrieving is time-consuming [R8]: We agree with this point. However, our search space is limited due to the small size of the public data and the label refinement process. Additionally, we retrieve data at the start of each communication round, not every epoch. Specifically, we retrieved data 30 times during our training, which consists of 90 epochs. We believe the privacy protection provided by our method outweighs the computational overhead.
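A minimal sketch of this schedule, with retrieval once per communication round rather than once per local epoch (function and constant names are ours; the 30 rounds × 3 local epochs split is inferred from the "30 retrievals over 90 epochs" figure above):

```python
ROUNDS = 30        # communication rounds (one retrieval each)
LOCAL_EPOCHS = 3   # local epochs per round -> 90 epochs total

def train_schedule(num_clients, rounds=ROUNDS, local_epochs=LOCAL_EPOCHS):
    """Return the ordered events of a run where each client retrieves
    augmentation data once per round, then trains locally for several epochs."""
    events = []
    for rnd in range(rounds):
        for c in range(num_clients):
            events.append(("retrieve", rnd, c))   # once per round, per client
        for _ in range(local_epochs):
            for c in range(num_clients):
                events.append(("train", rnd, c))  # local epoch on augmented data
    return events
```

Under this schedule, retrieval cost scales with the number of rounds, not the number of epochs, which is the crux of the rebuttal's overhead argument.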




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    An interesting paper addressing missing modality issues in federated learning. All the reviewers support it.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


