Abstract

In the fight against the COVID-19 pandemic, leveraging artificial intelligence to predict disease outcomes from chest radiographic images represents a significant scientific aim. The challenge, however, lies in the scarcity of large labeled datasets with compatible tasks for training deep learning models without overfitting. Addressing this issue, we introduce a novel multi-dataset multi-task training framework that predicts COVID-19 prognostic outcomes from chest X-rays (CXR) by integrating correlated datasets from disparate sources, departing from conventional multi-task learning approaches, which rely on datasets with multiple, correlated labeling schemes. Our framework hypothesizes that assessing severity scores enhances the model’s ability to classify prognostic severity groups, thereby improving its robustness and predictive power. The proposed architecture comprises a deep convolutional network that receives inputs from two publicly available CXR datasets, AIforCOVID for severity prognostic prediction and BRIXIA for severity score assessment, and branches into task-specific fully connected output networks. Moreover, we propose a multi-task loss function, incorporating an indicator function, to exploit the multi-dataset integration. The effectiveness and robustness of the proposed approach are demonstrated through significant performance improvements in the prognosis classification task across 18 different convolutional neural network backbones under different evaluation strategies. The improvement over single-task baselines and standard transfer learning strategies is supported by extensive statistical analysis, showing strong application potential.
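For orientation, the following is a minimal sketch of the shared-backbone, two-head design described above, assuming a PyTorch implementation; the ResNet-18 backbone and binary output heads are illustrative stand-ins (the paper evaluates 18 different backbones, and the exact head configurations are not specified here):

```python
import torch.nn as nn
from torchvision import models

class MDMTNet(nn.Module):
    """Hard-parameter-sharing sketch: one shared CNN, two task-specific heads."""

    def __init__(self, n_prognosis_classes=2, n_severity_classes=2):
        super().__init__()
        backbone = models.resnet18(weights=None)  # illustrative backbone choice
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # expose the pooled feature vector
        self.backbone = backbone                  # shared feature extractor
        # Task-specific fully connected output heads.
        self.head_prognosis = nn.Linear(feat_dim, n_prognosis_classes)  # task 1: AIforCOVID
        self.head_severity = nn.Linear(feat_dim, n_severity_classes)    # task 2: BRIXIA

    def forward(self, x):
        # x: batch of CXR images, channels replicated to 3 for the backbone.
        z = self.backbone(x)
        return self.head_prognosis(z), self.head_severity(z)
```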

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/4049_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/4049_supp.pdf

Link to the Code Repository

https://github.com/cosbidev/Multi-Dataset-Multi-Task-Learning-for-COVID-19-Prognosis

Link to the Dataset(s)

https://brixia.github.io

https://aiforcovid.radiomica.it

BibTex

@InProceedings{Ruf_MultiDataset_MICCAI2024,
        author = { Ruffini, Filippo and Tronchin, Lorenzo and Wu, Zhuoru and Chen, Wenting and Soda, Paolo and Shen, Linlin and Guarrasi, Valerio},
        title = { { Multi-Dataset Multi-Task Learning for COVID-19 Prognosis } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15012},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces a Multi-Dataset Multi-Task Learning (MDMT) framework for COVID-19 prognosis prediction using chest X-ray datasets. By integrating two distinct datasets, AIforCOVID and BRIXIA, the MDMT model demonstrates performance comparable to that of traditional approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Utilizing diverse datasets with different labeling schemes to provide deeper insights into patient outcome predictions, enhancing severity classification accuracy without extensive relabeling.
    2. Establishing a baseline by training the CNN backbone on each dataset and task separately, highlighting the potential enhancements brought by Multi-Task Learning.
    3. Implementing a multi-task model to facilitate multi-task optimization across data sources.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Model Overview: The manuscript lacks a clear structure, particularly in the Methodology section, which needs better organization into subsections for improved readability. An in-depth overview of the model pipeline illustrated in Figure 1 is also necessary.

    Clarification about Loss Function: Concerns arise that the loss function degenerates into a standard single-task loss when a sampled batch comes exclusively from one of the two datasets. A solution or modification addressing this scenario is needed.

    About Model Structure: The model design closely resembles the MMoE model. Detailed comparisons highlighting differences in backbone choices and design concepts are required to distinguish the proposed model from MMoE.

    Novelty: The proposed model displays limited novelty, with a structure and optimization function that are overly similar to existing methods. This raises questions about the unique contributions and effectiveness of the model.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. [Model Overview] The manuscript needs a clearer structure. The Methodology section (Section 2) should be refined and divided into distinct subsections to enhance readability and organization. Additionally, a comprehensive overview of the framework, as depicted in Figure 1, is crucial to provide a complete description of the model pipeline. This will facilitate a better understanding of the model’s components and their interactions.

    2. [Clarification about Loss Function] The authors mention that the data batch X is sampled from the union of the two datasets D^{τ1} and D^{τ2}. However, there could be instances where all the sampled data come exclusively from one dataset, causing the loss function (Eq. 2) to revert to a standard loss function. Can the authors propose any strategies to prevent or address this issue and maintain the intended diversity of the data samples? (A minimal sketch illustrating this degenerate case follows the reference list below.)

    3. [About Model Structure] The conceptual and structural design of the proposed model appears similar to that of MMoE. Could the authors provide a detailed comparison between their model and MMoE, focusing on differences in the backbone selection and the overall conceptual approach to model design? This analysis would help clarify the distinctions and potentially highlight the unique aspects of the proposed model.

    4. [Novelty] From my perspective, the proposed model exhibits limited novelty. The overall architecture closely resembles that of existing methods such as MMoE[1] and PLE[2]. The optimization function appears to be a simple additive operation without any specific modifications, raising questions about its effectiveness. Could the authors elaborate on any unique features or capabilities that distinguish their model from previous approaches? This would help justify the novelty and potential impact of the proposed model.

    Ref: [1] Modeling task relationships in multi-task learning with multi-gate mixture-of-experts. SIGKDD 2018. [2] Progressive layered extraction (PLE): A novel multi-task learning (MTL) model for personalized recommendations. RecSys 2020.
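    Regarding point 2 above, a minimal sketch of an indicator-style multi-task loss consistent with the description of Eq. 2 may make the degenerate case concrete. All names are hypothetical, and equal task weighting plus a shared binary label space are simplifying assumptions, not the paper's exact formulation:

```python
import torch.nn.functional as F

def mdmt_loss(logits_t1, logits_t2, labels, is_t1):
    """is_t1[i] is True when sample i was drawn from D^{tau_1}, False for D^{tau_2}."""
    loss = logits_t1.new_zeros(())
    if is_t1.any():        # indicator term: task-1 loss over task-1 samples only
        loss = loss + F.cross_entropy(logits_t1[is_t1], labels[is_t1])
    if (~is_t1).any():     # indicator term: task-2 loss over task-2 samples only
        loss = loss + F.cross_entropy(logits_t2[~is_t1], labels[~is_t1])
    # If a batch contains samples from a single dataset, one branch is skipped
    # and the expression collapses to a plain single-task loss -- exactly the
    # scenario the reviewer raises in point 2.
    return loss
```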

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My weak reject recommendation for this paper is based on several significant issues that undermine its contribution to the field. Firstly, the manuscript suffers from a lack of clear structure, especially in the Methodology section, which impedes understanding of the model’s framework. Secondly, there are concerns about the robustness of the loss function which may not perform as intended under certain conditions. Furthermore, the model’s design shows considerable similarity to existing models like MMoE, with insufficient differentiation in both structure and conceptualization. Lastly, the novelty of the proposed model is limited, as it does not clearly advance beyond established methods despite its use of diverse datasets and multi-task learning.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper introduces a novel framework that leverages multiple datasets to enhance predictive capabilities for COVID-19 prognostic outcomes, by integrating two different datasets (one for severity prognostic prediction and one for severity score assessment). The authors proposed an innovative multi-task loss function to facilitate effective learning from integrated datasets. Extensive evaluations demonstrate significant performance gains over conventional approaches. Overall, the paper’s contributions include innovative framework design, integration of diverse data sources, novel loss function development, performance enhancements, and increased predictive power for COVID-19 prognostic outcome predictions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The paper introduces a novel multi-dataset multi-task framework that integrates data from two different sources to provide a comprehensive analysis of COVID-19 severity and patient outcomes. This novel approach differs from conventional single-task methods and offers new insights into prognosis prediction from chest X-rays.

    • A key innovation is the proposal of a distinctive multi-task loss function with an indicator component that helps optimize learning across the two diverse datasets. This enhances the model’s training process and boosts performance in predicting prognostic outcomes.

    • The framework’s ability to aggregate severity assessments from multiple datasets leads to improved predictive accuracy in classifying prognostic severity groups. This strengthens the model’s precision and robustness in forecasting COVID-19 outcomes from chest X-rays.

    • Extensive benchmarking using CNNs shows substantial gains over conventional transfer learning and single-task approaches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Limited novelty: The paper itself cites several existing works that leverage multi-dataset multi-task (MDMT) learning approaches (citations 16, 18, 19, 32). Therefore, the proposed approach is not entirely novel. Nevertheless, the authors propose a novel loss to handle the proposed MDMT approach.

    Limited comparison: The paper could benefit from a more comprehensive comparison with existing methods in the field. While it shows better performance than traditional approaches, a direct comparison with other advanced models or techniques for predicting COVID-19 prognosis could provide a more precise assessment of the framework’s effectiveness.

    Limited discussion on data integration challenges: Although the paper integrates datasets from AIforCOVID and BRIXIA, it does not delve deeply into the challenges and limitations of integrating diverse data sources. Clarifying the issues related to data harmonization, universal annotations, and potential biases introduced by combining datasets could strengthen the framework’s validity and generalizability.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • Enhance novelty in approach: While the proposed approach introduces a novel loss function for MDMT, the paper acknowledges existing works leveraging similar approaches. To strengthen the novelty, consider highlighting more explicitly what sets your approach apart from these prior works. Emphasize how your novel loss function specifically addresses challenges unique to COVID-19 prognosis prediction.

    • Expand comparative analysis when and if possible: To provide a more robust assessment of your framework’s effectiveness, consider expanding the comparative analysis. While outperforming traditional methods is valuable, a direct comparison with other state-of-the-art models or techniques in predicting COVID-19 prognosis would provide deeper insights. This would enhance understanding of the framework’s strengths and areas for improvement.

    • Deepen discussion on data integration challenges: To strengthen the framework’s validity and generalizability, delve deeper into the challenges and limitations of integrating diverse data sources. Discuss issues such as data harmonization, universal annotations, and potential biases introduced by combining datasets. Providing insights into how these challenges were addressed or mitigated would enhance the paper’s impact and relevance.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper’s strengths include its innovative design, multi-task loss function, use of diverse data, enhanced predictions, and significant performance improvements. By advancing the state-of-the-art in COVID-19 outcome forecasting, the paper makes a significant impact in this field.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel multi-dataset multi-task training framework that predicts COVID-19 prognostic outcomes from chest X-rays (CXR) by integrating two public COVID-19 chest X-ray datasets (AIforCOVID and BRIXIA). Essentially, the framework consists of a shared backbone feature extraction network followed by two task-specific fully connected network heads, each with an independent loss function, performing their respective tasks. The shared feature extractor learns to capture generalized representations across multiple tasks. The authors perform an extensive evaluation by comparing the performance of their MDMT approach to conventional transfer learning and single-task learning approaches via 18 CNN architectures, demonstrating that the former outperforms the latter.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Objective: The authors claim to have developed an approach that leverages multiple correlated datasets to improve the robustness of a model, which could be very useful in cases with scarce data, for example, rare diseases.

    Attention to detail: The authors give extensive details regarding the datasets, the approaches, and all the distinct statistical analyses that were performed.

    Extensive analysis: The authors carried out an extensive statistical analysis, comparing 18 different CNN architectures under multiple learning strategies and benchmarking their MDMT approach against standard approaches such as ImageNet fine-tuning and single-task learning. Additionally, both 5-fold stratified cross-validation (CV) and leave-one-center-out (LOCO) validation were implemented by the authors to demonstrate reliability. Further, one-tailed statistical t-tests were performed for comparisons across the various experimental configurations (see the sketch below).
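    For concreteness, a hedged sketch of this kind of one-tailed comparison, assuming paired per-fold scores; the function choice (a paired test) is an assumption, and the numbers are placeholders, not results from the paper:

```python
from scipy.stats import ttest_rel

# Hypothetical per-fold accuracies for illustration only (not the paper's numbers).
mdmt_scores = [0.78, 0.81, 0.79, 0.83, 0.80]
baseline_scores = [0.74, 0.77, 0.75, 0.78, 0.76]

# One-tailed paired t-test, H1: MDMT outperforms the baseline across folds.
stat, p_value = ttest_rel(mdmt_scores, baseline_scores, alternative="greater")
print(f"t = {stat:.3f}, one-tailed p = {p_value:.4f}")
```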

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Strong assumptions are made which may not be true:

    o The framework consists of a shared feature extractor for both tasks. Therefore, the method relies heavily on how closely correlated the two tasks are, limiting the applicability of this approach to datasets and tasks that are very closely correlated with one another.

    o The above limitation is further solidified by the authors’ finding that, for task 2, all experiments showed no significant change in performance between MDMT and the STL methods, “underscoring that while task 2 benefits task 1, the converse is not necessarily true”.

    o If there are tasks with no mutual information, this approach may perform worse than single-task learning approaches.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Comments:

    o A study comparing the performance of the MDMT approach to standard approaches with respect to the amount of data could further highlight the usefulness of this method.

    o Could the authors comment on why EfficientNet-b1-p (supplementary table) consistently shows higher performance with the fine-tuning approach than with the MDMT approach?

    ● For future work, I would recommend:

    o A study of the MDMT approach’s performance with respect to dataset size, which could show that for certain tasks with sparse data (for example, 100–200 patients) this approach might perform better than single-task learning methods.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have developed a novel multi-dataset multi-task training framework that predicts COVID-19 prognostic outcomes. Although the paper focuses on COVID-19, this approach may be beneficial for other clinical domains with scarce data. The authors have done vast amounts of comparisons with other learning approaches and various statistical tests were performed for comparisons across the various experimental configurations, highlighting the performance of the MDMT approach.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We appreciate the reviewers’ time and effort in providing valuable feedback on our manuscript, Multi-Dataset Multi-Task Learning for COVID-19 Prognosis. Below, we address the concerns and suggestions raised.

We recognize that the manuscript needs a more thorough and clearer analysis of potential limitations, as noted by reviewers R1 and R4. Our approach aims to demonstrate that performance improvements are achievable by extracting prognostic information from multiple datasets that are medically correlated but carry disjoint labels. However, the applicability of this method is limited by the need to select datasets and tasks following a well-determined logic, ensuring they correlate with prognostic outcomes despite different labelling schemas. In future work, we plan to develop a model capable of processing prognostic information from a broader range of heterogeneous datasets. Following the reviewers’ advice, we will emphasize these limitations in the conclusions, together with possible future research directions.

Reviewer R1’s comments encouraged us to detail how we manage datasets from distinct sources, particularly regarding data harmonization, universal annotations, and potential biases introduced by combining datasets. We preprocessed all images from both datasets with the same steps: lung segmentation, extraction of a bounding box around the lungs, square cropping using the bounding box, and image standardization with the same mean and standard deviation. Regarding the annotation systems and potential biases, we used the morbidity label (mild/severe) for the AIforCOVID dataset, while for the BRIXIA dataset we adopted the original severity score system and applied a relabeling function (Eq. 4 in Section 3.1 of the manuscript). This step was necessary to mitigate bias introduced by radiologists’ annotations. Moreover, by simplifying the severity score system, we noticed that the BRIXIA task became more supportive of the AIforCOVID task. We have not found evidence of a universal relabeling applicable to prognostic tasks, apart from emerging Self-Supervised Learning (SSL) paradigms, which offer the possibility of bypassing the problem through self-labelling strategies that could be beneficial in our context. Future research will explore the generalization ability of SSL pre-trained models to downstream prognostic tasks in zero-shot inference and with parameter-efficient fine-tuning methodologies.

In response to all reviewers’ suggestions for further analysis of advanced models and traditional MTL methodologies, we note that our work focuses on the learning framework rather than on a new architecture. Our architecture is well documented in the Multi-Task Learning (MTL) literature (as hard parameter sharing), as evidenced by reviewer R5’s references to the MMoE and PLE models. The novelty of our custom loss function lies, in the biomedical field, in its handling of two disjoint datasets with different labelling schemes. We intend to extend the applicability of our framework to more advanced MTL architectures, evaluated on single datasets with multiple tasks.

Reviewer R5 expressed a major concern that, during training, all sampled data could come exclusively from one dataset, causing the custom loss function (Eq. 2 in Section 2 of the manuscript) to revert to a standard loss function. This scenario was investigated during the experimental phase, leading to the implementation of a custom sampler that ensures the presence of samples from both datasets in every batch passed to the models. The results obtained were not significantly different from those presented in the manuscript, which is why this analysis was not included in the original draft. For clarity, we will add a description of it to the experimental settings section.
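The rebuttal does not give the custom sampler’s exact policy; the following is a minimal sketch under the assumption of a 50/50 split per batch and of the two datasets being concatenated (e.g., via torch.utils.data.ConcatDataset) with the task-1 samples first. All names are illustrative:

```python
import random
from torch.utils.data import Sampler

class MixedBatchSampler(Sampler):
    """Yields index batches that always mix samples from both datasets."""

    def __init__(self, n_task1, n_task2, batch_size):
        assert batch_size % 2 == 0
        self.idx_t1 = list(range(n_task1))                     # AIforCOVID indices
        self.idx_t2 = list(range(n_task1, n_task1 + n_task2))  # BRIXIA indices
        self.half = batch_size // 2

    def __iter__(self):
        random.shuffle(self.idx_t1)
        random.shuffle(self.idx_t2)
        for b in range(len(self)):
            # Half of every batch comes from each dataset, so neither term of
            # the indicator-based multi-task loss can vanish.
            yield (self.idx_t1[b * self.half:(b + 1) * self.half]
                   + self.idx_t2[b * self.half:(b + 1) * self.half])

    def __len__(self):
        return min(len(self.idx_t1), len(self.idx_t2)) // self.half
```

Such a sampler would be passed to a DataLoader via its batch_sampler argument, e.g. DataLoader(concat_dataset, batch_sampler=MixedBatchSampler(len(ds1), len(ds2), 32)).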




Meta-Review

Meta-review not available, early accepted paper.


