Abstract

Acute Respiratory Distress Syndrome (ARDS) is a critical adverse event with high mortality rates, yet its recognition in ICU settings is often delayed. Clinicians face significant challenges in integrating asynchronous, multi-modal data streams with misaligned temporal resolutions during rapid deterioration. This work introduces a deep learning model for continuous ARDS risk monitoring, designed to dynamically integrate diverse ICU data sources and generate timely, actionable predictions of ARDS onset. We extend existing settings for ARDS detection from static, single-modality prediction to continuous, multi-modal monitoring that aligns with clinical workflows. To address the inherent complexities of this task, we propose tailored solutions for hierarchical fusion across irregular sampling points, heterogeneous data modalities, and sequential predictions, while ensuring robust training against dynamic, irregular inputs and severe class imbalance. Validated on 1,985 MIMIC-IV patients, our model demonstrates superior performance, achieving average AUROC scores of 0.94, 0.91, and 0.87 across 6, 24, and 48 hours pre-onset, respectively, outperforming previous models (AUROC 0.78–0.85). Furthermore, the model quantifies an emergency level to aid in resource prioritization and identifies high-risk patients with peak relative risk reaching 25, demonstrating exceptional discrimination between cohorts. The code is publicly released at https://github.com/YidFeng/MICCAI25-ARDS-Risk-Prediction

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1696_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/YidFeng/MICCAI25-ARDS-Risk-Prediction

Link to the Dataset(s)

N/A

BibTex

@InProceedings{FenYid_Asynchronous_MICCAI2025,
        author = { Feng, Yidan and Zhang, Bohan and Deng, Sen and Hu, Zhanli and Qin, Jing},
        title = { { Asynchronous Multi-Modal Learning for Dynamic Risk Monitoring of Acute Respiratory Distress Syndrome in Intensive Care Units } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15974},
        month = {September},
        pages = {13 -- 22}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper’s main contribution is using multiple modalities (CXR, vital signs, and laboratory results) to develop an early warning system for predicting acute respiratory distress syndrome. The authors also highlight modularity through a staged temporal-modal (STM) fusion module and a progressive context memory (PCM) that address sequential data with varying recording frequencies.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The problem definition and writing were strong, with clear rationale and clinical motivation. 2) The authors provided statistical testing to showcase the significance of their model ablations, as shown in Figure 2. In addition, standard deviations are provided in Table 2. 3) In Figure 3, the authors dive into specific use cases of their model, which highlights its clinical utility.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1) While ablation comparisons are great, the baselines to compare this model are lacking. The authors compared their model to ablations and single-modal models (using the same backbones as their models). Different backbones/pretrained methods for CXR could be used to make predictions and compare them to their method (i.e., there are many CXR foundation models to compare unimodal performance). 2) Inconsistent use of metrics. The authors used AUROC, ACC, sensitivity, and MAE in Figure 1. At the same time, specificity is included in Table 2. There is an inconsistency here; why is specificity not included in Figure 1? Similarly, in Table 1, slope and volatility are included, but the reasons for excluding these in other figures are not explained. 3) Given that this is a time-based prediction model, positive predictive value (PPV) is very important due to alert fatigue (Muralitharan, 2021). As the main premise is the model’s utility, having this metric in the comparisons is useful. 4) The figures and explanations regarding a multi-modal transformer layer are unclear. Consequently, it is challenging to evaluate their technical contributions. Page 5 states that “cross-modal interaction is then modeled by fusing all modality summaries using multi-modal transformer layers.” What is a multi-modal transformer layer? Is it a cross-attention layer, and if so, how were the three modalities integrated into this layer? – Work Cited Muralitharan, Sankavi, et al. “Machine learning–based early warning systems for clinical deterioration: systematic scoping review.” Journal of Medical Internet Research 23.2 (2021): e25187.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    1) You use “Chest X-rays” but defined CXR; please be consistent with acronyms. 2) On page 3, “i = 1, 2, .. …”, this is a poor notation; define an end (i.e., the last time point available and/or prior to ARDS) and state it; it would be a bit more clear. 3) Page 4, you defined E ∈ R^{d×|C|}, but C is not defined. While it is evident that C is the number of categories, it would be good to define this. 4) The method is interesting and offers food for thought; exploring some causal modeling would be interesting (Das, 2024). – Work Cited Das, Abhimanyu, et al. “A decoder-only foundation model for time-series forecasting.” Forty-first International Conference on Machine Learning. 2024.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There is a lack of justification for missing baseline methods of comparison. While the writing is clear, the technical contributions are also not well explained, making it hard to judge whether the authors made large contributions or used standard methods of multimodal fusion. However, the figures are well presented, and the clinical problem is well-defined.

  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    In their rebuttal, the authors clarified that their approach employs a standard transformer encoder. While this is reasonable, my primary concern remains unresolved: it is still unclear how the three modalities are integrated within the model. If the design relies on simply concatenating all modalities into a single input sequence, it raises questions about the model’s capacity to effectively learn cross-modal interactions.

    At this stage, I am not confident in my overall assessment of the paper. The technical novelty is limited, and key implementation details are under-explained. That said, the paper presents a compelling clinical application, and the experimental evaluation in that context is promising.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a deep learning model to dynamically monitor the ARDS onset risk (0-1) and emergency level (1 to 4) in ICU patients by making frequent predictions (every 6 hours), starting from the initial time t_0. At each prediction time t_i, the model processes multi-modal data over a period T, which is the lesser of 72 hours or the time since admission. Emergency level 4 means ARDS onset within 12 hours, level 3 within 12-24 hours, level 2 within 24-48 hours, and level 1 for longer than 48 hours (a toy mapping is sketched below). The objective is to help guide interventions and resource allocation efficiently.

    The main contribution of the paper is to implement the fusion of asynchronous multi-modal data (CXR, vital signs, laboratory data), acquired with varying sparsity during the patient’s ICU stay, and to perform dynamic risk prediction leveraging the sequentially accumulated data.
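
    For concreteness, the urgency labeling rule above amounts to the following (a hypothetical sketch; the function name and form are my own illustration, not taken from the paper):

```python
def emergency_level(hours_to_onset: float) -> int:
    """Map predicted time-to-ARDS-onset to the 4-level urgency scale
    described above: level 4 for onset within 12 h, level 3 for
    12-24 h, level 2 for 24-48 h, and level 1 for longer than 48 h."""
    if hours_to_onset < 12:
        return 4
    elif hours_to_onset < 24:
        return 3
    elif hours_to_onset < 48:
        return 2
    return 1
```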

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Innovative approach: The paper proposes a novel deep learning model for dynamic ARDS risk monitoring.
    • Multi-modal data fusion: Effective fusion of asynchronous multi-modal data for risk prediction.
    • Real-time inference: The model enables real-time inference while preserving long-range dependencies.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    • Temporal information preservation: Unclear how temporal information is preserved in the 2nd stage of STM fusion.
    • Training data organization: Lack of clarity on how training data is organized and processed.
    • Performance evaluation details: Some aspects of performance evaluation and sampling choices need more clarification.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    The deep-learning architecture proposed is composed of three stages. First, in the feature encoding step, raw data is converted into unified latent features, processed according to the modality type, resulting in d-dimensional tokens for each modality (the value of d is not specified). The feature tokens in each modality are aggregated into an additional CLS token (no details provided for the aggregation). Second, the fusion step deals with asynchronous multi-modal data combination in two stages, leveraging Transformer architectures: in the first STM Fusion stage, data from each modality is passed through a Transformer which provides a “summary vector” corresponding to the CLS input (cf. Fig. 1). Cross-modal interaction is then modeled by fusing all modality “summaries” using multi-modal transformer layers. The fused output is fed to a Progressive Context Memory (PCM) stage for sequential prediction optimization. PCM models dependencies in prediction tasks using incremental encoding and memory-augmented attention. At each time t_i, it processes only the new observations since t_{i-1} to obtain the new features Δf̂_i. A compact memory bank M stores historical context and updates with Δf̂_i. A transformer layer then learns attention weights for context aggregation, and an MLP head predicts the risk and emergency level. This mechanism enables real-time inference while preserving long-range dependencies.
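
    To fix ideas, here is a minimal sketch of how I read the PCM mechanism. All dimensions, the FIFO memory-update rule, and the head design are my own assumptions for illustration, not the authors’ exact design:

```python
import torch
import torch.nn as nn

class ProgressiveContextMemory(nn.Module):
    """Sketch: at each prediction time t_i, encode only the fused
    features of the new observations (delta_f), append them to a
    bounded memory bank, attend over the bank, and predict the ARDS
    risk plus the 4-way emergency level."""

    def __init__(self, d: int = 256, mem_size: int = 12, n_heads: int = 4):
        super().__init__()
        self.mem_size = mem_size
        self.attn_layer = nn.TransformerEncoderLayer(
            d_model=d, nhead=n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1 + 4))

    def forward(self, delta_f: torch.Tensor, memory: torch.Tensor):
        # delta_f: (B, 1, d) fused features of observations since t_{i-1}
        # memory:  (B, M, d) compact bank of historical context
        memory = torch.cat([memory, delta_f], dim=1)[:, -self.mem_size:]  # FIFO update
        ctx = self.attn_layer(memory)        # attention weights for context aggregation
        out = self.head(ctx[:, -1])          # predict from the most recent slot
        risk = torch.sigmoid(out[:, :1])     # ARDS onset risk in [0, 1]
        level_logits = out[:, 1:]            # 4-way emergency-level logits
        return risk, level_logits, memory
```

    Under this reading, the memory would be initialized empty (e.g., torch.zeros(B, 0, d)) at t_0 and carried across the 6-hour prediction steps, which is what would allow real-time inference without re-encoding the full history.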

    In this 2-step multimodal fusion architecture, it is not very clear how the temporal information of data samples from all modalities is preserved in the 2nd stage of STM fusion. Since modality data are interleaved in time, after the first transformer layer processes each modality separately, the “summary” vectors presented at the input of the cross-modality fusion transformer would have lost the individual sample positions in time. Does the model consider the “summary” vectors coming from different modalities as synchronous at the input of the multi-modal transformer layer? (That is, after their per-modality fusion, are all “summary” data in the analyzed time window considered synchronized?)

    Model training uses late batching and balanced sampling. It is not clear how the training data is organized. During a batch, are all data samples considered over the whole 72-hour window? Are these data samples partitioned into 6-hour intervals to create the sequential “summary” vectors for the PCM input? If so, how many samples from each modality are selected per 6-hour window? Similar questions arise for the inference at each time step t_i. Notably, the authors illustrate the impact of late batching and balanced sampling on model training, leading to increased performance (Table 1).
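
    For reference, balanced sampling in this setting is often a per-batch re-balancing of positive (pre-ARDS) and negative patient sequences against the severe class imbalance the paper mentions; a minimal sketch under that assumption (not necessarily the authors’ exact procedure):

```python
import random

def balanced_batch(pos_indices, neg_indices, batch_size):
    """Draw equal numbers of positive and negative sequence indices
    per batch; assumes each pool holds at least batch_size // 2 items."""
    half = batch_size // 2
    batch = random.sample(pos_indices, half) + random.sample(neg_indices, half)
    random.shuffle(batch)
    return batch
```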

    The model performance is compared with baseline versions (w/o STM, w/PCM) during ablation studies. The results reported in Fig. 2 concern the averaged onset risk and emergency level across all sequential prediction points t_i. It would have been preferred to make these comparisons at specific time points (as shown in Fig. 3g, for example, i.e., for 6h, 24h, 48h prior to ARDS onset). The averaged values might hide some important behavior. The authors show that the proposed model has better (and statistically significant) average performance scores than the baselines (w/o STM + w/PCM). However, it is shown that the baseline w/o STM arrives close behind with no statistical difference for Sensitivity and MAE. I would suggest adding numerical values on the bars in Fig. 2, as it is very difficult to read the real values from the graphic.

    Concerning the Performance Evaluation of Dynamic Risk Monitoring, the authors show increased AUROC performance vs prior studies. Similarly, patient risk stratification is demonstrated to be correct for a large threshold interval of the predicted risk (0.3-0.9), with maximum confidence for a threshold of 0.7. Interestingly, at a threshold of 0.4 the relative risk drops drastically compared to the 0.3 and 0.6 thresholds (while still supporting confident risk stratification); I would have liked more insights on this behavior (but I am aware of paper length constraints preventing more in-depth discussion). Moreover, the patient stratification with respect to predicted urgency levels showed clinically acceptable error margins but with systematic under-prediction of the extremes, which would require additional tuning of the loss function according to the authors.

    For the AUROC performance evaluation, it is not clear why the choice of sampling points for negative vs positive cases (“negative cases: random windows; positive cases: pre-onset window”) was made. Why not use the same windows for positive and negative cases? May the current choice introduce a bias? In Fig. 3c-f, when mentioning “prediction time to ARDS onset 12-18h,” does it mean that data samples are taken from t_0 up to t_onset − 12h, or from within the window t_onset − 18h to t_onset − 12h?

    The paper is well written, reporting a fair level of detail, even if some aspects would deserve more clarification according to my previous comments.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As detailed in my comments above: the paper is well written, reporting a fair level of detail, even if some aspects would deserve more clarification.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The authors provided the necessary clarifications for the main questions raised during the paper review



Review #3

  • Please describe the contribution of the paper

    A dynamic ARDS risk monitoring model integrating multi-modal asynchronous data (CXR, vital signs, laboratory) via a modified transformer architecture (STM Fusion, PCM). It generates continuous predictions aligned with clinical workflows and quantifies urgency, with exceptional performance (AUROC = 0.94 at 6h pre-ARDS).

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Multi-modal integration: innovative asynchronous data management via STM Fusion and PCM.
    • Clinical performance: outperforms existing methods (AUROC +9% vs. Green Score [17]).
    • Practical utility: risk stratification (RR = 25) and emergency indication for ICU resources.
    • Robustness: adapted training strategies (late batching, balanced sampling).

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    • Architectural complexity: the use of transformers limits clinical interpretability.
    • Calibration of emergencies: under-prediction of the extreme levels (1 and 4).
    • Generalizability: validation only on MIMIC-IV, a single-center dataset.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Explore interpretability methods (e.g., attention maps) to enhance clinical confidence. Validate the model on external datasets (e.g., eICU) to confirm generalizability.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (5) Accept — should be accepted, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work addresses an urgent need in intensive care with a technically rigorous solution. Performance (AUROC > 0.9) and risk stratification (RR = 25) demonstrate immediate clinical impact. Limitations (complexity, calibration) are minor compared to the advances.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We deeply appreciate the constructive feedback from all reviewers. Due to space constraints, we focus here on addressing major concerns, but will carefully incorporate all suggestions in our final revision.

Reviewer #2

  1. Temporal information preservation of different modalities in the 2nd stage of STM fusion Response:
    Thank you. As mentioned in Sec. 2.2, “modality-aware positional encodings are employed to handle temporal asynchrony.” In detail, we embed both the timestamp (on a unified time axis across modalities, see Sec. 2.1) and the modality type in the positional encoding. Although features of different modalities are summarized before the 2nd-stage fusion, each data sample of each modality in the 1st stage is associated with a positional embedding that differentiates time points across modalities (an illustrative sketch follows these responses). Thanks for pointing out the confusion; we will further clarify the positional-encoding details in the updated version.

  2. RR curve at the threshold of 0.4 (drastic drop) Response: The RR score is low when the ARDS incidence in the low-risk group is high, or when the incidence in the high-risk group is low (a toy RR computation follows these responses). At a threshold of 0.3, the low-risk group is small (32%) but accurate (hence its incidence is low), which dominates the RR score. A threshold of 0.4 is a transition: the low-risk group expands and becomes less accurate while still small, but the high-risk group is not yet accurate enough, which causes the drop. Above 0.5, the high-risk group becomes more accurate and dominates the RR score.

  3. Setting explanation Response: The window length for positive and negative cases is the same. Since negative cases have no ARDS onset, we use a random reference time for evaluation. For the current prediction point, features are extracted from the data in the latest 6-hour window (earlier features are saved in the PCM; a windowing sketch follows these responses). During a batch, all available data samples for each modality are utilized; this number varies with the actual data (Fig. 1), and our model is capable of analyzing this irregular data.
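
To make the mechanisms referenced above concrete, the following simplified sketches illustrate (1) a modality-aware positional encoding, (2) the relative-risk statistic, and (3) incremental window selection. All names, dimensions, and formulas are illustrative assumptions, not the exact implementations in the paper or its released code.

A positional encoding that offsets each token by a sinusoidal embedding of its timestamp on the unified time axis plus a learned embedding of its modality type (CXR / vitals / labs):

```python
import math
import torch
import torch.nn as nn

class ModalityAwarePE(nn.Module):
    """Sketch: token embedding + sinusoidal time encoding + learned
    modality-type embedding, so samples keep their time positions
    even after per-modality summarization."""

    def __init__(self, d: int = 256, n_modalities: int = 3):
        super().__init__()
        self.d = d
        self.modality_emb = nn.Embedding(n_modalities, d)

    def time_encoding(self, t: torch.Tensor) -> torch.Tensor:
        # t: (B, N) timestamps in hours on the unified time axis
        freqs = torch.exp(torch.arange(0, self.d, 2, dtype=torch.float32)
                          * (-math.log(10000.0) / self.d))
        angles = t.unsqueeze(-1) * freqs                      # (B, N, d/2)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    def forward(self, x, t, modality_id):
        # x: (B, N, d) tokens; t: (B, N) timestamps; modality_id: (B, N) ints
        return x + self.time_encoding(t) + self.modality_emb(modality_id)
```

The relative-risk (RR) statistic as the ratio of ARDS incidence between the predicted high-risk and low-risk groups at a given threshold, which shows why a small, accurate low-risk group (tiny denominator) dominates the score:

```python
import numpy as np

def relative_risk(pred_risk: np.ndarray, onset: np.ndarray, thr: float) -> float:
    """Standard epidemiological RR: incidence among predicted high-risk
    patients divided by incidence among predicted low-risk patients.
    `onset` is a 0/1 array of observed ARDS onsets; both groups are
    assumed non-empty at the chosen threshold."""
    high = onset[pred_risk >= thr]
    low = onset[pred_risk < thr]
    return high.mean() / low.mean()
```

Selecting only the observations that arrived since the previous prediction time, with earlier context held in the PCM memory:

```python
def observations_since(samples, t_prev, t_now):
    """Keep only observations newer than the previous prediction time.
    `samples` is assumed to be a list of (timestamp, observation) pairs
    for one modality; only the (t_prev, t_now] window is re-encoded."""
    return [(ts, obs) for ts, obs in samples if t_prev < ts <= t_now]
```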

Reviewer #3

  1. Lacking justification for missing baseline methods (CXR backbones) for unimodal comparison Response: As stated in our contributions (Sec. 1), the focus of this paper is on exploring effective solutions (fusion architecture and training strategies) for the specific challenges in dynamic, multi-modal ARDS monitoring. The design of the CXR backbone is not our contribution, so we keep this choice fixed for the ablation study on modality combinations. Thanks for this suggestion; we believe that exploring advanced backbone models could further improve performance on top of our proposed solution.

  2. Inconsistent use of metrics: slope and volatility only in Table 1, and no specificity in Figure 2 Response: Slope and volatility are included specifically to compare training stability and convergence, since Table 1 ablates the training methods; the other experiments keep the training setting fixed and therefore do not involve slope and volatility (see the illustrative sketch after these responses). Figure 2 balances readability and analytical value: the layout would be crowded if all metrics were plotted, and AUROC holistically captures the trade-off between sensitivity and specificity across thresholds. Specificity: A: 0.76, B: 0.81, C: 0.72, D: 0.89 (the last with low sensitivity).

  3. Unclear description of the detailed structure of the multi-modal Transformer, making it hard to evaluate the technical contribution Response: Thanks for pointing out this confusion. We used standard Transformer encoder layers (self-attention plus a feed-forward network) for all modules named “Transformer Layers” in Fig. 1; this will be clarified in the caption of Fig. 1. This paper focuses on the application and solves the specific challenges in dynamic, multi-modal monitoring of ARDS. As stated in our contributions (Sec. 1), we achieve exceptional performance on ARDS risk prediction, and our tailored solutions contribute to this specific application. We did not claim technical novelty in multi-modal fusion itself, as the setting and data of this application differ considerably from those of many multi-modal works.
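
A simplified sketch of one common way to compute slope- and volatility-style training-curve metrics like those in Table 1 (illustrative only; the definitions here are assumptions, not necessarily the paper’s exact ones):

```python
import numpy as np

def curve_slope_and_volatility(losses: np.ndarray):
    """Slope as the linear-fit coefficient of the loss curve
    (convergence speed/direction); volatility as the standard
    deviation of step-to-step loss changes (training stability)."""
    epochs = np.arange(len(losses))
    slope = np.polyfit(epochs, losses, deg=1)[0]
    volatility = np.diff(losses).std()
    return slope, volatility
```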




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    Out of three reviewers, two are very enthusiastic about the nature of this work, which targets an interesting and important clinical task, as stated in the following.

    “That said, the paper presents a compelling clinical application, and the experimental evaluation in that context is promising.”

    The actual technical methods employed seem solid, and we do not always have to invent an unnecessary “so-called novel” technical method to solve a problem. The key should be to solve the critical problem with the best possible solution.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


