Abstract

In emergency departments (ED), efficient triage is essential for timely patient care, but challenges like missing and sparse data often hinder the prediction performance of severity level and department. To address these issues, we propose a novel intelligent triage method that incorporates a Conditional Gaussian Mixture Imputation (CGMI) and a Feature Densification Module (FDM). The CGMI handles missing data through conditional probability modeling, while the FDM obtains correlations between variables by calculating the Manhattan distance between non-zero values in a one-hot coded feature. In addition, we design a multi-scale Feature Extraction Module (mFEM) to capture multi-level semantic information from patient complaints. Subsequently, two feature fusion strategies were introduced: early fusion and late fusion. The early fusion combines Principal Component Analysis (PCA)-processed features with another modality. The late fusion with enhancement introduces reverse features of another modality and applies an attention mechanism to obtain salient features. Experimental results show that our method outperforms existing approaches, achieving 84.83% sensitivity, 85.11% specificity, and 61.42% Cohen’s Kappa for severity prediction and 90.89% sensitivity, 91.04% specificity, and 85.87% Cohen’s Kappa for department prediction. Our method significantly improves the sensitivity, specificity, and robustness of ED triage, demonstrating superior performance and reliability in handling missing and sparse clinical data.
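The three metrics reported above can all be derived from a multi-class confusion matrix. The following is a minimal sketch, not the authors' evaluation code: it assumes macro-averaged one-vs-rest sensitivity and specificity (the abstract does not state the averaging scheme), and the function names and toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_sensitivity_specificity(cm):
    """One-vs-rest sensitivity and specificity, averaged over classes (assumed scheme)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    sens, spec = [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
    return sum(sens) / k, sum(spec) / k


def cohens_kappa(cm):
    """Agreement between truth and prediction, corrected for chance agreement."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / total
    p_expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)
    ) / (total * total)
    if p_expected == 1.0:
        return 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels for illustration only (not from the paper's dataset).
toy_true = [0, 0, 1, 1]
toy_pred = [0, 0, 1, 0]
toy_cm = confusion_matrix(toy_true, toy_pred, n_classes=2)
toy_sens, toy_spec = macro_sensitivity_specificity(toy_cm)  # 0.75, 0.75
toy_kappa = cohens_kappa(toy_cm)  # 0.5
```

Kappa is the most informative of the three under class imbalance, because it discounts the agreement a classifier would reach by predicting the majority class.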

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0666_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/xiaoyiseu/CGMI

Link to the Dataset(s)

https://github.com/xiaoyiseu/CGMI/tree/main/data

BibTex

@InProceedings{XiaYi_ANovel_MICCAI2025,
        author = { Xiao, Yi and Zhang, Jun and Chi, Cheng and Wang, Chunyu},
        title = { { A Novel ED Triage Framework Using Conditional Imputation, Multi-Scale Semantic Learning, and Cross-Modal Fusion } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15969},
        month = {September},
        pages = {13 -- 22}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a method for classifying severity and department from tabular triage records. The proposed method includes a conditional distribution-based imputation and multi-level feature fusion.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper applies a novel Conditional GMM-based imputation and designs a multi-level feature fusion for pathological triage records.

    2. The CGMI achieves better performance than other imputation methods.

    3. The FEF achieves better performance than other backbone models.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Imputation with Conditional GMM has been explored in prior works (e.g. https://doi.org/10.1111/biom.13410, https://doi.org/10.1016/j.spasta.2016.11.002). The paper did not mention/review them.

    2. The input data consist of relatively sparse vitals and free-text chief complaints, but no ablation on the individual modalities was provided. The contribution of each modality remains to be shown (i.e., how much CGMI improves performance on vitals alone, and how well a model can perform using only text data).

    3. The reported model performance is concerning and limits clinical impact. For the most severe level 1, AP is only 0.276, and the majority of these cases are classified as level 3. This poses a large risk if the method is adopted in ED triage. The “long-tail” issue in department classification is unresolved: 3 minority classes have very low AP.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Code availability is claimed, but no anonymous repository or otherwise accessible code was provided for review.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Major weaknesses 2 & 3 can’t be addressed without significantly more experimental results.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    I will maintain my prior rating as reject. Here are the reasons:

    1. Although the collected dataset is imbalanced, no mitigation was discussed or considered, resulting in a significant performance gap and posing risks to high-severity patients. This concern was not addressed.

    2. The other concern, multimodal versus single-modal performance, is a critical validation that was missing from the original paper and remains unresolved (i.e., given the sparsity of the vital-sign data, if a text-only model can achieve equally good performance, the motivation of the work would be greatly weakened). The rebuttal did not help resolve this concern.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel ED triage framework that incorporates Conditional Gaussian Mixture Imputation (CGMI) for handling missing structured data and a Feature Densification Module (FDM) for enhancing sparse feature representation. The framework also introduces a multi-scale Feature Extraction Module (mFEM) for processing patient complaints along with early and late fusion strategies for multimodal feature integration.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper uses a machine learning algorithm to address a problem in the ED.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. This paper assumes that the data distribution follows a GMM. It is recommended to analyze and show the data distribution, as the vital-sign data may be severely skewed.

    2. I found that, after CGMI, performance decreases when FDM is used compared with not using FDM in Table 1 (84.6 vs 84.13, 85.39 vs 83.48,…). When PCA is applied, the use of FDM does not lead to significant improvements in model performance, and some metrics even decline (comparing the third-to-last row with the last row in Table 1). Is this because the Manhattan distance is too simple a measure of feature relationships?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents clear methodology and comprehensive experimental results with concise and informative figures

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposes a novel approach to enhance severity and department prediction in ED triage by addressing the challenges of missing and sparse data. The method incorporates Conditional Gaussian Mixture Imputation (CGMI) to mitigate the impact of missing structured data, a Feature Densification Module (FDM) to capture relationships among sparse variables, and a multi-scale Feature Extraction Module (mFEM) to improve the semantic representation of unstructured data.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper aims to solve the critical and important clinical problem of ED triage with emphasis on a real-life setting with the challenges of missing and sparse data
    2. The proposed method is novel and incorporates many components to enhance the performance.
    3. A comprehensive analysis and experiments are conducted to validate the method.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. None of the presented results is accompanied by a statistical significance test (e.g., a p-value) to show that the improvement over the compared results is indeed significant.
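One way to test significance when only a single run per model is available is a paired test on per-sample correctness over the shared test set. The following is a minimal sketch assuming McNemar's exact (binomial) test; it is not an analysis reported in the paper, and the function name and toy labels are hypothetical.

```python
from math import comb


def mcnemar_exact_p(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on the same samples.

    Only discordant samples matter: those where exactly one model is correct.
    """
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the models never disagree in correctness
    k = min(a_only, b_only)
    # Under H0, each discordant sample favours either model with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy labels for illustration only: model A is right on all 5 samples, model B on none.
toy_true = [1, 1, 1, 1, 1]
toy_a = [1, 1, 1, 1, 1]
toy_b = [0, 0, 0, 0, 0]
toy_p = mcnemar_exact_p(toy_true, toy_a, toy_b)  # 0.0625
```

The test compares accuracy-style agreement only; significance of per-class metrics such as AP would need a different procedure, for example a paired bootstrap over test samples.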
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    Small note: it is difficult to see the full image in Fig. 2(b).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and deals with an important clinical issue. The model architecture is justified in the ablation study section but the statistical test to verify the improvement of the results is missing.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper is missing statistical significance tests on the results, but the dataset is quite large, which makes me believe that the results are indeed significant.




Author Feedback

Dear Chairs and Reviewers, We sincerely appreciate your time and effort in reviewing our manuscript.

Response to Reviewer #1: Thank you very much. (Q7-1) Data distribution follows GMM: Prior to submission, we confirmed that each structured variable exhibits clear skewness and is better modeled by multiple Gaussian components; single‐Gaussian fits consistently underfit the heavy tails and modes in our data. We will release the analysis scripts and full results in our GitHub repository.
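A skewness check of the kind described in this response can be sketched in a few lines. This is an illustration only and not the authors' analysis script: it assumes the moment-based (Fisher-Pearson) skewness coefficient, and the function name and toy values are hypothetical.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant sample: treat as unskewed
    return m3 / m2 ** 1.5


# Made-up values for illustration (not taken from the paper's dataset).
symmetric_sample = [1, 2, 3, 4, 5]
right_skewed_sample = [1, 1, 1, 10]
symmetric_g1 = sample_skewness(symmetric_sample)        # 0 for a symmetric sample
right_skewed_g1 = sample_skewness(right_skewed_sample)  # positive: long right tail
```

A strongly non-zero coefficient indicates that a single Gaussian is a poor fit; it does not by itself show that a mixture is adequate, which would need a model comparison such as an information criterion over the number of components.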

(Q7-2): We believe the primary reason is that, after CGMI processing, some instances with nearly identical structured features still differ in severity due to unstructured text cues; therefore, we apply PCA to inject cross‑modal information and improve discrimination. In Table 1, using row 4 as the baseline demonstrates that neither PCA alone (row 5) nor FDM alone (row 6) yields significant gains, but their combination (row 7) achieves the best overall performance. The slight performance drops when pairing PCA with FDM arise because the high dimensionality of the FDM‑processed features dominates the fused representation. Finally, we acknowledge that long‑tail class imbalance continues to limit improvements (see p. 8), and we will investigate targeted mitigation strategies in future work.

Response to Reviewer #2: Thank you very much. (Q7-1) Relevant works: We will include a concise discussion of both works in the camera-ready version.

(Q7-2): In Table 1, row 1 reports the modality‑specific baselines—Severity prediction using only structured data (including vital signs) and Department prediction using only text—thereby isolating each modality’s standalone performance. Accordingly, the impact of CGMI on the structured modality is reflected in Severity prediction: adding CGMI raises accuracy from 82.76% (row 2) to 84.60% (row 4) and from 82.95% (row 3) to 84.13% (row 6), demonstrating consistent gains. For the text modality, since text processing is dominated by the BERT encoder, mFEM, and FEF modules—and our focus is on addressing challenges such as missing and sparse data—we limit ablation to the FEF and mFEM modules. Specifically, comparing Department results in rows 1 and 2 of Table 1 constitutes the FEF ablation, and comparing the final row (“ours”) against the penultimate row (“NomFEM”) in Table 3 reflects the mFEM ablation.

(Q7-3): First, we emphasize that the performance degradation stems from the extremely small sample sizes of certain classes, rather than from any intrinsic limitation of our method’s discrimination. We are confident that, as more data becomes available for these under‑represented categories, their recognition performance will similarly improve. Specifically, in Fig. 2 (a), Level 1 (N = 46) and Level 2 (N = 221) cases are scarce, and in Fig. 2 (b), Trauma (N = 5), Orthopedics (N = 55), and Neurosurgery (N = 7) are particularly under‑represented in the test set. These low test‐set counts imply equally limited training and validation samples, which hinders the model’s ability to learn sufficiently discriminative features for these classes. While our method currently struggles with the rarest classes, it achieves excellent performance on sufficiently represented categories—for instance, in Fig. 2 (b), Ophthalmology (N = 263) and Gynecology (N=226) attain AP = 0.9883 and 0.7535 respectively. Due to space limitations, this paper focuses on addressing missingness and sparsity in structured data, while the long‑tail issue will be explored in future work.

(Q10) Code availability: Rebuttal rules prohibit sharing external links; we will publicly release our code and dataset.

Response to Reviewer #3: Thank you very much. (Q7-1) Statistical significance: All metrics are from single runs, and rebuttal guidelines prohibit adding new experiments at this stage.

(Q10) Figure 2(b) clarity: We will adjust the legend placement in the camera‑ready version to avoid overlap and improve clarity.

Thank you again. The authors




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper introduces a triage framework combining conditional imputation, multi-scale semantic learning, and cross-modal fusion for emergency department (ED) data. Reviewer 1 recommends weak accept, noting clear methodology and comprehensive results, though with concerns around performance consistency. Reviewer 3 also leans positive, citing the strength of the dataset, though statistical significance is not reported. However, Reviewer 2 recommends rejection, raising critical concerns regarding the lack of imbalance mitigation for underrepresented classes and insufficient justification for multimodal integration. While the rebuttal offers thoughtful clarification—particularly on the role of CGMI, component ablations, and limitations from long-tail class distribution—these responses fall short of fully resolving Reviewer 2’s concerns around clinical applicability and modality dependence. Additionally, the absence of statistical validation and broader external testing limits the strength of the findings.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A


