Abstract
Stroke diagnosis in emergency rooms (ERs) is challenging due to limited access to MRI scans and a shortage of neurologists. Although AI-assisted triage has shown promise, existing methods typically use MRI-derived training labels, which may not align with stroke patterns in patient multimedia data. To address this mismatch, we propose an Adaptive Uncertainty-aware Stroke TrIage Network (AUSTIN) that leverages inconsistencies between clinician triage decisions and MRI-derived labels to enhance AI-driven stroke triage. This approach mitigates overfitting to clinician-MRI disagreement cases during training, significantly improving test accuracy. Additionally, it identifies high-uncertainty samples during inference, prompting further imaging or expert review. Evaluated on a clinical stroke patient dataset collected in an ER setting, AUSTIN achieves over 20% performance gain over human triage and a 13% improvement over a prior state-of-the-art method. The learned uncertainty scores also show strong alignment with discrepancies in clinical assessments, highlighting the framework’s potential to enhance the reliability of AI-assisted stroke triage.
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2102_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/shuashua0608/AUSTIN
Link to the Dataset(s)
N/A
BibTex
@InProceedings{YanShu_Enhancing_MICCAI2025,
author = { Yang, Shuhua and Cai, Tongan and Ni, Haomiao and Ma, Wenchao and Xue, Yuan and Wong, Kelvin and Volpi, John and Wang, James Z. and Huang, Sharon X. and Wong, Stephen T. C.},
title = { { Enhancing AI-assisted Stroke Emergency Triage with Adaptive Uncertainty Estimation } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15973},
month = {September},
pages = {152 -- 161}
}
Reviews
Review #1
- Please describe the contribution of the paper
The main contribution of the paper is the proposal of an adaptive uncertainty-aware stroke triage network, AUSTIN, which aims to enhance AI stroke triage by leveraging the inconsistency between clinician triage decisions and MRI-derived labels. However, the paper only compares the proposed method with a single, outdated approach from 2022, which limits the scope and relevance of the comparison.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper introduces a novel framework for AI-assisted stroke triage.
2. The proposed method shows improvements over human triage and previous state-of-the-art methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The paper could provide more details on how the model handles cases with conflicting labels and how it generalizes to different clinical settings. Further validation with a larger and more diverse dataset would strengthen the claims of the model’s effectiveness.
2. The paper lacks a comprehensive comparison with current state-of-the-art methods, focusing only on a single, outdated approach.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents a novel approach to AI-assisted stroke triage, but its comparison with only one outdated method and lack of detail on reproducibility make it less convincing. A more comprehensive evaluation against current state-of-the-art methods and a detailed description of the experimental setup are needed.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The authors present a novel stroke diagnosis framework based on a 3-path encoder of visual features and two sets of related audio features. The network predicts both the presence of a stroke and the related uncertainty as a learnable observation noise parameter. The approach is compared to a benchmark method on a real-world dataset.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors address an important issue of multi-modal health data: label disagreement. By including an adaptive weight to train the learnable observation noise parameter, the authors are able to show the correlation between predicted uncertainties in case of label disagreement vs. no label disagreement. In addition, the baseline accuracy for stroke detection is improved over the clinical rating and a baseline comparison method.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Although the analysis of the predicted uncertainties provides helpful insights into the performance of the model, and there clearly is a correlation between label mismatch and the distribution of predicted uncertainty values, the authors have not convincingly laid out their claim of clinical relevance, e.g., by using concrete examples or by suggesting and subsequently evaluating uncertainty thresholds. It would be interesting to show what percentage of wrong predictions can be filtered out for certain thresholds. What is the safe uncertainty level, if any? Some statements are also too optimistic: given the large overlap of the uncertainty distributions, the clinical relevance for a single patient seems questionable.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My recommendation is based on the novelty of the approach for stroke assessment as well as the relevance of the scenario considered (multi-modal data with label disagreements). Authors should strengthen their analysis and description of the uncertainty predictions.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed my concerns sufficiently in their rebuttal including the updated Fig. 3c. The concerns raised by the other reviewers have also been addressed appropriately in my view.
Review #3
- Please describe the contribution of the paper
The paper presents a novel approach to AI-assisted stroke emergency triage called AUSTIN (Adaptive Uncertainty-aware Stroke TrIage Network). Current AI-assisted stroke triage methods typically use MRI-derived training labels, which may not align with stroke patterns in patient multimedia data. This mismatch can lead to ineffective models. AUSTIN addresses this issue by leveraging the inconsistency between clinician triage decisions and MRI-derived labels to enhance AI stroke triage. AUSTIN identifies high-uncertainty samples during inference, prompting further imaging or expert review. The framework incorporates an adaptive uncertainty-aware loss function that allows the model to learn efficiently from high-confidence cases while capturing uncertainty in ambiguous cases.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The problem is important, both methodologically and potentially clinically, for creating triage systems in emergency rooms. The paper highlights the challenges in stroke diagnosis in ERs, including limited access to MRI scans and a shortage of neurologists, and AUSTIN aims to improve stroke triage accuracy using multimedia data (video and audio) from patients. The authors argue that while AI-assisted stroke triage shows promise, existing methods typically rely solely on MRI-derived training labels, which may not align with stroke patterns in patient multimedia data. The paper also justifies well the importance of generating identity-free audio-visual features through adversarial training, potentially improving the model’s generalizability.
- The problem and proposed approach are well motivated. AUSTIN leverages both MRI-derived labels and clinician triage decisions during training, addressing potential mismatches between multimedia-based predictions and MRI findings.
- The state of the art is comprehensive enough: it provides a good understanding of the limitations of previous approaches and of current clinical practice.
- The experimental design and the tests done by the authors are very thorough. They performed experiments on several large datasets and compared their method with state-of-the-art models.
- The results are well presented, and the authors provide both quantitative and qualitative assessments of the experiments, with ablation studies in great detail. AUSTIN demonstrates substantial performance gains over both human triage (>20% improvement) and previous state-of-the-art methods (13% improvement in AUC), suggesting potential for more accurate and efficient stroke triage in resource-constrained settings.
- The learned uncertainty scores used by the proposed model align well with mismatches in clinical assessments, offering insights into case difficulty and potential areas for further clinical evaluation. This feature could provide valuable decision support for clinicians.
- The model incorporates a novel loss function that adjusts based on the agreement between MRI and triage labels, allowing the model to express uncertainty in ambiguous cases.
- The authors discuss future areas of research and the limitations of the current iteration of their approach.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While the model shows significant improvements over previous methods, there’s a risk of overfitting to the specific dataset used, especially given its small size. The paper doesn’t discuss measures taken to prevent overfitting beyond mentioning a dropout ratio.
The document focuses on overall performance metrics but doesn’t provide a detailed analysis of false positives or false negatives, which could be crucial in a clinical context.
The approach relies on good-quality video and audio data. In real emergency room settings, collecting such data consistently might be challenging, potentially affecting the model’s performance.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method, AUSTIN, aims to improve upon existing AI-assisted stroke triage methods by addressing the mismatch between MRI-derived training labels and observable stroke patterns in patient multimedia data, and the results back these aims: the proposed method significantly improves stroke triage performance, showing over a 20% gain compared to human triage and a 13% improvement over prior state-of-the-art methods.
The incorporation of the adaptive uncertainty-aware loss function, which allows the model to capture and quantify uncertainty in its predictions, is very interesting in this particular context. This is particularly useful for identifying ambiguous cases that may require further expert review.
The learned uncertainty scores align well with mismatches in clinical assessments, providing a valuable tool for risk estimation in AI-assisted stroke triage, although more test and out-of-distributions validation is needed.
The used multimodal approach (a combination of facial video and speech data) leveraging multiple information sources for more comprehensive assessment is very interesting.
While the approach focused on stroke triage, the adaptive uncertainty-aware training framework shows potential for application in other medical domains where multimodal diagnostic information may yield conflicting labels.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for their constructive comments. We are encouraged that all reviewers acknowledged the novelty of our work and the performance improvements over human triage and previous models. R1 and R3 also highlighted the paper’s clinical significance and clear motivation. We will release the code and configuration files upon acceptance for reproducibility; the raw videos cannot be shared due to privacy/IRB restrictions, but the revised paper will include pointers to a detailed acquisition protocol.
R1
- Uncertainty analysis A1. Note that label inconsistencies indicate incorrect clinician triage predictions. Thus, Fig. 3 also demonstrates the clinical relevance of our model by showing the correlation of uncertainty scores and label inconsistencies. Fig. 3 (b) suggests that using 0.4 as the threshold will exclude all incorrect human triage predictions in our dataset. In the revision, we will amend Fig. 3(c) to present the distribution of inconsistencies and model performance across different thresholds, showing better outcomes with a stricter threshold. We recognize the clinical value of using the uncertainty in AI prediction and will leave refinements for future work.
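The thresholding rule described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `triage_with_deferral` helper, the input tuple layout, and the 0.5 decision cutoff are illustrative assumptions; only the 0.4 uncertainty threshold comes from the rebuttal.

```python
def triage_with_deferral(predictions, sigma_threshold=0.4):
    """Split model outputs into automatic triage calls and deferrals.

    predictions: list of (sample_id, stroke_probability, sigma) tuples,
    where sigma is the model's learned per-sample uncertainty score.
    Samples whose uncertainty exceeds the threshold are routed to
    further imaging or expert review instead of being auto-triaged.
    """
    auto, deferred = [], []
    for sample_id, prob, sigma in predictions:
        if sigma > sigma_threshold:
            deferred.append(sample_id)              # hand back to clinicians
        else:
            auto.append((sample_id, prob >= 0.5))   # automatic stroke / non-stroke call
    return auto, deferred
```

Under such a rule, high-uncertainty cases are simply excluded from automatic triage, which is one concrete way to operationalize the "safe uncertainty level" question raised by R1.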
R2
- Handling labels A1. AUSTIN leverages the inconsistencies between MRI and triage labels through uncertainty-aware training. Eq. 1 explains the formulation of inconsistencies as an uncertainty loss. We will provide detailed code for reference. Such an uncertainty loss module is designed to be plug-and-play, making it applicable to any clinical setting involving two potentially conflicting supervisory signals.
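As a rough illustration of the kind of uncertainty loss described here (this is not the paper's Eq. 1; the Kendall-and-Gal-style noise attenuation and the specific agreement/disagreement weights are assumptions for the sketch):

```python
import math

def uncertainty_aware_loss(logit, log_sigma2, mri_label, triage_label,
                           w_agree=1.0, w_disagree=0.5):
    """Sketch of an uncertainty-attenuated binary cross-entropy.

    log_sigma2 plays the role of a learnable per-sample log observation
    noise: the cross-entropy term is scaled down as predicted uncertainty
    grows, while the 0.5 * log_sigma2 term keeps sigma from growing
    unboundedly. Cases where the clinician triage label disagrees with
    the MRI label are down-weighted so the model is not forced to
    overfit these conflicting supervisory signals.
    """
    p = 1.0 / (1.0 + math.exp(-logit))              # sigmoid probability
    eps = 1e-7
    ce = -(mri_label * math.log(p + eps)
           + (1 - mri_label) * math.log(1.0 - p + eps))
    nll = math.exp(-log_sigma2) * ce + 0.5 * log_sigma2
    weight = w_agree if mri_label == triage_label else w_disagree
    return weight * nll
```

The plug-and-play quality noted in the response follows from the structure: any task with two potentially conflicting label sources can reuse the agreement-dependent weight without touching the backbone.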
- Dataset A2. Unlike prior work that compares stroke patients with healthy controls (HC)—a more clear-cut task—we focus on mild-stroke vs non-stroke ER patients, a more clinically relevant and challenging problem. Our dataset, collected over five years in a real ER setting, is the largest and most diverse oral-facial multimedia dataset for stroke triage to date. We will continue to expand this into a larger-scale, multi-center cohort dataset, supported by a de-identification pipeline to protect patients’ privacy and enable broader validation.
- Baseline comparison A3. We have included ablation studies on recent face analysis SoTA models, such as MARLIN [2] (2023) and FaceXFormer [15] (2024), demonstrating the effectiveness of the proposed framework over well-recognized SoTA models. DeepStroke [1] remains one of the most representative SoTA methods in the oral-facial multimedia stroke diagnosis domain under the ER setting. We have also identified some recent related works. Ou et al. (2025) follows a similar video-audio fusion with contrastive learning, but addresses the easier task of comparing stroke vs. HC. Cai et al. (2024) extends DeepStroke with mobile deployment, but does not substantially change the integration of relevant clinical information. We will include a discussion of these papers in the revision. [Ou et al. Early identification of stroke through deep learning with multi-modal human speech and movement data. Neural Regeneration Research 2025] [Cai et al. M^3Stroke: Multi-Modal Mobile AI for Emergency Triage of Mild to Moderate Acute Strokes. IEEE BHI’24]
R3
- Generalizability A1. Besides dropout regularization, AUSTIN adopts transfer learning and adversarial training to prevent overfitting on small datasets. As mentioned in R2 A2, we are actively expanding the dataset to support broader validation.
- Representative case analysis A2. Fig. 3 presents an analysis of the Sigma value and MRI-triage label consistency. In the revised version, besides updates mentioned in R1 A1, we will include representative case analysis.
- ER Data A3. We follow a realistic ER setting to collect patient videos using a mobile phone, without requiring patients to face the camera in an ideal pose or to speak and perform actions clearly. Experimental results on this challenging dataset show the practical potential of AUSTIN in real-world clinical environments.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
I believe the authors have addressed reviewer concerns in the rebuttal, including clarifications on uncertainty thresholds, dataset scope, and baseline comparisons. While there are some remaining questions about generalizability and reproducibility, because of the method’s demonstrated performance gains and potential clinical relevance I would lean towards accepting the paper.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A