Abstract

While artificial intelligence (AI) has revolutionized the field of epileptic seizure detection from electroencephalography (EEG), its clinical adoption remains limited, largely due to the lack of transparency in AI models and their inability to explain the underlying seizure etiology. This paper introduces SzXAI, a novel framework to enhance the reasoning abilities of AI models for EEG-based seizure detection. SzXAI employs a contrastive training mechanism, which uses cross-modality similarity layers to align the EEG encodings with textual concept embeddings derived from clinical notes using LLMs. Along with the alignment, SzXAI leverages an attention-weighted pooling mechanism to detect the underlying seizure and baseline etiologies. We validate SzXAI via 10-fold cross-validation on the publicly available Temple University Hospital dataset. Our results demonstrate that the alignment-powered training mechanism of SzXAI vastly outperforms direct etiology prediction, thus improving the reliability of the predicted seizure etiologies. Furthermore, structured sentence generation from the model output provides insights in a human-readable format. Thus, SzXAI provides an effective platform to boost clinical trust and AI usability in epilepsy management.
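
For concreteness, the snippet below sketches the alignment and pooling mechanism described above. It is a minimal illustration in PyTorch, not the authors' implementation: the cosine-similarity layer, the softmax attention over per-window seizure probabilities, and the cross-entropy form of the contrastive objective are assumptions made only for illustration.

```python
import torch
import torch.nn.functional as F

def cross_modal_alignment(eeg_enc, concept_emb, seizure_prob, temperature=0.07):
    """Hypothetical sketch of SzXAI-style alignment and pooling.

    eeg_enc:      (T, D) per-window EEG encodings from a seizure detector
    concept_emb:  (C, D) LLM-derived textual concept embeddings (projected to D)
    seizure_prob: (T,)   per-window seizure probabilities from the detector
    """
    # Cross-modality similarity layer: cosine similarity between every EEG
    # window encoding and every textual concept embedding.
    sim = F.normalize(eeg_enc, dim=-1) @ F.normalize(concept_emb, dim=-1).T
    sim = sim / temperature                       # (T, C)

    # Attention-weighted pooling: weight each window's similarities by the
    # detector's seizure probability so that seizure-dominant windows drive
    # the recording-level concept scores.
    attn = torch.softmax(seizure_prob, dim=0)     # (T,)
    concept_scores = attn @ sim                   # (C,)
    return sim, concept_scores

def contrastive_loss(sim, seizure_mask, true_concept):
    """Assumed supervised contrastive-style objective: seizure windows are
    pulled toward the embedding of the true etiology concept, with the other
    concepts acting as negatives."""
    targets = torch.full((int(seizure_mask.sum()),), true_concept, dtype=torch.long)
    return F.cross_entropy(sim[seizure_mask], targets)
```

In this reading, the recording-level concept scores would feed an etiology classifier, and the contrastive term would be added to the detector's seizure-detection loss.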

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4626_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://github.com/deeksha-ms/SzXAI

Link to the Dataset(s)

https://isip.piconepress.com/projects/nedc/html/tuh_eeg/

BibTex

@InProceedings{RiaMar_LLMPowered_MICCAI2025,
        author = { Riazi, Maryam and Shama, Deeksha M. and Venkataraman, Archana},
        title = { { LLM-Powered Cross-Modal Alignment for Explainable Seizure Detection from EEG } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15973},
        month = {September},
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a framework for enhancing the reasoning capabilities of LLMs for EEG seizure identification. It leverages textual representations derived from clinical notes and employs an attention-weighted pooling mechanism to detect both seizure and non-seizure patterns in EEG data.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The methodology introduced in the paper achieves reasonable performance. The dataset used appears to be of reasonable size, and the methodology is clearly described.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are several important concerns that should be addressed to improve the clarity and overall quality of the manuscript:

    • Please consider renaming Section 2 to “Methods” for consistency and clarity.

    • The statement “AI seizure detectors are designed to make a sequence of binary predictions of baseline vs. seizure activity for each short time window (e.g., 1–5 seconds) of the EEG data” requires a supporting reference.

    • There is inconsistency in the description of the time windows used for seizure analysis. The manuscript refers to multiple durations, including 1–5 seconds, 10 minutes, and 1-second epochs. Please clarify what exact window length was used for model training and evaluation.

    • If 1-second EEG segments were used to detect seizures, this choice is highly questionable, as seizure events often exceed this duration. This is particularly concerning given the authors’ own dataset description indicating average seizure lengths approximately 88 times longer than the chosen window. More justification and clarification around this design decision are needed.

    • The dataset description currently located in the experimental results section would be better placed as a separate section preceding the results.

    • Several sentences require revision for grammar and clarity. For example, the sentence: “The subjects can 14.7±25.2 seizures on average lasting an average of 88.0±123.5 sec” should be corrected for English.

    • It is not clear how many experts have annotated the data.

    • The rationale for training the model for 200 epochs is not provided. It is also unclear whether early stopping was used. This information is necessary to assess the robustness of the training process and the risk of overfitting.

    • For the statement “Each experiment requires two GPUs”, please specify the exact machine specs to give context about computational demands.

    • Class distribution is missing! Seizure versus non-seizure counts per class are not reported. This information is particularly important, as applying an LLM architecture with an extremely high number of features to a dataset of unclear scale significantly increases the risk of overfitting.

    • Lastly, in the dataset section, please include the age range and clinical conditions of the participants to provide a clearer picture of the study population.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (2) Reject — should be rejected, independent of rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As per the feedback above, particularly the concern about the window length used for seizure detection.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Reject

  • [Post rebuttal] Please justify your final decision from above.

    The justifications given in reply to the comments do not resolve the technical concerns, particularly those related to the very small window length used for seizures.



Review #2

  • Please describe the contribution of the paper

    This article introduces a framework named SzXAI, designed to enhance both the performance and transparency of artificial intelligence (AI) models in electroencephalogram (EEG)-based seizure detection through cross-modal alignment and interpretability-enhancing techniques. By leveraging a contrastive training mechanism and an attention-weighted pooling mechanism, SzXAI aligns EEG encodings with textual concept embeddings derived from clinical notes. This alignment facilitates the prediction of seizure etiology while simultaneously generating human-readable explanatory reports to improve model interpretability.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    SzXAI aligns EEG dynamics with clinical concepts through cross-modal alignment and contrastive learning, significantly improving both the accuracy and interpretability of seizure etiology prediction. Beyond enhancing AI model performance, the framework further strengthens clinician trust in AI systems by generating human-readable explanatory reports that bridge the gap between algorithmic decisions and medical reasoning.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    1. Lack of Novelty in Core Concept: The central premise of employing LLMs for neural data alignment and anomaly detection is not novel, as prior works have already explored similar cross-modal approaches for biomedical signal interpretation. The manuscript fails to sufficiently differentiate its methodological contributions from existing paradigms.

    2. Limited Clinical Relevance: While the experiments are rigorously conducted on public datasets, the absence of validation on novel clinical data raises questions about the framework’s generalizability to real-world scenarios. The work does not demonstrate tangible advancements in clinical practice beyond benchmark performance.

    3. Marginal Technical Improvements: The reported performance gains (Table 1) are modest, particularly in the ablation studies where individual components (e.g., attention pooling, contrastive loss) exhibit negligible standalone impact. This undermines the claimed innovation of the proposed modules and suggests limited practical utility over simpler baselines.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the research question addressed in this work is undeniably interesting, the methodological contributions lack sufficient novelty, and the clinical applicability remains unconvincing. The experimental results do not demonstrate compelling improvements over existing approaches, raising doubts about the practical impact of this framework. Without more rigorous validation or a clearer advancement beyond the state-of-the-art, the clinical potential of this method appears limited.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper introduces SzXAI, a model‑agnostic module that (i) aligns latent EEG representations with LLM‑derived textual etiology embeddings through a supervised contrastive loss, (ii) pools cross‑modal similarities with seizure‑probability‑driven attention to obtain time‑aware concept scores, and (iii) leverages the same LLM to auto‑construct training labels and to verbalize predictions. This pipeline delivers concept‑level explanations of scalp‑EEG seizures while preserving baseline detector performance.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper aligns dynamic EEG features with static textual seizure concepts using a contrastive criterion tailored to seizure vs. non‑seizure windows; it makes ingenious use of LLaMA to mine etiologies from free‑text clinical notes and map them to a compact concept bank, enabling large‑scale weak supervision without manual annotation; and it reports 10‑fold CV on 642 TUH recordings, with ablations across pooling, loss terms, and two state‑of‑the‑art detectors.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. Etiology labels stem solely from LLM extraction; only spot‑checked. No formal clinician audit or inter‑rater agreement reported, raising concerns about ground‑truth noise.
    2. Experiments limited to adult TUH corpus; no cross‑centre or pediatric validation.
    3. No user study with neurologists comparing SzXAI explanations to saliency maps or concept‑bottleneck baselines; clinical utility therefore speculative.
    4. The classes in the epileptic EEG database are unbalanced; thus, the fold-based validation may not be appropriate.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I value the methodological novelty (contrastive alignment + LLM‑driven labels) and the practical plug‑in design that keeps detectors untouched. The empirical gains, though modest, are consistent across architectures and folds. However, the validity of the automatically generated labels and the lack of an expert‑level interpretability assessment remain concerns.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We thank the reviewers for their thoughtful feedback and appreciation of our methodological contributions. Below, we respond to the major concerns noted by the reviewers. Minor text edits and missing details will be addressed in the final draft.

  1. Choice of Window Length (Rev 1): Thank you for highlighting the inconsistent descriptions of the window length, which will be addressed in the final draft. To clarify, we follow standard practice by treating seizure detection as a sequence of binary predictions over short EEG windows [Paul, Y.: Various epileptic seizure detection techniques using biomedical signals: a review. Brain Informatics 5, 1-19 (2018)]. Windows within seizure intervals are labeled as seizure, while those outside are labeled as baseline. These predictions are then aggregated across the recording to assess performance. Following established methods (CNNBLSTM [8], DeepSOZ [16]), SzXAI segments each 10-minute EEG into 1-second windows (see the sketch after this feedback) and is trained using the precise seizure annotations from the TUH dataset. Subsequently, seizure etiology prediction is performed at the recording level, as it depends on the overall temporal evolution. SzXAI accomplishes this via our novel pooling layer, which operates on the correlation values of the 1-second windows (Fig. 4).

  2. Novelty of SzXAI (Rev 2): SzXAI’s novelty is its ability to generate human-readable explanations of the seizure/baseline etiologies and to map them to salient intervals of the EEG. Our approach lies in stark contrast to prior XAI methods that offer only summaries of results [12] or generic reports [4]. Unlike standard cross-modal methods, our contrastive approach links EEG temporal dynamics with static clinical notes. SzXAI also differs from concept bottleneck models by serving as a flexible, trainable plug-in for any seizure detector. We also introduce an LLM-based concept label generation method to address the lack of labeled datasets. These methodological innovations were appreciated by Revs 1 & 3.

  3. Class Imbalance and Training (Revs 1 & 3): We apologize for the oversight and will include the class distribution in the final draft. Overall, the concept representation ranges from 80 to 350 recordings. We address this imbalance using a weighted cross-entropy loss (see the sketch below). Rev 1 raised concerns about LLM overfitting. We clarify that the pre-trained LLM was used only to generate labels, with no fine-tuning, so overfitting is not a risk. For SzXAI, we mitigated overfitting by carefully monitoring the train-validation curves before testing. We chose 200 epochs with early stopping, as all methods converged without the validation loss saturating, ensuring a fair comparison across methods. Rev 3 raised a concern with our multi-fold setup. We clarify that we used stratified cross-validation without any patient data leakage across folds to ensure robustness across diverse subsets and prevent overfitting to a single fold. Despite the class imbalance, SzXAI learns distinct subspaces for all concepts and performs well across all concept groups (Fig 3).

  4. Clinical Validation and TUH Data (Revs 2 & 3): TUH data was recorded in a clinical setting and includes expert-created clinical notes of the seizure timing and etiology for each subject. We evaluated our LLM-based label generation against these expert notes and found near-perfect performance (Fig 2), indicating minimal noise. Additionally, two trained graduate students manually verified the concept sets to ensure label quality. While experiments on multiple new clinician-annotated datasets and user studies for ground-truth validation would be ideal, TUH is currently the only publicly available EEG dataset with diverse seizure subtypes and detailed clinical notes, and it therefore offers a viable testbed for our method.

  5. Performance Gains (Rev 2): While precision is only slightly higher than the baselines, SzXAI shows statistically significant (p < 0.05) gains in recall and specificity over most ablations when using either DeepSOZ or CNNBLSTM as the seizure detector (Table 2).
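
To make the design choices in points 1 and 3 concrete, the sketch below shows one plausible reading of the windowing, window labeling, concept-class weighting, and patient-stratified cross-validation described above. The sampling rate, helper names, and the use of scikit-learn's StratifiedGroupKFold are assumptions for illustration; this is not the authors' released code.

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

def segment_recording(eeg, fs=200, win_sec=1):
    """Split a 10-minute EEG recording (channels x samples) into
    non-overlapping 1-second windows, as described in point 1."""
    win = fs * win_sec
    n_win = eeg.shape[1] // win
    return eeg[:, :n_win * win].reshape(eeg.shape[0], n_win, win).transpose(1, 0, 2)

def window_labels(seizure_intervals, n_win, win_sec=1):
    """Label each window seizure (1) if it lies inside an annotated seizure
    interval, otherwise baseline (0)."""
    y = np.zeros(n_win, dtype=int)
    for start, end in seizure_intervals:                      # in seconds
        y[int(start // win_sec): int(np.ceil(end / win_sec))] = 1
    return y

def concept_class_weights(concept_labels, n_concepts):
    """Inverse-frequency weights for a weighted cross-entropy loss over the
    imbalanced concept labels (point 3)."""
    counts = np.bincount(concept_labels, minlength=n_concepts).astype(float)
    return counts.sum() / np.maximum(counts, 1.0)

def patient_stratified_folds(concept_labels, patient_ids, n_splits=10):
    """Stratified 10-fold cross-validation with no patient leakage across
    folds (point 3): stratify on the concept label, group by patient ID."""
    cv = StratifiedGroupKFold(n_splits=n_splits, shuffle=True, random_state=0)
    X = np.zeros(len(concept_labels))          # features are unused for splitting
    return list(cv.split(X, concept_labels, groups=patient_ids))
```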




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    N/A

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper presents SzXAI, a cross-modal framework that integrates EEG signals with clinical note-derived embeddings for explainable seizure detection. The reviewers generally appreciated the novel integration of text and EEG signal supervision using attention-based pooling and contrastive learning. R1 and R2 acknowledged the interesting direction and clear presentation but questioned the methodological novelty, the modest performance gains, and the limited clinical validation. However, R1 raised strong concerns regarding methodological clarity, particularly around the use of 1-second windows, clinical soundness, and missing dataset details. I feel that the rebuttal attempted to address several issues, especially the justification for the window length, handling of class imbalance, and reproducibility; but I remain unconvinced that some of these concerns can be resolved without further experiments that are out of scope for the current submission. Furthermore, the lack of expert-grounded clinical evaluation, which the authors tried to address in Point 4, is not satisfactory. Overall, I recommend Reject.



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    While the paper addresses an important problem and introduces an innovative framework (SzXAI) leveraging LLM-powered cross-modal alignment for seizure detection and explainability in EEG, the overall contribution remains incremental by MICCAI standards. The methodological novelty lies in combining contrastive learning with concept-aligned LLM embeddings; however, the performance gains over baselines are marginal and not sufficiently convincing. Key clinical aspects, such as validation on external datasets and expert auditing of LLM-generated concept labels, are lacking. Reviewer 1 highlighted fundamental concerns about the seizure window length and clinical design choices, which were not adequately addressed in the rebuttal. While Reviewers 2 and 3 acknowledged the novelty and clarity, they remained cautious about the clinical impact and generalizability. Given the mixed reviews and unresolved foundational concerns, the paper does not meet the acceptance threshold.


