Abstract

Electroencephalography (EEG) provides a non-invasive window into the brain’s electrical activity, playing an essential role in various brain–computer interface (BCI) and healthcare applications. In this paper, we propose EEG-DINO, a novel foundation model for EEG encoding based on a hierarchical self-distillation framework. Through multi-view semantic alignment, the model extracts multi-level semantic features from EEG data, capturing a wide range of semantic information and increasing robustness against the noise and variance inherent in complex EEG signals. Moreover, acknowledging the heterogeneous spatial-temporal dependencies unique to EEG signals, we design a channel-aware sampling mechanism and a decoupled positional coding scheme that independently address the spatial and temporal dimensions, enabling the model to capture the intricate structural characteristics of EEG signals. We pre-train EEG-DINO on a large-scale EEG corpus spanning over 9000 hours; the model consistently achieves state-of-the-art performance on multiple downstream tasks. These results demonstrate the effectiveness of our self-distillation framework for EEG encoding.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3347_paper.pdf

SharedIt Link: Not yet available

SpringerLink (DOI): Not yet available

Supplementary Material: Not Submitted

Link to the Code Repository

https://huggingface.co/eegdino/EEG-DINO

Link to the Dataset(s)

N/A

BibTex

@InProceedings{WanXuj_EEGDINO_MICCAI2025,
        author = { Wang, Xujia and Liu, Xuhui and Liu, Xi and Si, Qian and Xu, Zhaoliang and Li, Yang and Zhen, Xiantong},
        title = { { EEG-DINO: Learning EEG Foundation Models via Hierarchical Self-Distillation } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15960},
        month = {September},
        pages = {199 -- 208}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper introduces EEG-DINO, a novel foundation model for EEG encoding that leverages a hierarchical self-distillation framework inspired by DINO-v2. The model is trained on over 9000 hours of EEG recordings and incorporates several tailored components for EEG signals: channel-aware sampling, decoupled spatial-temporal positional embeddings, and multi-view semantic alignment. EEG-DINO achieves state-of-the-art performance on various downstream tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper has several strengths:

    1. EEG-DINO creatively introduces a hierarchical self-distillation strategy into EEG model pretraining, which is novel in the EEG domain.
    2. The introduction of channel-aware sampling and decoupled positional embedding reflects a thoughtful design tailored to the spatial and temporal dependencies inherent in EEG signals.
    3. The model is trained on a large-scale EEG corpus, and demonstrates performance improvements over multiple baseline models in downstream tasks.
  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
    1. The paper lacks explanation for key components. For instance, TFE (Time-Frequency Embedding), one of the core modules in the model, is not described or cited. Readers cannot understand how this module works or why it was used.
    2. While the paper focuses on hierarchical self-distillation, it does not clearly explain the pretraining and fine-tuning pipeline. There is confusion about how the teacher model is obtained, whether via pretraining or another method. Which data are used for teacher vs. student training? How is the model adapted to downstream tasks? These are critical questions that should be explicitly addressed.
    3. TUEV and TUAB, used as downstream datasets, are subsets of the pretraining dataset TUEG. Since class labels are involved during self-distillation, this raises concerns about potential label leakage.
    4. The paper uses different EEG segment lengths across datasets (e.g., 5s for TUEV, 10s for TUAB, 1s for SEED-V), without justifying this design choice. It is unclear how this variation affects fairness in benchmarking against other models.
    5. In Section “Environments and Settings”, the paper states: “All the models are optimized on training set, selected from the validation set and evaluated on the test set”, yet only training and validation splits are described for TUEV and TUAB. It is unclear how the test sets were obtained, affecting reproducibility.
    6. EEG-DINO-L has over 200M parameters, yet the paper does not discuss the training or inference efficiency.
    7. There are concerns about the reliability of reported results. For example, Table 4 shows that most results of baseline models are identical to those in Tables 2 and 13 of the CBraMod paper [1], except for CBraMod itself. The authors should clarify how this occurred and whether fair comparison protocols were strictly followed.
    8. The manuscript contains minor typographical errors, such as the extra character “wo” in the “Baselines & Metrics” subsection of Section 3.1.

    [1] J. Wang et al., “CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding,” arXiv:2412.07236, Mar. 2025. doi: 10.48550/arXiv.2412.07236.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (3) Weak Reject — could be rejected, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend a weak reject. While the paper successfully introduces a scalable and effective self-supervised foundation model for EEG, it suffers from several important issues: lack of clarity in the training procedure, missing explanation of core modules, uncertainty around data usage, and potential evaluation inconsistencies. These problems impact the paper’s transparency, reliability, and reproducibility, which are especially critical for foundational work. With clarification and stronger experimental rigor, the work has potential for future acceptance.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    Although this paper still lacks detailed explanations in many aspects, the overall content is interesting and may be considered for acceptance.

    During the rebuttal process, the authors provided clear responses to most of the questions; however, many key points rely heavily on other referenced works. I believe these points should still be discussed and clarified within the paper itself.



Review #2

  • Please describe the contribution of the paper

    The authors propose EEG-DINO, a novel foundation model for EEG encoding based on a hierarchical self-distillation framework. The model extracts multi-level semantic features from EEG data, capturing a wide range of semantic information and increasing robustness against the noise and variance inherent in complex EEG signals. The model consistently achieves state-of-the-art performance on multiple downstream tasks. These results demonstrate the effectiveness of the self-distillation framework for EEG encoding.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Extensive experiments: The authors conduct a diverse set of experiments, including baseline comparisons and ablation studies, which provide strong empirical support for the proposed method.

    Strong performance: The experimental results demonstrate clear performance gains, suggesting that the method is both effective and competitive.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    There are several instances of non-standard or unclear expressions throughout the manuscript. For example, in the sentence “Early EEG analysis methods primarily employ deep learning…”, the abbreviation “deep learning (DL)” is not introduced at first mention, which is inconsistent with standard academic writing practice. Similarly, the abbreviation “DPE” only appears later in the manuscript; abbreviations should be introduced at their first occurrence to avoid confusion.

    The phrase “Baselines & Metrics wo” is unclear and possibly a typographical or grammatical error; the authors are advised to clarify the intended meaning.

    The manuscript mentions “12 diverse perspectives (2 global, 2 masked, 8 local)”, but does not explain how these specific numbers were determined or justified. More detail is necessary to support this design choice.

    Regarding Figure 1, the component “Channel-Aware Sampling” is missing in the diagram, and its described position does not match the textual description. According to the manuscript, “we first utilize the time-frequency embedding…, Subsequently, we devise a channel-aware sampling mechanism…”, suggesting that Channel-Aware Sampling follows the time-frequency embedding module. However, in Figure 1, it appears before that module. The authors should reconcile this inconsistency and revise the figure accordingly.

    The manuscript refers to “fine-tuning or linear probing”, but does not elaborate on what these terms specifically refer to in the context of their method. A more detailed explanation is necessary to clarify their implementation and role in the experimental setup.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental design is comprehensive, but several parts of the manuscript lack clarity. The manuscript could be further improved to enhance readability and coherence.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    Accept

  • [Post rebuttal] Please justify your final decision from above.

    The paper is acceptable once the details above have been addressed.



Review #3

  • Please describe the contribution of the paper

    The authors propose a foundational model named EEG-DINO, pre-trained on a large-scale EEG dataset, which demonstrates superior performance across a range of downstream tasks.

  • Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-organized and easy to follow.

    2. The extensive experiments and ablation studies effectively demonstrate the superior performance of the proposed method.

  • Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

    The proposed model is positioned as a foundation model; however, the current scale may not fully support this claim. The largest variant (EEG-DINO-Large) contains 201M parameters, and the pretraining is conducted on a single EEG dataset. While the reported scaling trends across model sizes are promising, supporting the foundation-model claim would require exploring training with a larger dataset and scaling the model toward the billion-parameter range. Such analysis would provide valuable insight into the emergence of scaling laws, generalization capacity, and robustness, which are often expected characteristics of foundation models.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission has provided an anonymized link to the source code, dataset, or any other dependencies.

  • Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

    N/A

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

    (4) Weak Accept — could be accepted, dependent on rebuttal

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Further experiments with larger model sizes and varied dataset sizes are needed.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

    N/A

  • [Post rebuttal] Please justify your final decision from above.

    N/A




Author Feedback

We sincerely thank all reviewers for their valuable feedback. Below, we provide detailed point-by-point responses, which we hope enhance the transparency and reproducibility of our work.

To Reviewer 1: Thank you for acknowledging our extensive experiments and strong performance of our proposed method.

  1. Views crop setting: The crop setting (2 global, 8 local, 2 masked views) follows the recommendation in the original DINO works; it also worked best in our experiments.
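
As a toy illustration of this multi-view setup, the sketch below produces 2 global, 8 local, and 2 masked views from a 1-D signal; the crop length and mask ratio are arbitrary choices for illustration, not the paper's actual settings:

```python
import random

def make_views(signal, n_global=2, n_local=8, n_masked=2,
               local_frac=0.3, mask_frac=0.4):
    """Toy multi-view generation: global views see the whole segment,
    local crops see short random windows, and masked views zero out a
    random subset of samples. All ratios are illustrative only."""
    n = len(signal)
    views = [list(signal) for _ in range(n_global)]   # 2 global views
    for _ in range(n_local):                          # 8 local crops
        w = max(1, int(n * local_frac))
        start = random.randrange(n - w + 1)
        views.append(list(signal[start:start + w]))
    for _ in range(n_masked):                         # 2 masked views
        masked = list(signal)
        for i in random.sample(range(n), int(n * mask_frac)):
            masked[i] = 0.0
        views.append(masked)
    return views

views = make_views([float(i) for i in range(100)])
# 12 views total: 2 full-length, 8 short crops, 2 masked copies
```

In a DINO-style setup, the teacher would typically process only the global views while the student processes all of them, and the student is trained to match the teacher's outputs.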

  2. Fine-tuning and linear-probing: These are two methods for adapting the pretrained model to downstream tasks. Fine-tuning updates all model weights, including the backbone and the added classification head; linear probing updates only the classification head, keeping the backbone parameters fixed.
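
The two adaptation strategies can be sketched in plain Python (the parameter names below are illustrative, not from the paper's code):

```python
# Toy sketch of the two adaptation strategies: under linear probing the
# backbone weights are frozen; under fine-tuning every weight moves.
# Parameter names ("backbone.w", "head.w") are illustrative only.

def adapt_step(params, grads, lr, linear_probe):
    """Apply one gradient step, skipping backbone weights when probing."""
    updated = {}
    for name, value in params.items():
        frozen = linear_probe and name.startswith("backbone.")
        updated[name] = value if frozen else value - lr * grads[name]
    return updated

params = {"backbone.w": 1.0, "head.w": 1.0}
grads = {"backbone.w": 0.5, "head.w": 0.5}

probed = adapt_step(params, grads, lr=0.1, linear_probe=True)
tuned = adapt_step(params, grads, lr=0.1, linear_probe=False)
# probed leaves "backbone.w" untouched; tuned updates every weight
```

In a deep-learning framework the same effect is usually achieved by disabling gradients on the backbone (e.g. a `requires_grad` flag in PyTorch) rather than filtering by name.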

  3. Figure 1: We will follow your suggestion and add the missing title in Figure 1 to make it consistent. In Sec. 2, “Channel-Aware Sampling” should precede “Time-Frequency Embedding”, as in Figure 1; we will correct the order in the text.

  4. Textual issues: We will fix the typo (“wo”), introduce the abbreviation “DL” at first mention, and spell out “DPE” in full. Thank you for your careful review.

To Reviewer 2: Thank you for highlighting the novelty of our work and performance of our EEG-DINO.

  1. Component explanation: Time-Frequency Embedding (TFE) is directly adopted from CBraMod [20], as mentioned in the first paragraph of Sec. 2. We will also add the citation where TFE is first mentioned in Sec. 2.1.

  2. Pre-training, fine-tuning and model adaptation: Pre-training is performed on unlabeled data via self-distillation. The teacher and student models are the architectures defined in DINO and are trained concurrently. The pretrained model is then adapted via either linear probing or full fine-tuning on labeled downstream datasets (see also Answer 2 to Reviewer 1).
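
In DINO-style self-distillation, the teacher is typically not trained by gradient descent; instead its weights track an exponential moving average (EMA) of the student's. A minimal sketch of that update, assuming weights stored as plain dicts:

```python
# DINO-style teacher update: the teacher's weights are an exponential
# moving average of the student's, so no gradients flow to the teacher.
# The momentum value below is illustrative.

def ema_update(teacher, student, momentum=0.996):
    """Blend each teacher weight toward the corresponding student weight."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

teacher = {"w": 0.0}
student = {"w": 1.0}
teacher = ema_update(teacher, student, momentum=0.9)
# teacher["w"] is now approximately 0.1
```

High momentum (close to 1) keeps the teacher a slowly-moving, more stable target for the student, which is what makes concurrent teacher-student training stable in practice.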

  3. No label leakage: Pre-training with self-distillation is self-supervised training on unlabeled data, which does not involve any class labels and therefore ensures no label leakage.

  4. Segments: We follow the segment lengths used for fine-tuning in prior works (CBraMod [20], LaBraM [8], BIOT [21]) to benchmark fairly against their methods.

  5. Splits: TUAB and TUEV provide predefined training/test splits, which we adopt directly in our experiments to ensure reproducibility.

  6. Efficiency: Training EEG-DINO-L takes roughly 60 hours on 8 H800 GPUs, which is reasonably efficient; inference completes in seconds.

  7. Reliability of reported results: To report reliable results, we attempted to reproduce the results of all compared methods. We report the numbers from the original papers for LaBraM, BIOT, CNN-Transformer, and ST-Transformer because we could successfully reproduce their results. For CBraMod, we were unable to reproduce the results reported in their paper, but our reproduced results were similar to those reported in CBraMod’s GitHub issues; we therefore report our own reproduced results for CBraMod. We apologize for the confusion and will clarify this in the paper.

  8. Typo: We will remove the extra “wo”. Thank you!

To Reviewer 3: Thank you for acknowledging our methodological contribution in this work.

Foundation models: Indeed, our trained models exhibit the characteristics of foundation models: the model consistently delivers high performance across three distinct downstream tasks, showing robust generalization, and its low standard deviation across random seeds confirms its robustness. Furthermore, performance on these tasks consistently improves as model size increases, providing evidence of scaling behavior.

In our experiments, we observed diminishing returns beyond 201M parameters. We believe this is largely because the dataset size is insufficient to support training billion-parameter models; the scaling behavior in this case is bottlenecked by the dataset size.

We fully agree that training larger models could offer deeper insights into scaling laws, which we plan to explore in future work as more EEG data becomes available.




Meta-Review

Meta-review #1

  • Your recommendation

    Invite for Rebuttal

  • If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

    This paper introduces a promising direction in the development of EEG foundation models using a hierarchical self-distillation framework inspired by DINO. The method incorporates EEG-specific design choices (channel-aware sampling, decoupled embeddings) and demonstrates strong empirical performance across multiple datasets. Two reviewers recommend weak accept, highlighting the thoughtful architecture and comprehensive experiments. However, one reviewer raises significant concerns about unclear training procedures, possible data leakage, and missing details in key components such as Time-Frequency Embedding (TFE) and the fine-tuning pipeline. These issues impact transparency and reproducibility, but do not fundamentally undermine the potential of the work.

    Given that the technical direction is valuable and the paper could be improved with clarification, I recommend inviting a rebuttal to address the methodological ambiguities and experimental design concerns.

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A



Meta-review #3

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Reject

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    This paper proposed EEG-DINO, which applies a hierarchical self-distillation strategy to EEG model pretraining. The technical novelty of using hierarchical DINO is limited, even if it might be the first application in the EEG domain. As Reviewer #2 pointed out, the paper still lacks detailed explanations in many aspects.


