Abstract
Detecting human vigilance states (e.g., natural shifts between alertness and drowsiness) from functional magnetic resonance imaging (fMRI) data can provide novel insight into the whole-brain patterns underlying these critical states. Moreover, as a person’s vigilance levels are closely tied to their behavior and brain activity, vigilance state can strongly influence the results of fMRI studies. Therefore, the ability to annotate fMRI scans with vigilance information can also enable clearer and more robust results in fMRI research. However, well-established vigilance indicators are derived from other modalities such as behavioral responses, electroencephalography (EEG), and pupillometry, which are not typically available in fMRI data collection. While previous works indicate the promise of distinguishing vigilance states from fMRI alone, EEG data can provide reliable vigilance indicators that complement and augment fMRI domain information. Here, we propose CBrain: Cross-modal learning for Brain vigilance detection in resting-state fMRI. Our model transfers EEG vigilance information into an fMRI latent space in training, and predicts human vigilance states using only fMRI data in testing, addressing the need for external vigilance indicators. Experimental results demonstrate CBrain’s ability to predict vigilance states across different individuals at a granularity of 10-fMRI-frames with an 81.07% mF1 score on a test set of unseen subjects. Additionally, our generalization experiments highlight the model’s potential to estimate vigilance in an unseen task and in resting-state fMRI scans collected with a different scanner at a different site. Source code: https://github.com/neurdylab/CBrain.
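To make the training/inference asymmetry concrete, the following is a minimal sketch of the cross-modal setup the abstract describes: paired EEG and fMRI are encoded during training so that an alignment loss can transfer EEG vigilance information into the fMRI latent space, while only the fMRI branch is used at test time. The module names, dimensions, and simple MLP encoders here are illustrative placeholders, not CBrain's actual architecture (the author feedback below indicates transformer-based encoders).

```python
import torch
import torch.nn as nn

class CrossModalVigilanceSketch(nn.Module):
    """Two-branch sketch: EEG is consumed only during training, where its
    latents serve as targets for aligning the fMRI latents."""

    def __init__(self, fmri_dim, eeg_dim, latent_dim=64, n_states=2):
        super().__init__()
        # Placeholder MLP encoders (CBrain itself uses transformer encoders).
        self.fmri_encoder = nn.Sequential(
            nn.Linear(fmri_dim, latent_dim), nn.ReLU(), nn.Linear(latent_dim, latent_dim)
        )
        self.eeg_encoder = nn.Sequential(
            nn.Linear(eeg_dim, latent_dim), nn.ReLU(), nn.Linear(latent_dim, latent_dim)
        )
        self.classifier = nn.Linear(latent_dim, n_states)

    def forward(self, fmri, eeg=None):
        z_fmri = self.fmri_encoder(fmri)
        logits = self.classifier(z_fmri)
        if eeg is not None:
            # Training: return both latents for a cross-modal contrastive loss.
            return logits, z_fmri, self.eeg_encoder(eeg)
        # Inference: vigilance prediction from fMRI alone.
        return logits
```

At train time, the returned fMRI and EEG latents would feed a contrastive alignment loss alongside a standard classification loss on the logits; at test time, only the fMRI branch runs.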
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/4486_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/neurdylab/CBrain
Link to the Dataset(s)
N/A
BibTex
@InProceedings{LiCha_CBrain_MICCAI2025,
author = { Li, Chang and Li, Yamin and Pourmotabbed, Haatef and Zhang, Shengchao and Salas, Jorge A. and Goodale, Sarah E. and Bayrak, Roza G. and Chang, Catie},
title = { { CBrain: Cross-Modal Learning for Brain Vigilance Detection in Resting-State fMRI } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper proposes a method for brain vigilance detection using fMRI by incorporating EEG during training to enhance spatio-temporal representation learning. Unlike prior works relying solely on fMRI, the proposed approach leverages the high temporal resolution of EEG to compensate for fMRI’s temporal limitations, ultimately achieving strong performance using only 10-frame fMRI inputs during inference.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The combination of fMRI and EEG leverages the complementary properties of both modalities—fMRI for high spatial resolution and EEG for high temporal resolution—making this approach particularly effective in modeling dynamic brain activity.
- The framework is designed to utilize EEG during training for enhanced learning, but does not require EEG at test time, demonstrating its practical applicability.
- The approach shows competitive performance in brain vigilance detection with limited fMRI input, suggesting improved efficiency over previous fMRI-only methods.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The authors argue that existing fMRI foundation models fail to predict vigilance states. However, recent multi-modal foundation models (e.g., [1]) and advanced fMRI-based models (e.g., [2]) have shown promising zero-shot capabilities. The paper would benefit from a clearer justification and empirical comparison against such models.
- The formulation of positive and negative pairs in the contrastive learning framework is critical, yet the paper lacks sufficient detail on how these are defined and constructed.
- The ground-truth labels are derived from EEG data. The authors should clarify the advantage of EEG-derived annotations compared to fMRI-only labeling strategies and justify the dependency on EEG in the annotation process.
- While the model exhibits decreased performance in external validation due to inter-dataset variability, the paper does not clearly articulate any strategies to address or mitigate this domain gap. It would be beneficial for the authors to discuss potential methods for improving generalizability across datasets.
[1] Saab, Khaled, et al. “Capabilities of gemini models in medicine.” arXiv preprint arXiv:2404.18416 (2024).
[2] Caro, Josue Ortega, et al. “BrainLM: A foundation model for brain activity recordings.” bioRxiv (2023): 2023-09.
- Please rate the clarity and organization of this paper
Poor
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(2) Reject — should be rejected, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the paper addresses the challenging task of detecting vigilance states using fMRI data, I found critical methodological and conceptual weaknesses that raise concerns about the contributions and generalizability of this study. In particular, the manuscript lacks critical comparisons with recent foundation models that have shown promising performance in multi-modal and zero-shot settings. Additionally, several key components, such as the details of the contrastive pairs and the reliance on EEG-based ground-truth annotations, are under-explained or insufficiently justified. Due to these issues, I recommend rejection.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
While the paper presents a compelling motivation for enabling vigilance detection from fMRI alone and proposes a novel cross-modal architecture (CBrain), its core claim is undermined by a methodological dependency: the model relies on EEG-derived labels during training. This contradicts the central premise of enabling vigilance detection solely from fMRI and raises concerns about the validity of the “fMRI-only” claim. Moreover, the comparison with recent foundation models remains limited, and critical details—such as the construction of contrastive pairs and strategies to bridge the domain gap—are underexplored. These issues collectively weaken both the methodological clarity and the significance of the contribution. A more thorough disentanglement of EEG dependence and clearer comparisons would strengthen the work for future submission.
Review #2
- Please describe the contribution of the paper
This manuscript introduces CBrain, a cross-modal learning framework designed to detect vigilance states (e.g., alert vs. drowsy) from resting-state fMRI data with EEG signals used for supervision during training. The core contribution lies in the use of simultaneous EEG-fMRI data to transfer vigilance-related information from EEG to the fMRI latent space via cross-modal contrastive learning. Once trained, the model predicts vigilance states using only fMRI data. CBrain achieves competitive classification accuracy on both held-out test subjects and external datasets acquired from different scanners and paradigms.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The model was trained using paired EEG-fMRI data together with contrastive learning. This procedure brings the fMRI feature representations close to the EEG features and appeared to improve vigilance prediction accuracy.
- The model operates at a 21 s temporal resolution, which is much finer than that of most existing fMRI-based prediction models, whose temporal resolution is about 1 minute.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Prediction accuracy: while I truly appreciate the thoroughness of the authors in comparing CBrain with several models, the claim that CBrain outperforms the baseline models is somewhat overstated. CBrain (fMRI only), which if I understand correctly is the main focus of the paper, did not perform better than attnMLP. It is somewhat unsurprising that the full CBrain model could outperform all other baselines that made use of only one modality. More importantly, both versions of CBrain yielded lower accuracy than the meanMLP model, which seriously challenges the claim that CBrain is better than existing baseline models.
- Can the authors include a statistical analysis to indicate if the best performing model for each test set is significantly (in statistical sense) better than other methods?
- Generalization result: I wonder if the “Eye-closed-task” setup can be evaluated on the same baseline models in Table 1. Essentially the first half of Table 4 (eye-closed-rest) is repeated from Table 1. Instead of having a separate Table 4 with its own discussion, it would be much more convenient for readers if the authors can include the results for “Eye-closed-task” setup on all models as another column in Table 1.
- Fig. 2 visualization: currently it is not possible to compare the ground-truth (orange) and prediction (blue) lines in the bottom plot of each panel, as the two lines effectively overlap each other. Furthermore, by the CBrain model's prediction, do you mean the full or fMRI-only version? Can the prediction result of the best baseline for each test set be included as well?
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
While the method has merit, the prediction results of the proposed model are not clearly better than the baselines.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
My main concerns in the original comments focus on possible improvements to the results reported by the paper. As the key results cannot be updated, I would retain my original decision.
Review #3
- Please describe the contribution of the paper
The paper proposes vigilance detection from resting-state fMRI data by transferring EEG vigilance information using Cross-Modal learning during training.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
An interesting application of Multi-modal learning with single-modality prediction.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Improvement in performance is demonstrated on only one private dataset; external validation shows only second-best performance.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- In Table 1, the comparison with existing models, it is not clear whether the comparative models used only fMRI or both fMRI and EEG.
- Evaluation on external validation shows second-best performance, with significantly lower mF1 than meanMLP.
- The paper does not include the computational cost of the models to compare their relative complexity.
- In Table 2, the improvement provided by the proposed EEG encoder does not appear to be statistically significant relative to the second-best result.
- The same applies to the results in Table 4, for both eyes-closed rest and task.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
We thank all reviewers for their valuable feedback and support of our method's novelty, applicability, performance, and efficiency. While adding new results is now prohibited by MICCAI guidelines, we hope our responses can help to address these concerns.

Novelty and Experiments (R1:Q1, R2:Q1-Q2, R3:Q1). We propose CBrain, a novel cross-modal architecture that leverages EEG to enhance fMRI-based vigilance detection. Incorporating EEG significantly boosts CBrain's performance, demonstrating the power of integrating complementary modalities and the potential of cross-modal supervision as an emerging paradigm in this field. In Table 1, comparing full CBrain with fMRI-only CBrain and fMRI-only trained baselines shows that incorporating EEG data during training improves CBrain's mF1 by 3.22%, surpassing all baselines (CBrain: 81.07%; attnMLP: 77.96%; meanMLP: 77.08%), supporting our key contribution that EEG knowledge enhances fMRI-based brain state discrimination. Recognizing the generalization potential of meanMLP [1] and attnMLP [1], very recent models, we have included them as baselines. We acknowledge that they outperform CBrain on the external validation set; the reason for this difference merits further investigation. For the comparisons with BrainLM [4], we respectfully direct the reviewers to Table 1; we suspect its performance may benefit from training on sequences longer than 10-fMRI-frame patches. We appreciate the suggestion of Med-Gemini [5], which performs well in ECG-QA tasks. Since it is designed for text generation and is not directly applicable to our non-text setting, we did not include it, but we agree it would be interesting to explore its adaptations in future work.

Generalization (R2:Q2, R3:Q4). We appreciate the encouragement to improve generalization in future work. Currently, Figure 2 shows that CBrain performs well on the external validation set. We propose two solutions for future work: 1) exploring data normalization strategies, since normalization methods vary considerably across existing fMRI studies and domain-specific normalization is a non-trivial factor; 2) encoding demographic and hardware information to handle inter-subject and inter-site variability.

Statistical Reporting, Evaluation, Visualization (R1:Q2-Q4, R2:Q4-Q5). To comply with the rebuttal guidelines, unfortunately we cannot add new results. We agree that including statistical analysis can strengthen our claims, and we will augment the existing results with statistical tests and summaries (mean and std across subjects). We agree that evaluating the fMRI baselines on eyes-closed task scans, visualizing predictions from the best baseline per test set, and merging information from Tables 1 and 4 would streamline the presentation; we can implement this in the revision. Currently, we visualize predictions of the full CBrain model along with the ground truth, and their close overlap reflects high accuracy.

Clarifications (R2:Q3, R3:Q2-Q3). CBrain's theoretical complexity: encoder [2]: O(n^2 d); MLP: O(m d_{1..n}); contrastive learning [3]: O(m^2), where n is the sequence length, d the feature dimension, and m the batch size. Training on the full dataset takes less than one hour on a single GPU. In contrastive learning, for a feature f, positive samples are features with the same ground-truth label as f; negative samples are features with different labels.

Regarding the suggested comparison with fMRI-derived ground truth, we note that, while EEG provides gold-standard vigilance indicators, the extraction of vigilance states from fMRI data is currently an open research avenue, and fMRI vigilance labeling strategies are not yet established as ground truth. Here, by training fMRI models to predict EEG-derived labels, we address an open challenge in the field: enabling vigilance detection from fMRI alone, which is our major contribution.

[1] Popov et al., PMID: 39515403. [2] Vaswani et al., arXiv:1706.03762. [3] Chen et al., arXiv:2002.05709. [4] Caro et al., bioRxiv:2023.09.12.557460. [5] Saab et al., arXiv:2404.18416.
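To make the pair construction described in the rebuttal concrete, the sketch below implements a label-supervised, cross-modal variant of the SimCLR-style contrastive loss the authors cite [3]: for each fMRI feature, same-labeled EEG features in the batch are positives and differently-labeled ones are negatives. This is an assumption about the form of the loss based on the description above, not the paper's exact implementation; all function and variable names are hypothetical. Note that the O(m^2) term in the stated complexity matches the m-by-m similarity matrix computed here.

```python
import torch
import torch.nn.functional as F

def cross_modal_supcon_loss(z_fmri, z_eeg, labels, temperature=0.1):
    """Hypothetical sketch: for each fMRI feature (anchor), positives are the
    EEG features in the batch with the same vigilance label; all
    differently-labeled EEG features act as negatives."""
    z_fmri = F.normalize(z_fmri, dim=1)               # (m, d) unit-norm fMRI features
    z_eeg = F.normalize(z_eeg, dim=1)                 # (m, d) unit-norm EEG features
    sim = z_fmri @ z_eeg.T / temperature              # (m, m) cross-modal similarities
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # same-label mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax per anchor
    # Average log-probability over each anchor's positive set (always >= 1,
    # since each anchor's own time-matched EEG feature shares its label).
    loss = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1)
    return loss.mean()

# Example usage with random features: m = 8 windows, d = 64 latent dimensions.
z_f, z_e = torch.randn(8, 64), torch.randn(8, 64)
y = torch.randint(0, 2, (8,))  # binary vigilance labels
print(cross_modal_supcon_loss(z_f, z_e, y))
```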
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
Reviewers noted the interesting method, which learns from multimodal EEG and fMRI data but requires only fMRI at inference, and its improved prediction performance over prior fMRI-only models. They also noted concerns about limited performance improvement, missing statistical analysis, missing experimental comparisons, and missing methods details. The authors addressed many of the clarification requests, and some experiments that were requested were in fact already present in the paper (foundation models). One reviewer raised a concern regarding the reliance on EEG for labeling of the ground truth and thus the validity of the fMRI-only claim; however, at inference only fMRI data is used, and experiments also compared performance with the inclusion of different EEG encoders. Given the interesting multimodal learning with single-modality inference and its potential utility, I recommend acceptance of this work. The authors should please include the clarifications and methods details requested (e.g., the definition of positive/negative samples for contrastive learning, the requirement of EEG-based ground-truth labels).