Abstract
EEG-based brain-computer interfaces (BCIs) have shown promise in various applications, such as motor imagery and cognitive state monitoring. However, decoding visual representations from EEG signals remains a significant challenge due to their complex and noisy nature. We thus propose a novel 5-stage framework for decoding visual representations from EEG signals: (1) an EEG encoder for concept classification, (2) cross-modal alignment of EEG and text embeddings in CLIP feature space, (3) caption refinement via re-ranking, (4) weighted interpolation of concept and caption embeddings for richer semantics, and (5) image generation using a pre-trained Stable Diffusion model. We enable context-aware EEG-to-image generation through cross-modal alignment and re-ranking. Experimental results demonstrate that our method generates high-quality images aligned with visual stimuli, outperforming SOTA approaches by 27.08% in Classification Accuracy and 15.21% in Generation Accuracy, and reducing Fréchet Inception Distance by 36.61%, indicating superior semantic alignment and image quality. The code is available at https://github.com/CVLABLUMS/CATVis.
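To make stage (2) concrete, the sketch below shows a symmetric CLIP-style contrastive loss that pulls matching EEG and text embeddings together in a shared space. This is a minimal illustration only: the temperature value, the symmetric two-direction form, and the tensor shapes are assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def clip_style_alignment_loss(eeg_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss aligning EEG and text embeddings.

    eeg_emb, text_emb: (batch, dim) tensors; row i of each tensor forms a positive pair.
    Temperature and loss symmetry are illustrative assumptions.
    """
    eeg_emb = F.normalize(eeg_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = eeg_emb @ text_emb.t() / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(eeg_emb.size(0), device=eeg_emb.device)
    loss_eeg_to_text = F.cross_entropy(logits, targets)        # EEG -> Text direction
    loss_text_to_eeg = F.cross_entropy(logits.t(), targets)    # Text -> EEG direction
    return 0.5 * (loss_eeg_to_text + loss_text_to_eeg)
```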
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/3669_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/CVLABLUMS/CATVis
Link to the Dataset(s)
EEG visual classification dataset: https://tinyurl.com/eeg-visual-classification
BibTex
@InProceedings{MehTar_CATVis_MICCAI2025,
author = { Mehmood, Tariq and Ahmad, Hamza and Shakeel, Muhammad Haroon and Taj, Murtaza},
title = { { CATVis: Context-Aware Thought Visualization } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {100 -- 110}
}
Reviews
Review #1
- Please describe the contribution of the paper
This manuscript proposes a multi-step framework for retrieving and reconstructing images from EEG data with the help of generated stimulus-image captions.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The authors thoughtfully articulate their proposed framework’s components, explaining each part’s importance. A particular strength lies in the constructed conditioning prompts, which align with image class labels while integrating fine-grained descriptive information via image captioning. The evaluation includes comparisons with a range of existing EEG-to-image reconstruction frameworks. The manuscript is well-written and organized.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
A key weakness of the paper is that the overall framework closely follows the structure of the existing BrainVis model, with the primary contributions being incremental upgrades such as improved EEG encoding, CLIP-based fine-/coarse-grained conditioning, and the substitution of cascaded diffusion with Stable Diffusion. While valuable, the novelty of the formulation may be limited without a precise distinction from prior work. Additionally, the manuscript does not adequately address the role of the caption generation module in the image synthesis pipeline. It remains unclear to what extent the EEG signal and the generated captions influence the final images, suggesting that the EEG signal may serve more as a codebook-style key than as a source of rich generative guidance.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(1) Strong Reject — must be rejected due to major flaws
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The motivation for rejection is that it remains unclear to what extent the EEG signal and the generated captions influence the final images, suggesting that the EEG signal may serve more as a codebook-style key than as a source of rich generative guidance. Additionally, important concerns have been raised about confounding factors within this particular dataset that may affect all findings based on it. Please refer to “Confounds in the Data—Comments on ‘Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features’” by Ahmed et al.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Reject
- [Post rebuttal] Please justify your final decision from above.
This is, no doubt, a well-written paper. Yet, at a clinical level, the controversy around Palazzo’s dataset remains. The choice of dataset, despite the rebuttal, is unfortunate in my opinion. While the topic is underexplored, additional benchmark datasets do exist (such as “Image classification and reconstruction from low-density EEG”) and should be included.
Review #2
- Please describe the contribution of the paper
The paper proposes a context-aware EEG-to-image visualization approach that serializes multi-modal EEG and CLIP embeddings with Stable Diffusion.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well written and the authors could explain the details of the explored concepts clearly.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- Limited sample size (especially compared to DreamDiffusion).
- Benchmarks are limited per scenario; readers would benefit more if all methods were compared and reported across all scenarios.
- More details about the models are necessary to ensure reproducibility.
- In Equation (1), Text$\rightarrow$EEG should be a subscript.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method is not extremely novel; however, compared to existing studies (especially BrainVis, presented at ICASSP 2025), the performance has been improved, which demonstrates the potential of the proposed solution.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Given the authors’ responses, I believe the paper should be accepted (in a binary decision between acceptance and rejection). The paper holds significance to the field, and the results are comparable with the SOTA.
Review #3
- Please describe the contribution of the paper
The paper proposes a sophisticated, multi-stage method for generating images from EEG signals. The results are compelling and the application is exciting, addressing an important problem in the context of brain-computer interfaces.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Well designed and well motivated methodology
- A sensible multi-stage approach
- Leveraging frontier AI models such as stable diffusion to address an important clinical problem
- Meaningful quantitative comparisons to competing methods
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The main weakness with such methods is the assessment for downstream applications and end-users experience
- There are no visual results for competing methods
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
Unclear whether the results could be reproduced from the provided details. It would be highly beneficial if the authors were able to provide source code.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(5) Accept — should be accepted, independent of rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Exciting paper with an interesting application and strong results. While BCI itself may not be a major topic within MICCAI, I could imagine that this paper attracts some attention. The paper itself is very well written, clear, and well motivated. The quantitative results and comparison to other methods are compelling.
What’s unclear is how much the performance of individual stages affects the final result. Surely, Stable Diffusion will always generate some high-quality images, and it is unclear how much the fidelity of intermediate feature representations matters. A very approximate identification of the underlying visual stimulus could be sufficient to obtain a very high-fidelity output from the last stage. Some of this is addressed in the ablation studies, but it remains unclear how it may affect the results presented in Fig. 3. It would have been interesting to see visual results from the competing methods.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
Intriguing paper with compelling results. I felt that the authors did a good job with the rebuttal, addressing a key concern of the reviewer who recommended ‘strong reject’. I was not concerned by the points raised in that review, and remain positive about the paper.
Author Feedback
We sincerely thank all the reviewers for their thoughtful and constructive feedback. In particular, we appreciate the encouraging comments on the paper’s clarity, motivation, and quantitative comparison by R1 and R3. We will revise our manuscript as requested and, if accepted, release our code to ensure reproducibility (R1, R2, R3).
R1.C1: Qualitative Comparison with SOTA In Fig. 3 of our paper, 15 of the 18 samples we used (i.e., the first 5 rows) are the same as those used in Fig. 6 of BrainVis [6]. We will clarify this in the caption of Fig. 3.
R2.C1: Role of Caption Generation Module Table 6 provides an ablation on the influence of the concept and the caption on the generated images: a metric such as GA, which is based on classification accuracy, increases as the weight of the concept increases; however, Beta(10,10) is where we find the balance between the concept and the context. This can also be observed by analyzing the qualitative results in Fig. 3. For example, in the last two rows, the class of both samples is “folding chair”; however, the information about the material, i.e., wood, is provided through the caption, i.e., “A wooden folding chair with a green cushion.” For some images, such as “German Shepherd” in row 5, we have also performed generation with linear interpolation between concept and caption features (not shown due to the page limit). We observed that caption-only generation resulted in a puppy on green grass, whereas using the proposed Beta(10,10) weighting resulted in the correct generation.
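As an illustration of how the concept/caption weighting and the generation stage could fit together, here is a minimal sketch under stated assumptions: the diffusers model ID, the prompt templates, and the choice to interpolate token-level CLIP text embeddings are hypothetical and are not details taken from the paper.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative only: model ID, prompts, and interpolation space are assumptions.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode(prompt):
    tokens = pipe.tokenizer(
        prompt, padding="max_length", max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    ).input_ids.to("cuda")
    return pipe.text_encoder(tokens)[0]  # (1, 77, 768) token-level CLIP text embeddings

concept_emb = encode("a photo of a folding chair")                    # predicted concept
caption_emb = encode("A wooden folding chair with a green cushion.")  # re-ranked caption

lam = torch.distributions.Beta(10.0, 10.0).sample().item()  # stochastic weight, lambda ~ Beta(10,10)
cond = lam * concept_emb + (1.0 - lam) * caption_emb        # weighted interpolation

image = pipe(prompt_embeds=cond, num_inference_steps=50).images[0]  # conditioned generation
image.save("catvis_sketch.png")
```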
R2.C2: Confounds in Data In our work, we have followed the same experimental design as Palazzo et al. (EEG-ChannelNet, TPAMI 2021) and BrainVis [6]. The concerns about confounding factors in the EEG/ImageNet dataset (Spampinato et al., Palazzo et al.) raised by Ahmed et al. have recently been refuted. Please refer to “Rebuttal to ‘Comments on “Decoding Brain Representations by Multimodal Learning of Neural Activity and Visual Features”’”, TPAMI, Dec 2024. As a result, no formal retraction or corrigendum has been issued, and both the dataset (Palazzo et al.) and the papers based on it continue to stand as valid and citable. In fact, Palazzo et al. is a highly cited paper with over 70 citations, and the dataset is still considered an important and seminal contribution in the recent literature (Cavazza et al. 2022, Xue et al. 2025).
R2.C3: Novelty of Formulation CATVis reformulates EEG-to-image reconstruction as a novel text-assisted visualization problem: the pipeline first extracts both the object concept and its contextual description in language space and only then renders an image. The novelty lies in two additional components of our pipeline. 1) A class-guided caption re-ranking module uses the EEG classifier’s prediction to generate and select the caption that best captures the context, turning classification itself into a context-generation step. 2) A stochastic Beta-prior interpolation samples λ from Beta(10,10); varying λ across samples lets the model balance concept and context (see Fig. 3). Moreover, another distinction from BrainVis [6] is the use of pretrained Stable Diffusion instead of the cascaded version, reducing computational complexity (see Sec. 3.5).
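One plausible way to realize such class-guided re-ranking is to score candidate captions by CLIP text similarity to the predicted concept and keep the highest-scoring one. The sketch below is hypothetical: the CLIP checkpoint, the prompt template, the captioning source, and the cosine-similarity criterion are assumptions for illustration, not the paper’s confirmed procedure.

```python
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPModel

# Hypothetical class-guided re-ranking: among candidate captions (e.g., from an
# off-the-shelf captioning model), keep the one closest to the predicted concept.
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def text_features(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    return F.normalize(clip.get_text_features(**batch), dim=-1)

def rerank_captions(predicted_class, candidate_captions):
    concept = text_features([f"a photo of a {predicted_class}"])  # prompt template is an assumption
    captions = text_features(candidate_captions)
    scores = (captions @ concept.t()).squeeze(-1)                 # cosine similarity to the concept
    best = scores.argmax().item()
    return candidate_captions[best], scores

best_caption, _ = rerank_captions(
    "folding chair",
    ["A wooden folding chair with a green cushion.", "A dog lying on green grass."],
)
```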
R3.C1: Limited Sample Size While DreamDiffusion reports a larger overall sample size due to encoder pretraining on ~120k EEG samples collected from multiple publicly available datasets on the MOABB platform, the dataset used for the actual image generation task is the same as ours (Spampinato et al., Palazzo et al.). Therefore, the effective sample size for the generation task remains comparable.
R3.C2: Limited Benchmarks per Scenario We evaluated our approach against 5 prior SOTA methods: KD-STFT, Brain2Image, ESG-ADA, DreamDiffusion, and BrainVis (Tables 3–4), covering both GAN and diffusion paradigms.
R3.C3: Typo in Eq. 1 We thank the reviewer for noticing this oversight. We will correct the notation in the final version.
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A