Abstract

The integration of EEG and eye movements (EM) provides a comprehensive understanding of brain dynamics, yet effectively capturing key information from EEG and EM presents challenges. To overcome these, we propose DTCA, a novel multimodal fusion framework. It encodes EEG and EM data into a latent space, leveraging a multimodal fusion module to learn the facilitative information and dynamic relationships between EEG and EM data. Utilizing cross-attention with pooling computation, DTCA captures the complementary features and aggregates promoted information. Extensive experiments on multiple open datasets show that DTCA outperforms previous state-of-the-art methods: 99.15% on SEED, 99.65% on SEED-IV, and 86.05% on SEED-V. We also visualize confusion matrices and features to demonstrate how DTCA works. Our findings demonstrate that (1) EEG and EM effectively distinguish changes in brain states during tasks such as watching videos; (2) encoding EEG and EM into a latent space for fusion facilitates learning promoted information and dynamic relationships associated with brain states; and (3) DTCA efficiently fuses EEG and EM data to leverage their synergistic effects in understanding the brain’s dynamic processes and classifying brain states.
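
As an illustration of the fusion idea described in the abstract, the following is a minimal sketch of bidirectional cross-attention followed by mean pooling. It is not the authors' implementation (no code repository is linked): the identity query/key/value projections, the token counts, the embedding size `D`, and all function names are assumptions chosen for brevity.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(r, col)) for col in zip(*b)] for r in a]


def transpose(a):
    return [list(c) for c in zip(*a)]


def cross_attention(queries, context):
    """Scaled dot-product attention: queries come from one modality,
    keys and values from the other. Projections are the identity here
    (an assumption); a real model would learn W_q, W_k, W_v."""
    d = len(queries[0])
    scores = matmul(queries, transpose(context))  # (n_q x n_ctx)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, context), weights


def mean_pool(tokens):
    """Average over the token axis, giving one vector per modality."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def fuse(eeg_tokens, em_tokens):
    """EEG attends to EM and EM attends to EEG; the pooled results are concatenated."""
    eeg_att, _ = cross_attention(eeg_tokens, em_tokens)
    em_att, _ = cross_attention(em_tokens, eeg_tokens)
    return mean_pool(eeg_att) + mean_pool(em_att)


random.seed(0)
D = 8  # hypothetical latent dimension shared by both branches
eeg = [[random.gauss(0, 1) for _ in range(D)] for _ in range(5)]  # 5 EEG tokens
em = [[random.gauss(0, 1) for _ in range(D)] for _ in range(3)]   # 3 EM tokens
fused = fuse(eeg, em)
```

Here `fused` has length `2 * D`. In a full model, learned projection matrices would replace the identity projections and a classifier head would map the fused vector to class labels.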

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1793_paper.pdf

SharedIt Link: https://rdcu.be/dV1Mx

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72069-7_14

Supplementary Material: N/A

Link to the Code Repository

N/A

Link to the Dataset(s)

https://bcmi.sjtu.edu.cn/~seed/index.html

BibTeX

@InProceedings{Zha_DTCA_MICCAI2024,
        author = {Zhang, Xiaoshan and Shi, Enze and Yu, Sigang and Zhang, Shu},
        title = {{DTCA: Dual-Branch Transformer with Cross-Attention for EEG and Eye Movement Data Fusion}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15002},
        month = {October},
        pages = {141--151}
}


Reviews

Review #1

  • Please describe the contribution of the paper
    1. This paper introduces a transformer model for emotion recognition based on the cross-attention mechanism.

    2. A novel model fusion method is proposed for integrating the data features of EEG and EM.

    3. The proposed model achieves state-of-the-art performance in emotion recognition on the public datasets SEED, SEED-IV, and SEED-V.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The model architecture exhibits a high level of innovation, particularly the feature fusion method based on cross-attention.

    2. The comparative experiments and ablation studies are conducted comprehensively, with in-depth discussions on the model itself. This adequately reflects the effectiveness and characteristics of the proposed method.

    3. The logic of the article is clear, and the writing is standardized.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The dataset description is not comprehensive enough, lacking details such as the specific duration of EEG and EM data used as model inputs.

    2. Given that the experiments in the article aim to validate emotion recognition as a task, the narrative of the article should focus more on the emotion recognition task itself rather than brain state classification. Therefore, there is some inconsistency between the experiments and the intent of the article.

    3. For single-modal models, there is no detailed model introduction provided. Specifically, when using only EEG or EM modalities, how the cross-attention mechanism is employed is not explained. If not used, what is the architecture of the model?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not provide sufficient information for reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    No

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    This paper presents a model called DTCA, which integrates EEG and EM information based on a cross-attention mechanism. The model first employs two encoder modules to extract features from EEG and EM signals separately. Then, with the aid of a cross-attention mechanism, the features from both modalities are fused to complement each other. Through validation on publicly available datasets, the model achieves state-of-the-art performance in emotion recognition classification, demonstrating its effectiveness.

    Here are some specific comments:

    1. In Figure 1, why is the GELU activation used in the EEG branch but not in the EM branch?

    2. What kind of embedding is used in the EEG branch during information fusion? Specifically, how does the embedding of the class token contribute to its function? Is it necessary to include it?

    3. The text mentions that “Comparative experiments show that even without temporal self-attention blocks, model performance remains strong.” However, corresponding comparative experiments are not seen in the experimental section. Considering the inherent temporal dependency of EEG signals, it would be more convincing to supplement relevant experiments.

    4. An interesting experimental observation is that on the SEED dataset, if only EEG modality data is used, the classification performance of the proposed method DTCA is not higher than the existing method MEET. However, with the addition of the EM modality, DTCA’s performance improves by 2.47%. Can this partly suggest that the classification performance of DTCA is improved by adding data modalities? Furthermore, adding the EM modality results in a 2.47% improvement on the SEED dataset, a 5.45% improvement on SEED-IV, but a 14.34% improvement on SEED-V. Why is there such a large difference? Is it because there is more data related to the EM modality in the SEED-V dataset? If so, why does DTCA’s classification result using only the EM modality on SEED-V not outperform its results on SEED and SEED-IV?

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method proposed in the paper demonstrates a certain degree of innovation; however, there is room for improvement in its overall quality.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a novel multimodal fusion framework called Dual-Branch Transformer with Cross-Attention (DTCA), which aims to effectively capture the essential information from electroencephalography (EEG) and eye movement (EM) data, and learn the deep dynamic relationships between them for brain state classification tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel formulation: The proposed DTCA framework is a novel approach that encodes EEG and EM data into a latent space and utilizes a multimodal fusion module to learn the facilitative information and dynamic relationships between the two modalities.
    2. Original way to use data: The paper leverages the complementary information from EEG and EM data, which is an original way to utilize these multimodal data sources for understanding brain dynamics and classifying brain states.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of comparison with more recent multimodal fusion methods: While the authors compared their approach with previous methods like BDAE and DCCA, there could be more recent multimodal fusion techniques that are not included in the comparison.
    2. Limited dataset diversity: Although the authors conducted extensive experiments on multiple open datasets (SEED, SEED-IV, and SEED-V), all three come from the same source (BCMI Lab, Shanghai Jiao Tong University), which may limit the generalization of the proposed method to other types of EEG and EM data.
    3. Computational complexity: The paper does not provide a detailed analysis of the computational complexity and resource requirements of the proposed DTCA framework, which could be important for real-world applications.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The hyperparameters are well presented, but it would be nice if the code was provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. Please specify what the bold formatting in Table 1 represents. Also, it would be better to use ± for representing standard deviations, and align the decimal places to two digits.
    2. The title only includes the word “method” and lacks the ultimate purpose of the research, such as “for comprehensive understanding of brain dynamics.”
    3. It would be beneficial to visualize the ablation of fusion methods to better illustrate the advantages of the proposed DTCA approach.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The novelty of the proposed DTCA framework and its effective integration of EEG and EM data for brain state classification tasks.
    2. The comprehensive experiments and state-of-the-art performance on multiple open datasets, demonstrating the superiority of the proposed approach.
    3. The well-written paper and clear explanations of the proposed methodology. However, the lack of comparison with more recent multimodal fusion techniques, the limited dataset diversity, and the absence of computational complexity analysis are weaknesses that could be addressed to further strengthen the paper.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a dual-branch transformer-based network for emotion classification using EEG and Eye movement signals.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-organized.
    2. The idea is novel and the results showcase that the model outperforms SOTA.
    3. The findings are clinically interpreted and discussed through ablation studies.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The figures can be improved.
    2. More information is needed on the eye movement part.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    1. The quality of the figures and the text inside the figures needs further improvement. You may use \rightarrow instead of “->”, and use 4f_s instead of f_s*4, etc.
    2. More information about the statistical representation of Eye movement signal is needed.
    3. There is a space missing in the line after Eq. 1.
    4. The diagonal values of the confusion matrices should be presented with white color.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is scientifically and clinically sound and can be accepted once the suggested comments are addressed.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We sincerely thank the reviewers for their time and effort in reviewing our manuscript. We appreciate their valuable comments, constructive suggestions, and recognition of our work. We will carefully revise the manuscript to incorporate all the recommendations from the review panel.

The specific revisions will include the following: enhancing the description of the dataset, improving the presentation of figures and tables, providing a detailed introduction of the unimodal models, including comparisons with more recent multimodal fusion methods where possible, and improving the title of the paper.

All of the aforementioned revisions will be incorporated into our final submission.

Thank you again for your time and effort to improve the quality of our manuscript.




Meta-Review

Meta-review not available, early accepted paper.


