Abstract

An objective and accurate emotion diagnostic reference is vital to psychologists, especially when dealing with patients who are difficult to communicate with for pathological reasons. Nevertheless, current systems that use Electroencephalography (EEG) data for sentiment discrimination suffer from several problems, including excessive model complexity, mediocre accuracy, and limited interpretability. Consequently, we propose a novel and effective feature fusion mechanism named Mutual-Cross-Attention (MCA). Combined with a specially customized 3D Convolutional Neural Network (3D-CNN), this purely mathematical mechanism adeptly discovers the complementary relationship between time-domain and frequency-domain features in EEG data. Furthermore, the newly designed Channel-PSD-DE 3D feature also contributes to the high performance. The proposed method eventually achieves 99.49% (valence) and 99.30% (arousal) accuracy on the DEAP dataset. Our code and data are open-sourced at https://github.com/ztony0712/MCA.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/1767_paper.pdf

SharedIt Link: pending

SpringerLink (DOI): pending

Supplementary Material: N/A

Link to the Code Repository

https://github.com/ztony0712/MCA

Link to the Dataset(s)

https://drive.google.com/drive/folders/1jRQRbRgTIZEDByQYz41CuoyzPe45hxHv

BibTex

@InProceedings{Zha_Feature_MICCAI2024,
        author = { Zhao, Yimin and Gu, Jin},
        title = { { Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition } },
        booktitle = {proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15011},
        month = {October},
        page = {pending}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a feature fusion mechanism called Mutual-Cross-Attention (MCA) designed for emotion recognition using Electroencephalography (EEG) data. By integrating MCA with a custom 3D Convolutional Neural Network (3D-CNN), the mechanism achieves 99.49% (valence) and 99.30% (arousal) accuracy on the DEAP dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The introduced MCA mechanism uniquely integrates time-domain and frequency-domain features, which are critical for analyzing complex EEG data. 2) The paper proposes a customized 3D Convolutional Neural Network that is used for EEG data analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The paper does not specify how the dataset was divided for training and testing, nor does it detail the classification task specifics, such as whether it involves cross-individual or cross-session analyses. 2) This paper does not provide concrete methods or results to show how the proposed Mutual-Cross-Attention mechanism improves interpretability. 3) The description of the feature fusion mechanism as a “purely mathematical mechanism” is vague. 4) Although the paper mentions a customized 3D Convolutional Neural Network architecture, it fails to detail how this customization specifically caters to the unique characteristics of EEG data.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The paper does not specify how the dataset was divided for training and testing, nor does it detail the classification task specifics, such as whether it involves cross-individual or cross-session analyses.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Please provide details on dataset division and classification specifics, clarify the interpretability offered by the MCA mechanism, elaborate on the “purely mathematical mechanism,” and detail the EEG-specific customizations of the 3D-CNN architecture to strengthen the manuscript.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) The lack of detail regarding how the dataset was segmented for training and testing, along with missing specifics on the classification tasks, such as whether they involved cross-individual or cross-session analyses, raises concerns about the study’s validity and reproducibility.
    2) The paper does not adequately demonstrate how the Mutual-Cross-Attention mechanism enhances interpretability, a claim crucial to its novelty.
    3) The term “purely mathematical mechanism” is too vague without a clear, detailed explanation.
    4) The mentioned customization of the 3D-CNN to EEG data is insufficiently detailed, which is vital for assessing the model’s applicability to EEG signals.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have provided clear responses to the concerns I raised previously.



Review #2

  • Please describe the contribution of the paper

    The paper focuses on improving emotion recognition using electroencephalography (EEG) data. It identifies key issues in current approaches, such as excessive model complexity, mediocre accuracy, and limited interpretability. The authors introduce a novel feature fusion mechanism named Mutual-Cross-Attention (MCA) designed to effectively discover the complementary relationship between time-domain and frequency-domain features in EEG data. A customized 3D Convolutional Neural Network (3D-CNN) is used in conjunction with the MCA mechanism. This combination is aimed at optimizing feature fusion and improving the interpretability and performance of emotion recognition systems. The study uses the DEAP dataset, which involves EEG data from 32 subjects who rated 40 one-minute music video clips. The dataset provides a substantial basis for evaluating emotion recognition models in terms of arousal and valence. The proposed method achieves 99.49% (valence) and 99.30% (arousal) accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novelty in Feature Fusion: The introduction of the MCA mechanism is novel in its approach to feature fusion, addressing the common problem of model complexity in EEG-based emotion recognition.

    • High Performance: The proposed method outperforms existing state-of-the-art models with accuracy rates above 99% for both arousal and valence, demonstrating its effectiveness.

    • Enhanced Interpretability: By reducing reliance on complex models and focusing on mathematical fusion, the method improves the interpretability of the emotion recognition process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper does not explicitly provide details on the dataset splits such as training, validation, and testing sets.

    • The paper does not specify the use of any cross-validation techniques such as k-fold or leave-one-out.

    • Limited Comparison: While the paper compares its results with other state-of-the-art methods, it might benefit from a broader comparison across more diverse EEG datasets and settings.

    The paper does not discuss why other EEG datasets, such as SEED [1] or AMIGOS [2], were not considered. Including or discussing the exclusion of these datasets could provide deeper insights into the model’s versatility and limitations across varied emotional recognition contexts.

    [1] Zheng, W.-L., and Bao-Liang Lu. “Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks.” IEEE Transactions on Autonomous Mental Development 7.3 (2015): 162-175.

    [2] Correa, J. A. M., Abadi, M. K., Sebe, N., & Patras, I. “AMIGOS: A dataset for affect, personality and mood research on individuals and groups.” IEEE Transactions on Affective Computing (2018).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    It would be beneficial for the readers if the authors could provide more details on:

    • The dataset splits or cross-validation techniques utilized,
    • The training settings or guidelines to replicate the experiments
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    1) Dataset Splits and Validation Details: It is essential to provide explicit information about how the DEAP dataset was divided into training, validation, and testing sets. Clear details on dataset splits help in understanding the model’s ability to generalize and ensure the reproducibility of the results. Please consider adding a subsection in the methodology or experiments section detailing these splits, including the percentage of data used for each phase and the rationale behind these choices.

    2) Cross-Validation Techniques: To enhance the credibility and robustness of the findings, consider including a discussion on cross-validation techniques such as k-fold or leave-one-out. Including a discussion on why a particular form of cross-validation was chosen or not chosen could be beneficial. If cross-validation was indeed used but not mentioned, clarifying this could significantly impact the paper’s reception.

    3) Exploration of Additional EEG Datasets: The paper compares its results with other state-of-the-art methods using the DEAP dataset. To enhance the robustness and generalizability of the proposed model, it would be beneficial to include comparisons across a broader range of EEG datasets. This extension could demonstrate the model’s applicability and adaptability to different emotional recognition contexts and elicitation protocols. Could the authors clarify whether other EEG datasets were considered for this study? If there are other datasets available, such as SEED [1] or AMIGOS [2], which also contain emotional labels, an explanation of why these were not included would be insightful. This inclusion would help readers understand the scope and potential limitations of the dataset choices in relation to the proposed model’s performance. Additionally, incorporating results from these or similar datasets could significantly enhance the paper’s contribution by showing how the model performs under varied conditions and datasets with different characteristics.

    [1] Zheng, W.-L., and Bao-Liang Lu. “Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks.” IEEE Transactions on Autonomous Mental Development 7.3 (2015): 162-175.

    [2] Correa, J. A. M., Abadi, M. K., Sebe, N., & Patras, I. “AMIGOS: A dataset for affect, personality and mood research on individuals and groups.” IEEE Transactions on Affective Computing (2018).

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Reject — could be rejected, dependent on rebuttal (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The high accuracy, novel feature fusion approach, and improvements over existing methods are compelling reasons for a high rating. However, the decision for a “weak reject” is based primarily on several critical gaps in the paper’s methodology and data presentation that could potentially limit its scientific validity and reproducibility:

    • Lack of Detailed Dataset Splits
    • Unclear Cross-Validation Techniques
    • Limited Dataset Exploration: The paper’s restricted use of only the DEAP dataset without considering or discussing other available datasets like SEED or AMIGOS restricts the understanding of the model’s generalizability across different emotional recognition contexts.
  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Reject — should be rejected, independent of rebuttal (2)

  • [Post rebuttal] Please justify your decision

    Based on the authors’ response and our initial concerns, it is evident that the paper has significant shortcomings that remain unaddressed. The authors confirm the modest size of the DEAP dataset, which limits the statistical representativeness of training subsets. This constraint led them to avoid cross-validation techniques. However, this decision highlights a critical flaw: the insufficient use of cross-validation undermines the robustness of their findings. Furthermore, the authors acknowledge not incorporating other public datasets such as SEED and AMIGOS due to space limitations, which is a substantial oversight. These datasets could have enriched the study, offering more comprehensive validation and enhancing the generalizability and robustness of their proposed method.

    In its current form, the paper is not ready for publication. While the ideas are interesting, they need to be validated correctly to meet publication standards. This reviewer recommends that the paper be rejected at the moment and needs improvement for publication for the following reasons:

    1. Insufficient Dataset Utilization: The paper does not use other available EEG datasets like SEED and AMIGOS, which would have enhanced the method’s robustness and generalizability.
    2. Inadequate Experimental Design: The use of a 70-30 split without cross-validation (e.g., k-fold) weakens result robustness. Lack of validation set details raises concerns about data leakage, overfitting, and effective training stopping.
    3. Limited Hyperparameter Optimization: The paper lacks context on how hyperparameters were optimized. Training for only 12 epochs is insufficient and questionable without a validation set to guide training stopping criteria. The choice of 12 epochs, rather than 16, 32, or more, risks underfitting and overfitting, leading to unreliable model performance. The authors’ choices do not align with standard machine learning protocols, further undermining the study’s validity.



Review #3

  • Please describe the contribution of the paper

    Overall, the paper is well-written, and the authors have introduced a unique Channel-Frequency-Time 3D feature structure. Additionally, they have proposed a novel fusion method, coupled with a customized 3D-CNN.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novelty of this paper lies in its unique feature fusion method and in creating a 3D feature matrix from EEG data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper did not share codes.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    Overall, the paper is well-written. However, the following information needs to be added to enhance its quality:

    1) Provide more details about the dataset, including the number of emotional states and the rationale behind selecting specific states.

    2) Please elaborate on the types of electrodes used for data collection (i.e., dry or wet electrodes) and the placement approach (e.g., 10-10 or 10-20 international system).

    3) Explain the methodology used to remove eye blinking artifacts and clarify whether the dataset includes eye closure or eyes open information. Describe how this baseline data was utilized to remove artifacts.

    4) Provide further details about the 3D-CNN experiment, such as the learning rate, number of epochs, early stopping criteria, etc.

    5) In Table 4, clarify how the 3D dataset was created for Channel-Frequency-DE and Channel-PSD-Time.

    These additions will strengthen the paper and provide a more comprehensive understanding of the research methodology and results.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have presented a novel approach; however, the paper lacks certain key details regarding the dataset and methodology.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Accept — should be accepted, independent of rebuttal (5)

  • [Post rebuttal] Please justify your decision

    The authors have clearly addressed the concerns I raised earlier.



Review #4

  • Please describe the contribution of the paper

    The paper proposes a feature fusion mechanism combined with a 3D Convolutional Neural Network. It uses the complementary relationship between time-domain and frequency-domain features in EEG data to extract meaningful information.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Using both spectral and temporal information. 2) Ablation experiments have been performed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Why do they claim that their work is state-of-the-art when the authors have not provided a very comprehensive review? 2) Some parts, such as Feature fusion and Classification (including Fig 2 and Fig 3), are not clear and it is difficult to understand how they work.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html
    • How can “downsampling” the original 512 Hz data to 128 Hz improve data quality?
    • Feature fusion must be explained in more detail. It would also be helpful to provide some references.
    • Include other outputs such as likeability, dominance, and familiarity.
  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) Emotion recognition based on EEG has been researched for a long time. The topic is interesting and important.

    2) Their results are promising.
  • Reviewer confidence

    Somewhat confident (2)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

We thank the reviewers for their meticulous examination and expert guidance on our manuscript. We acknowledge a widely discussed issue, namely the open-sourcing of code. Although we could not provide the code due to time constraints, we commit to organizing and releasing it on GitHub immediately upon the paper’s acceptance.

Regarding the detailed method of dataset split: We employed the classic and basic 70-30 split (70% training, 30% testing). In fact, we also tried 10-fold cross-validation to evaluate the proposed method and still achieved the best results compared with other methods that are likewise evaluated by cross-validation. However, considering the DEAP dataset’s modest size, cross-validation might lead to training subsets too small to represent the overall data’s statistical characteristics, so this experimental result was not mentioned in the paper. We therefore adopted the 99.49% accuracy obtained with the 70-30 split.
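
For illustration, a minimal sketch of such a 70-30 split is shown below; the array shapes and the use of scikit-learn’s train_test_split are assumptions for illustration, not details taken from the paper.

    # Minimal sketch of a 70-30 split; the feature/label arrays and the use of
    # scikit-learn here are illustrative assumptions, not details from the paper.
    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.randn(1280, 32, 2, 60)       # hypothetical fused 3D features
    y = np.random.randint(0, 2, size=1280)      # hypothetical binary valence labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=42, stratify=y
    )
    print(X_train.shape, X_test.shape)          # 70% train, 30% test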

Regarding the specific details of the experiments: We utilized the complete DEAP dataset, the hardware details of which are available on the official website. The dataset includes EEG recordings from 32 subjects, each recorded while watching 40 one-minute videos, resulting in 40 sessions per subject. Thus, the tasks involve cross-individual and cross-session analyses. In the training phase, the hyperparameters were set as follows: batch size 32, learning rate 0.0002, weight decay 0.0001, and 12 epochs. Additionally, as mentioned in Section 2.5 Classification, the structure of the 3D-CNN was customized to accommodate the shape of the data output from the Mutual-Cross-Attention (MCA).
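
A minimal sketch of this training setup follows; only the batch size, learning rate, weight decay, and epoch count come from the rebuttal, while the optimizer choice (Adam), the stand-in model, and the data shapes are assumptions.

    # Minimal training-loop sketch. Only batch size, learning rate, weight decay,
    # and epoch count are stated above; the optimizer (Adam), the stand-in model,
    # and the tensor shapes below are assumptions for illustration.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 2 * 60, 2))  # stand-in for the 3D-CNN
    X = torch.randn(256, 32, 2, 60)            # hypothetical fused feature blocks
    y = torch.randint(0, 2, (256,))            # hypothetical binary labels

    loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(12):                    # 12 epochs, as stated above
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()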

Regarding the details of MCA: We have not overlooked the detailed description of MCA. It is, in fact, a simple yet elegant structure, as described in Section 2.4 Feature Fusion. In brief, MCA first applies Q=K=PSD, V=DE to the basic attention mechanism formula (Equation 4) to derive intermediate feature A, then uses Q=K=DE, V=PSD with the same formula to obtain intermediate feature B. The fused feature is A+B. The entire MCA process is independent of the 3D-CNN and is a mathematical method that does not require learning. Its advantage lies in allowing information to flow in two directions, providing more comprehensive feature interaction and richer feature expression, thereby helping to improve the model’s generalizability and reduce bias.
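
A minimal sketch of the fusion described above is given below; the scaled dot-product form assumed for the attention formula and the tensor shapes are illustrative assumptions.

    # Minimal sketch of MCA as described above: two parameter-free cross-attention
    # passes whose outputs are summed. The scaled dot-product form of "Equation 4"
    # and the tensor shapes are assumptions for illustration.
    import torch
    import torch.nn.functional as F

    def attention(q, k, v):
        # Basic scaled dot-product attention with no learned weights.
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
        return F.softmax(scores, dim=-1) @ v

    def mutual_cross_attention(psd, de):
        a = attention(psd, psd, de)   # Q = K = PSD, V = DE  -> intermediate feature A
        b = attention(de, de, psd)    # Q = K = DE,  V = PSD -> intermediate feature B
        return a + b                  # fused feature, later fed to the 3D-CNN

    psd = torch.randn(32, 60)         # hypothetical frequency-domain (PSD) features
    de = torch.randn(32, 60)          # hypothetical time-domain (DE) features
    fused = mutual_cross_attention(psd, de)
    print(fused.shape)                # torch.Size([32, 60])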

Regarding other datasets: Given the generality already considered with DEAP, our method can be applied to other datasets with slight adjustments. However, due to the page limitations of conference papers, we could not provide further comparisons with other datasets. Although we cannot add new experiments in this rebuttal according to the official guidelines, we have planned for future work: once we confirm the applicability of MCA on DEAP, we will begin validating its generalizability on other datasets. We appreciate the reviewers’ suggestions!




Meta-Review

Meta-review #1

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    The paper presents an attention mechanism to classify EEG emotion. Although some major weaknesses, such as the training dataset and experimental settings, remain, the reviewers mostly agree that the application is interesting and worth discussing. Considering the paper is better than other papers in the AC’s batch, I recommend acceptance but highly encourage the authors to follow R5’s suggestions for better presentation.

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).




Meta-review #2

  • After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

    Accept

  • Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

    N/A

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    N/A


