Abstract
Emotion recognition leveraging multimodal data plays a pivotal role in human-computer interaction and in clinical applications such as depression, mania, and Parkinson's disease. However, existing emotion recognition methods are susceptible to heterogeneous feature representations across modalities. Additionally, complex emotions involve multiple dimensions, which makes highly trustworthy decisions challenging to achieve. To address these challenges, we propose a novel multi-expert collaboration and knowledge enhancement network for multimodal emotion recognition. First, we devise a cross-modal fusion module that dynamically aggregates complementary features from EEG and facial expressions through attention-guided interaction. Second, our approach incorporates a feature prototype alignment module to enhance the consistency of multimodal feature representations. Then, we design a prior knowledge enhancement module that injects original dynamic brain networks into feature learning to strengthen the feature representation. Finally, we introduce a multi-expert collaborative decision module designed to refine predictions, enhancing the robustness of the classification results. Experimental results on the DEAP dataset demonstrate that our proposed method surpasses several state-of-the-art emotion recognition techniques.
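For context, a minimal PyTorch sketch of the kind of attention-guided cross-modal fusion the abstract describes; the module names, token shapes, and pooling choice are illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative cross-attention fusion of EEG and facial feature tokens."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Each modality attends to the other, exchanging complementary cues.
        self.eeg_to_face = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.face_to_eeg = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, eeg: torch.Tensor, face: torch.Tensor) -> torch.Tensor:
        # eeg: (batch, n_eeg_tokens, dim); face: (batch, n_face_tokens, dim)
        eeg_attn, _ = self.eeg_to_face(query=eeg, key=face, value=face)
        face_attn, _ = self.face_to_eeg(query=face, key=eeg, value=eeg)
        # Pool each attended stream and project to a fused representation.
        fused = torch.cat([eeg_attn.mean(dim=1), face_attn.mean(dim=1)], dim=-1)
        return self.proj(fused)  # (batch, dim)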
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/1034_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/EEGBrainNet/Emotion-Recognition
Link to the Dataset(s)
DEAP dataset: https://www.eecs.qmul.ac.uk/mmv/datasets/deap/
BibTex
@InProceedings{WanKun_Multiexpert_MICCAI2025,
author = { Wang, Kun and Zhao, Junyong and Zhang, Liying and Zhu, Qi and Zhang, Daoqiang},
title = { { Multi-expert collaboration and knowledge enhancement network for multimodal emotion recognition } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {443 -- 452}
}
Reviews
Review #1
- Please describe the contribution of the paper
The authors propose a novel multimodal emotion recognition framework that combines EEG data and facial expressions, processing both through Vision Transformer (ViT) architectures. The framework integrates these modalities through fusion techniques, incorporating dynamic functional connectivity measures to create a unified representation for more accurate emotion prediction.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well-organized and written in clear language, making it a pleasant read overall. The authors combine feature embeddings for both EEG data and facial expressions through a combination of SOTA ViT techniques. The research question addresses an important area of research.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While the multimodal approach to emotion recognition is interesting, it limits the usability of the model, as both modalities are not always possible to acquire and are not readily available in many datasets.
The manuscript makes strong claims that are not supported by existing literature or ablation studies, e.g., “existing multimodal emotion recognition methods often yield suboptimal classification performance due to the lack of prior knowledge integration.”
The DEAP dataset consists of 32 subjects; unless I missed a point, training a ViT on a dataset as small as DEAP would result in overfitting to the data. There is also no external validation dataset used to show generalizability.
Missing key literature comparisons with methods that are potentially comparable or exceed this performance using simpler approaches: Stajić, Tamara, et al. “Emotion recognition based on DEAP database physiological signals.” 2021 29th Telecommunications Forum (TELFOR). IEEE, 2021. Fan, Zhiqiang, et al. “EEG emotion classification based on graph convolutional network.” Applied Sciences 14.2 (2024): 726. A review paper on unimodal (EEG) emotion recognition approaches, many stating higher performance on the same features: Wang, Jiang, and Mei Wang. “Review of the emotional feature extraction and classification using EEG signals.” Cognitive Robotics 1 (2021): 29-40.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Lack of experimentation to support the strong claims made in the manuscript. Missing key citations/comparisons. Possible overfitting, no external validation present.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
- Prior Knowledge Enhancement (PKE) Module Design: A Prior Knowledge Enhancement module is proposed to address the insufficient integration of neurophysiological priors in existing multimodal emotion recognition paradigms. This module augments feature representations by embedding topological associations derived from brain network connectivity, thereby enhancing inter-class feature discriminability and introducing spatiotemporal constraints rooted in neuroanatomical principles.
- Feature Prototype Alignment (FPA) Module: To resolve cross-modal feature distribution discrepancies, a Feature Prototype Alignment mechanism is developed. The module establishes a correlation metric between multimodal feature vectors, enabling the minimization of latent space divergence while preserving intrinsic modality-specific characteristics through constrained optimization (a minimal sketch of this idea follows this list).
- Experimental Validation: Extensive evaluation on the DEAP benchmark dataset confirms the framework’s efficacy, achieving enhanced recognition performance through systematic integration of neurotopological prior knowledge and cross-modal feature distribution alignment.
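For context on the FPA bullet above, here is a minimal PyTorch sketch of one plausible reading of feature prototype alignment: per-class feature means (prototypes) of the two modalities are pulled together, reducing latent-space divergence. The function name, shapes, and MSE-based divergence measure are illustrative assumptions, not the paper's exact formulation.

import torch.nn.functional as F

def prototype_alignment_loss(eeg_feat, face_feat, labels, num_classes):
    # eeg_feat, face_feat: (batch, dim) embeddings; labels: (batch,) class ids.
    loss = 0.0
    for c in range(num_classes):
        mask = labels == c
        if mask.sum() == 0:
            continue  # class absent from this batch
        proto_eeg = eeg_feat[mask].mean(dim=0)    # class-c EEG prototype
        proto_face = face_feat[mask].mean(dim=0)  # class-c facial prototype
        # Minimize the divergence between the two modality prototypes.
        loss = loss + F.mse_loss(proto_eeg, proto_face)
    return loss / num_classes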
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Feature Prototype Alignment (FPA): The proposed FPA module is designed to mitigate cross-modal discrepancies by establishing a unified latent representation space. This alignment process generates emotionally consistent feature embeddings across modalities through domain-invariant correlation constraints, thereby reducing redundant feature interactions and optimizing computational efficiency in multimodal fusion.
- Prior Knowledge Enhancement (PKE): The PKE module incorporates neurophysiological grounding to enhance the domain-reliability of predictive features. By leveraging the inherent objectivity of EEG-derived brain network topology over subjective facial expression variations, this mechanism enforces neurocognitive plausibility constraints on feature representations, ensuring biologically interpretable emotion recognition outcomes.
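For context on the PKE point above, a minimal PyTorch sketch of one way a brain-network topology prior could be injected into channel-wise EEG features via a single graph-propagation step; the blending coefficient and names are illustrative assumptions, not the paper's exact mechanism.

import torch

def knowledge_enhance(feat: torch.Tensor, adj: torch.Tensor, alpha: float = 0.5):
    # feat: (batch, n_channels, dim) EEG features;
    # adj: (n_channels, n_channels) connectivity matrix (e.g., Pearson correlation).
    deg = adj.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    norm_adj = adj / deg  # row-normalize so propagation averages neighbor features
    # Blend each channel's own features with connectivity-propagated ones.
    return alpha * feat + (1 - alpha) * torch.einsum("ij,bjd->bid", norm_adj, feat)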
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
While temporal synchronization is implemented between facial expression data and EEG recordings, the affective computing pipeline does not account for inherent subjectivity in facial expression generation. Specifically, the absence of preprocessing procedures for mitigating subjective artifacts (e.g., voluntary expression suppression or cultural display rules) introduces significant inter-subject variability. Furthermore, the methodological framework neglects to control confounding demographic factors that systematically bias facial expression patterns across participants.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
PKE Module: The significance of the PKE module stems from its dual neurocognitive foundations:
- Biologically Plausible Feature Engineering: By constructing neuroanatomically constrained connectivity profiles that simulate inter-regional communication patterns, the module incorporates neurodynamic interaction patterns into feature representations. This mechanism aligns with established neuroscientific evidence demonstrating that emotional processing emerges from coordinated multi-regional brain network dynamics.
- Modality Reliability Optimization: Leveraging the inherent neurophysiological objectivity of EEG signals over behaviorally variable facial expressions, the integration of brain network topology introduces domain-consistent neurocognitive priors. This design principle validates the neurocomputational plausibility of feature representations while enhancing model generalizability across heterogeneous subject populations.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The rebuttal was effective, and I have no additional questions.
Review #3
- Please describe the contribution of the paper
A multi-modal attention-based fusion of EEG and facial videos for binarized emotion recognition.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is relatively well-written.
- The contribution of the paper seems significant.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The major concern is the reproducibility of the results. The hyperparameters of the paper should be clearly mentioned (even if the encoders are borrowed from [3] and [23], the parameters, especially for the decision fusion, should be stated).
- In Figure 2, the t-SNE visualization clearly reveals the separability of the facial video-based features. Does this mean that, without using EEG signals, the performance is still high? Such ablation studies could further clarify this point and enlighten the reader about the significance of your work.
- There are a few methods for dynamic brain graph forming, and the authors decided to use one of the plainest, which undermines causality. Please elaborate on your rationale for using correlation-based graph forming (for context, a sketch of this construction is given after this list).
- There are a lot of repeated abbreviation definitions, such as EEG across the paper.
- On page 6, the authors mention “For facial expression analysis, we employed tools to extract facial features from video frames, ensuring temporal synchronization with corresponding physiological data.” The details of these tools are entirely missing.
- In Table 1, if the methods are taken from prior studies, their corresponding papers should be cited (in the Methods column).
- The paper needs richer clinical discussions.
- Also, for mathematical equations, it is better to follow standard notation (vectors in bold lowercase letters, matrices in bold capital letters, etc.).
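As context for the dynamic-graph point above (and consistent with the authors' rebuttal, which states the graphs are built from Pearson correlation coefficients), here is a minimal NumPy sketch of sliding-window correlation-based dynamic graph forming; the window and step sizes are illustrative assumptions.

import numpy as np

def dynamic_graphs(eeg: np.ndarray, win: int = 256, step: int = 128) -> np.ndarray:
    # eeg: (n_channels, n_samples) signal. Returns (n_windows, n_channels,
    # n_channels) correlation adjacency matrices, one per sliding window.
    graphs = []
    for start in range(0, eeg.shape[1] - win + 1, step):
        window = eeg[:, start:start + win]
        # np.corrcoef yields the channel-by-channel Pearson correlation matrix.
        graphs.append(np.abs(np.corrcoef(window)))
    return np.stack(graphs)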
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not provide sufficient information for reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
As mentioned, the paper seems technically solid and the results are significant; however, the reproducibility should be clarified and a few points should be elaborated.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors addressed most of my comments.
Author Feedback
We sincerely thank all reviewers for their valuable comments and constructive suggestions. We appreciate the encouraging comments, such as “well-organized and written in clear language, making it a pleasant read overall” (R1), “confirms the framework’s efficacy” (R2), and “the paper is relatively well-written; the contribution of the paper seems significant” (R3). We will release our code and a detailed description of the model parameters after acceptance to facilitate reproducibility in the emotion recognition community.
[R1, R3] Q1 Modality incompleteness and method scalability: We thank the reviewers for highlighting the practical limitations of incomplete modalities in real-world scenarios (R1). To address this issue, we plan to explore more adaptive fusion strategies in future work to enhance the model’s robustness under modality-missing conditions. In response to concerns regarding potential overfitting due to the limited dataset size (R1) and clinical applicability (R3), we employed multiple regularization techniques (e.g., dropout) to mitigate overfitting risks. Additionally, we plan to validate our method on larger and more diverse public and clinical datasets in future work.
[R2, R3] Q2 Motivation for multimodal data: We sincerely thank the reviewers for their thoughtful feedback on the multimodal dataset. Facial expression data are inherently subjective and can be influenced by voluntary suppression, cultural display rules, and inter-subject variability (R2). In this study, we introduce EEG signals to provide a more objective physiological perspective on emotional states. The t-SNE visualization in Figure 2 suggests that facial features exhibit class separability (R3); however, due to inter-subject variability and cultural influences, the facial modality alone may be unreliable for emotion recognition, and visual separation in t-SNE does not always correlate with quantitative classification performance (for context, a brief t-SNE sketch is given after this response). To explore this further, we conducted ablation studies (Table 1) comparing unimodal settings (facial-only and EEG-only) with the fusion of both modalities. The results demonstrate that while the facial modality performs reasonably well on its own, the inclusion of EEG signals significantly boosts performance. These findings support our motivation for adopting a multimodal framework and reinforce the added value of physiological signals in emotion recognition. In response to the data preprocessing concerns (R2), we followed standard preprocessing protocols and will include detailed descriptions after acceptance to ensure clarity.
[R1] Q3 Comparison with other methods: We acknowledge the relevance of the works by Stajić et al. (2021), Fan et al. (2024), and the review by Wang and Wang (2021), which report competitive performance. We have compared and discussed our method against these studies to demonstrate the effectiveness of our multimodal approach. Due to space limits, we plan to include the detailed comparison results in the code release after acceptance.
[R1, R3] Q4 Motivation for the dynamic brain graph as prior information: We appreciate the reviewers’ valuable feedback. Our framework incorporates dynamic brain graphs as prior knowledge, and our ablation studies confirm their effectiveness (R1). Related citations will be included in the final version. Specifically, we construct EEG dynamic graphs using Pearson correlation coefficients, which capture both statistical dependencies between regional activities and linear functional connectivity patterns. This dynamic brain graph prior enhances the representation of emotion-related features. We have also explored correlation-based graph learning architectures, which will be reported in subsequent work.
[R1, R3] Q5 Writing issues: In the final revision, we will carefully revise the paper, correcting the missing references (R1) and the repeated abbreviation definitions, and standardizing the mathematical notation (R3).
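For context on the Q2 discussion of Figure 2, a minimal scikit-learn sketch of the kind of t-SNE check referenced there; the features and labels below are random stand-ins, not the paper's data, and visual separation in such plots need not match quantitative classification accuracy.

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
face_feat = rng.normal(size=(200, 128))  # stand-in per-sample facial embeddings
labels = rng.integers(0, 2, size=200)    # stand-in binary valence/arousal labels

# Project embeddings to 2-D for inspection; the decisive evidence remains the
# facial-only vs. EEG-only vs. fused ablation, not the visualization.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(face_feat)
print(emb.shape)  # (200, 2)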
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
The paper presents a well-motivated multimodal emotion recognition framework with meaningful innovations, particularly in the Feature Prototype Alignment and Prior Knowledge Enhancement modules. While initial concerns were raised about unaddressed subjectivity and demographic confounds, the authors’ rebuttal provided satisfactory clarifications. Overall, I recommend acceptance.
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A