Abstract
In recent years, multimodal emotion recognition has become a research hotspot. Although existing methods have achieved significant results by integrating information from different modalities, irrelevant or conflicting emotional information across modalities often limits further performance gains. Inspired by Mamba's ability to effectively filter irrelevant information and model long-range dependencies with linear complexity, we propose a new paradigm for EEG-guided adaptive multimodal emotion recognition with Mamba. This paradigm mitigates the interference caused by cross-modal information conflicts, improving the performance of multimodal emotion recognition. First, to alleviate the interference caused by conflicts between different modalities, we design a multi-scale EEG-guided conflict suppression module. Guided by multi-scale EEG features, this module uses a selective cross state space model to suppress irrelevant information and conflicts in eye movement features, yielding enhanced eye movement features. Second, to deeply integrate the complementary features of the EEG modality and the enhanced eye movement modality, we propose a novel cross-modal fusion mechanism, consisting of Mutual-Cross-Mamba and Merge-Mamba, which effectively captures long-range dependencies in the fused features and thereby improves the integration and utilization of cross-modal information. Experimental results on the SEED, SEED-IV, and SEED-V datasets demonstrate that our method significantly surpasses current state-of-the-art methods.
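The code for this paper is not released, but the selective state space model the abstract refers to builds on the standard Mamba recurrence: input-dependent ("selective") projections B, C and step sizes delta are applied to a zero-order-hold discretized linear state system. The sketch below is a generic, minimal illustration of that recurrence only; all names, shapes, and the sequential (non-parallel-scan) loop are assumptions for clarity, not the authors' implementation.

```python
import numpy as np

def selective_ssm_scan(x, A, B, C, delta):
    """Generic selective state-space recurrence (Mamba-style), for illustration.

    x:     (T, D)  input token sequence
    A:     (D, N)  continuous-time state transition parameters
    B, C:  (T, N)  input-dependent (selective) projections per time step
    delta: (T, D)  input-dependent step sizes
    Returns y of shape (T, D).
    """
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))          # hidden state
    y = np.empty((T, D))
    for t in range(T):
        # Zero-order-hold (ZOH) discretization of the continuous system.
        A_bar = np.exp(delta[t][:, None] * A)       # (D, N)
        B_bar = delta[t][:, None] * B[t][None, :]   # (D, N)
        h = A_bar * h + B_bar * x[t][:, None]       # state update
        y[t] = h @ C[t]                             # read-out
    return y
```

In a full Mamba block this recurrence is computed with a parallel scan rather than a Python loop, and in the paper's cross variant the selective parameters for one modality would be derived from the other (EEG-guided), which is the assumed mechanism behind "suppressing" conflicting eye movement information.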
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2505_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
N/A
Link to the Dataset(s)
N/A
BibTex
@InProceedings{PinXia_MambaMER_MICCAI2025,
author = { Ping, Xiangle and Huang, Wenhui and Zheng, Yuanjie},
title = { { MambaMER: Adaptive EEG-Guided Multimodal Emotion Recognition with Mamba } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15960},
month = {September},
pages = {335--345}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper applies the Mamba model for multimodal emotion recognition and has shown promising experimental results, demonstrating the effectiveness of the Mamba model in handling multi-modal emotional information.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This paper proposes a relatively systematic framework for using multi-modality for emotion recognition, which can also be extended to other application cases.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- This paper only aims to improve the accuracy of multi-modal emotion recognition, which is fine but not very attractive. And the improvement over current methods is actually not significant. It would be better to consider more practical issues with multi-modal emotion recognition in the future.
- There are some issues with the organization of this paper. For example, how the mean and std shown in Tables 1-3 are calculated has not been described.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper proposes a new Mamba model based multi-modal emotion recognition framework. This framework is shown to be effective, and the organization of this paper is not bad. However, this paper only considers an ideal case when two modalities are all available during the fusion.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #2
- Please describe the contribution of the paper
The authors propose a multi-modal emotion recognition pipeline based on the Mamba architecture, in which eye movement and electroencephalography (EEG) signals flow through the pipeline, with multi-scale EEG features acting as a guiding signal to resolve the cross-modal conflicts that are prevalent in multi-modal emotion recognition architectures. According to the authors, accounting for irrelevant and conflicting inter-modality information enhances their pipeline's emotional expression detection capabilities compared to previous works. The authors validate their method using the SEED, SEED-IV, and SEED-V datasets, a suite of "stimulus material induction" data, where they surpass previous studies in multi-modal emotion recognition.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Mathematical basis: The authors provide a rigorous mathematical explanation of their multi-modal Mamba-based emotion recognition pipeline.
Modularity: The authors present a comprehensive modular architectural description of the MambaMER model. The model is initialized with low-dimensional token sequences. Then, using a cascading scheme, low-, medium-, and high-scale EEG and eye features are extracted. They further initialize improved eye movement features that are updated using "selective cross-state-space computation" between these eye movement features and the EEG features. The final inter-modal fusion results are obtained from their cross-modal fusion pipeline.
State-of-the-art results: MambaMER captures long-range dependencies, allowing it to beat current methods in multi-modal emotion recognition.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
Unconvincing novelty: The authors claim that MambaMER is the first attempt "to apply the Mamba model to resolve cross-modal conflicts in emotion recognition." Li et al. propose MaTAV, a Mamba-enhanced text-audio-video alignment network that aligns unimodal features for consistent inter-modal handling. Although MambaMER and MaTAV operate on different data modalities, MambaMER's authors claim novelty in mitigating cross-modal conflicts, not in modality differences. Furthermore, there is a wealth of prior work that utilizes the Mamba architecture for emotion recognition.
Vagueness of claims: The authors frequently use terms such as "emotion-irrelevant," "irrelevant information," and "conflicts in the eye movement" without elucidating what constitutes irrelevant emotional information or conflicting eye movements. In Equation 1, the authors state that the "B" matrix is "'related' to the system's hidden state" and the "C" matrix "is 'associated' with the input and output"; however, it is not clear to the reader what "related" and "associated" mean here. In Sections 2.2 and 2.3, the authors fail to explain how the low-dimensional token sequences and the enhanced eye movement features are initialized. Furthermore, in Section 2.4, the inverse sequence (S_inverse) is said to undergo "further processing" without the authors explaining the processing scheme. Moreover, the first summation of the loss function ends at "M," which represents the number of categories; it is not clear what the number of categories is and whether these categories represent the different emotion modalities. Finally, in Section 3.4, the authors use the term "deformation model" for the first time without any prior reference or explanation.
Missing citations: "Mamba integrates continuous systems with deep learning algorithms by applying zero-order hold (ZOH) to transform the..."; "Additionally, a parallel scanning algorithm is employed to accelerate the equation solving process."
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I recommend a Weak Accept as the paper presents a potentially valuable contribution by applying the Mamba architecture to cross-modal emotion recognition. While the proposed MambaMER model is promising, the novelty claim is weakened by the existence of related work such as MaTAV, which also leverages Mamba for multi-modal alignment. Additionally, several technical aspects lack clarity, including vague terminology (e.g., “emotion-irrelevant”), undefined variables in key equations, and missing details on feature initialization and processing steps. Despite these issues, the core idea may have merit if the authors can address these weaknesses through a detailed and convincing rebuttal.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Review #3
- Please describe the contribution of the paper
This paper proposes a novel multimodal method for emotion recognition using EEG signals and eye movement data. The model is shown to be efficient and robust on three public datasets. The paper introduces a novel way of utilizing EEG signals during model learning: a selective Mamba module encodes the signals and inserts the embeddings into the eye movement training branch. A novel multimodal fusion strategy is also used.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) Novel multimodal fusion method. The cross-modality fusion method helps the model learn better representations of the two modalities, which contain conflicts and irrelevant information. 2) Complete evaluation. The authors compared multiple state-of-the-art models across three different datasets. A well-designed ablation study and a computational efficiency analysis are provided.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
No clinical feasibility was shown. While the paper presents an interesting approach to sentiment analysis and contributes to the broader field of natural language processing, its direct relevance to the core themes of this medical-focused conference appears somewhat limited.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
1) Could the authors explain how they designed the comparison between the EEG-dominant and EM-dominant settings? How did you make one of the modalities dominant during training? Thank you. 2) The authors mention a missing-modalities issue. Could you please provide more details of what happened and how you overcame it in this work? 3) Minor mistakes: Refs [11] and [12], and Refs [14] and [15], are the same papers.
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper proposes a solid and technically sound method for sentiment analysis, with well-executed experiments and clear presentation. While the methodology is promising, the lack of a strong connection to clinical or healthcare applications reduces its impact within the context of this venue. This misalignment between the topic and the conference scope was a major factor in determining the overall score.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
N/A
- [Post rebuttal] Please justify your final decision from above.
N/A
Author Feedback
N/A
Meta-Review
Meta-review #1
- Your recommendation
Provisional Accept
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A