Abstract

Emotion recognition from physiological data is crucial for mental health assessment, yet it faces two significant challenges: incomplete multi-modal signals and interference from body movements and artifacts. This paper presents a novel Multi-Masked Querying Network (MMQ-Net) to address these issues by integrating multiple querying mechanisms into a unified framework. Specifically, it uses modality queries to reconstruct missing data from incomplete signals, category queries to focus on emotional state features, and interference queries to separate relevant information from noise. Extensive experimental results demonstrate that MMQ-Net outperforms existing approaches in emotion recognition, particularly under high levels of data incompleteness.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/0029_paper.pdf

SharedIt Link: https://rdcu.be/eHwYI

SpringerLink (DOI): https://doi.org/10.1007/978-3-032-04984-1_33

Supplementary Material: Not Submitted

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{XuGen_MultiMasked_MICCAI2025,
        author = {Xu, Geng-Xin and Zuo, Xiang and Li, Ye},
        title = {{Multi-Masked Querying Network for Robust Emotion Recognition from Incomplete Multi-Modal Physiological Signals}},
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
        year = {2025},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15967},
        month = {September},
        pages = {342--352}
}

Reviews

Review #1

Please describe the contribution of the paper

This paper presents a comprehensive framework named MMQ-Net for robust emotion recognition from incomplete multi-modal physiological signals. It uses masked modality queries to reconstruct missing data, and masked category and interference queries to separate emotional features from noise. Extensive experiments demonstrate MMQ-Net’s superior performance in emotion recognition, particularly under high levels of data incompleteness, making it a robust solution for the emotion recognition task.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This article focuses on a valuable task. In practical scenarios most of the multimodal physiological signals are incomplete. How to utilize these incomplete data for emotion recognition is challenging.
2. Using various query mechanisms, MMQ-Net can effectively complete the task of emotion recognition in the case of incomplete data, and the comparison with the existing methods shows its effectiveness.
3. The structure of the article is clear and the wording is appropriate, so the reader can easily understand the author’s intention.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. Regarding the output Z’s feature splitting mechanism, the current explanation lacks sufficient technical detail. A more thorough breakdown of how the three distinct features are derived would strengthen the methodology section.
2. The feature extraction section appears incomplete as it currently only addresses EEG signal processing. For comprehensive analysis, the feature extraction procedures for other physiological signals commonly used in such studies should also be documented.
3. Several benchmark methods included in the comparison are neither state-of-the-art nor specifically developed for physiological signal analysis, potentially compromising the fairness of the evaluation.
4. The experimental design contains questionable elements that warrant further discussion. Most notably, the uniform missing rate assumption across all modalities appears unrealistic, as real-world scenarios typically exhibit varying missing rates among different modalities. The current approach of applying a single overall missing rate to all modalities seems problematic and requires justification.
5. The results section is too weak—both the analysis and presentation of experimental results require more substantial information.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(2) Reject — should be rejected, independent of rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall, this paper focuses on an interesting task and proposes a potential solution to existing challenges. However, the weak experimental section makes it difficult to validate the effectiveness and generalizability of the proposed method.
Reviewer confidence

Confident but not absolutely certain (3)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Reject
[Post rebuttal] Please justify your final decision from above.

The rebuttal given by the author does not address the concerns mentioned earlier.

Review #2

Please describe the contribution of the paper

This paper proposes MMQ-Net to address two challenges in multi-modal emotion recognition: 1) incomplete multi-modal signals and 2) interference caused by body movements and artifacts. To this end, MMQ-Net introduces a multi-masked querying transformer with multiple queries: masked modality queries are used to reconstruct missing data, while a masked category query and an interference query are used to separate emotional-state features from irrelevant noise. The proposed method is evaluated on two multi-modal physiological datasets, DEAP and MAHNOB-HCI, and shows favorable performance improvements.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

S1: This paper explores the combined challenges in incomplete multi-modal learning, which is of interest to the community.
S2: The proposed methodology has good intuition as well as relevance, and the effectiveness is verified through experiments.
S3: The paper is well-structured and easy to follow.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

W1: The novelty of this paper is somewhat limited. Specifically, multi-query transformers have been studied in previous work [R1], and modality-masked transformers [R2] have also been explored in incomplete multi-modal learning. The connections to and differences from these works should be highlighted.
W2: The paper does not compare with other query-based networks, such as QuMo [R3].
W3: The paper mentions interference noise and proposes an interference query to separate emotional-state features from it. However, the only supporting evidence is the ablation study on the interference reduction loss, which is limited. A visualization showing the separation between irrelevant noise features and emotional-state features is recommended.
[R1] Xu, Yangyang, et al. “Multi-Task Learning With Multi-Query Transformer for Dense Prediction”, IEEE TCSVT, 2024.
[R2] Shi, Junjie, et al. “MFTrans: Modality-masked fusion transformer for incomplete multi-modality brain tumor segmentation.” IEEE JBHI, 2023.
[R3] Chen, Delin, et al. “Query re-training for modality-gnostic incomplete multi-modal brain tumor segmentation.” MICCAI Workshops, 2023.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

N/A
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(3) Weak Reject — could be rejected, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I think the idea of combining these two challenges is interesting and the intuition behind the method is straightforward. However, my major concerns are the novelty and whether the theory is true and valid. Much of the proposed method seems to be borrowed from existing papers. In addition, the paper lacks visual comparisons that illustrate its performance gains.
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

Accept
[Post rebuttal] Please justify your final decision from above.

Due to the rules of the conference, additional experiments unfortunately could not be added to make the article more persuasive. Still, I think the article is probably near the borderline of acceptance: the methodology is relatively simple, but logical. The final acceptance decision can lean more on the final decision of Reviewer #3 (Reviewer #1 seems to be irresponsible in reviewing) and on the overall quality of the conference’s submissions.

Review #3

Please describe the contribution of the paper

The paper presents an experimental study using deep neural networks for emotion recognition. Two datasets are included.
Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Modern deep learning has been applied. This is state of the art.
Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.

Nothing really new has been done in the paper
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

The submission has provided an anonymized link to the source code, dataset, or any other dependencies.
Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html

Experiments have been done on publicly available data
Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.

(4) Weak Accept — could be accepted, dependent on rebuttal
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

It is a solid conference paper, but with limited novelty
Reviewer confidence

Very confident (4)
[Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.

N/A
[Post rebuttal] Please justify your final decision from above.

N/A

Author Feedback

We sincerely thank the reviewers for their insightful feedback and constructive suggestions. Reviewer #1 considered our paper to be “a solid conference paper,” and Reviewers #2 and #3 found the combination of two challenges interesting and the intuition behind our method clear. We address the major concerns below.

Comment 1: Novelty is limited. (Reviewer #1 and Reviewer #2)

Response: Thank you for the feedback. We clarify that our approach is not a simple application of existing methods. Recent works like multi-query transformer [R1], modal-masked transformer [R2], and query-based network [R3] extend vision-transformer class tokens and masked attention to tasks like multi-task prediction or medical segmentation, but they neither focus on emotion recognition nor jointly handle missingness and noise.

MMQ-Net’s key novelty is a unified feature-learning paradigm for incomplete multi-modal emotion recognition: it integrates data reconstruction and noise separation in one framework, making it the first method to tackle both missing modalities and interference simultaneously. Given the strong experimental gains we report, we believe this paradigm represents a meaningful advancement.

[R1] Xu, Yangyang, et al. “Multi-Task Learning With Multi-Query Transformer for Dense Prediction”, IEEE TCSVT, 2024.
[R2] Shi, Junjie, et al. “MFTrans: Modality-masked fusion transformer for incomplete multi-modality brain tumor segmentation.” IEEE JBHI, 2023.
[R3] Chen, Delin, et al. “Query re-training for modality-gnostic incomplete multi-modal brain tumor segmentation.” MICCAI Workshops, 2023.

Comment 2: Visualization of noise separation is needed. (Reviewer #2)

Response: We appreciate the suggestion. Our ablation study in Section 3.4 demonstrates the impact of interference reduction on performance. Although we cannot add new visualizations due to page limits, we plan to include them in future work to better show the separation of noise and emotional-state features.

Comment 3: Feature splitting mechanism lacks technical detail. (Reviewer #3)

Response: Thank you for the comment. The latent vector Z is split into three parts: single-modality features, emotional-state features, and interference features, as shown in Equation (4). We will clarify this in the final version.
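
To make the split described in this response concrete, the following is a minimal sketch that slices an output sequence Z into three groups by position. The token layout (one token per modality query, followed by one category token and one interference token), the function name `split_latent`, and the toy dimensions are all assumptions made for illustration. Equation (4) of the paper is not reproduced on this page and may define the split differently.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def split_latent(
    z: Sequence[Vector], num_modalities: int
) -> Tuple[List[Vector], Vector, Vector]:
    """Split query outputs into (modality, emotion, interference) features.

    Assumed layout (hypothetical, not taken from the paper):
    z[0:num_modalities]   -> one feature vector per modality query
    z[num_modalities]     -> emotional-state (category) feature
    z[num_modalities + 1] -> interference feature
    """
    expected = num_modalities + 2
    if len(z) != expected:
        raise ValueError(f"expected {expected} tokens, got {len(z)}")
    modality_feats = [list(v) for v in z[:num_modalities]]
    emotion_feat = list(z[num_modalities])
    interference_feat = list(z[num_modalities + 1])
    return modality_feats, emotion_feat, interference_feat


# Toy example: 3 modalities, feature dimension 2.
z = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
modality_feats, emotion_feat, interference_feat = split_latent(z, num_modalities=3)
```

Under this assumed layout, the modality features would feed a reconstruction objective, the emotion feature a classifier, and the interference feature a separation objective; the actual routing in MMQ-Net is not specified here.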

Comment 4: Feature extraction for non-EEG signals is incomplete. (Reviewer #3)

Response: Section 2.2 covers feature extraction for both EEG and other physiological signals (e.g., ECG, GSR). We will clarify this in the final manuscript.

Comment 5: Evaluation is unfair. (Reviewer #3)

Response: As noted by Reviewers #2 and #3, our work is the first to address this combined challenge. However, we have compared our approach with the latest relevant method, TAE [R4], published in 2024.

[R4] Cheng, C., et al. “A novel transformer autoencoder for multi-modal emotion recognition with incomplete data”, Neural Networks, 2024.

Comment 6: Simulation of realistic scenarios is needed. (Reviewer #3)

Response: We appreciate the feedback. Due to page limits, we only show results for two datasets with uniform missing rates. In future work, we will include experiments with varying missing rates across modalities.
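
The difference between a uniform missing rate and per-modality missing rates can be sketched as follows. This is an illustrative simulation only, not the authors' protocol: the function names, the modality names, the example rates, and the rule that restores one modality when all are dropped are assumptions of this sketch.

```python
import random
from typing import Dict, List


def sample_missing_mask(
    num_samples: int, missing_rates: Dict[str, float], seed: int = 0
) -> List[Dict[str, bool]]:
    """Return one availability dict per sample (True = modality present).

    Each modality is dropped independently with its own rate. If every
    modality of a sample is dropped, one is restored at random so that
    no sample is left empty (an assumption of this sketch).
    """
    for name, rate in missing_rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"missing rate for {name!r} must be in [0, 1]")
    rng = random.Random(seed)
    names = list(missing_rates)
    masks = []
    for _ in range(num_samples):
        present = {m: rng.random() >= missing_rates[m] for m in names}
        if not any(present.values()):
            present[rng.choice(names)] = True
        masks.append(present)
    return masks


def observed_rate(masks: List[Dict[str, bool]], modality: str) -> float:
    """Fraction of samples in which the given modality is missing."""
    return sum(not m[modality] for m in masks) / len(masks)


# Uniform setting: the same rate for every modality.
uniform = sample_missing_mask(1000, {"EEG": 0.3, "ECG": 0.3, "GSR": 0.3})

# Non-uniform setting: hypothetical per-modality rates.
varying = sample_missing_mask(1000, {"EEG": 0.5, "ECG": 0.2, "GSR": 0.1})
```

Passing the same rate for every key reproduces the uniform setting, so a single function covers both experimental designs discussed in this comment.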

Comment 7: Analysis and presentation of experimental results require more substance. (Reviewer #3)

Response: Thank you for the suggestion. We have added percentage gains in Tables 1 and 2 to highlight how much MMQ-Net outperforms the next best baseline. We also include ablation results to show the contributions of each module. Given the conference page limits, we believe our presentation strikes a balance between clarity and conciseness.
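
A gain over the next best baseline can be reported either in absolute points or as a relative percentage, and the response does not say which is used in Tables 1 and 2. The sketch below computes both. The function name and all scores are placeholders for illustration and are not results from the paper.

```python
from typing import Dict, Tuple


def gain_over_best_baseline(
    ours: float, baselines: Dict[str, float]
) -> Tuple[str, float, float]:
    """Return (best baseline name, absolute gain, relative gain in percent)."""
    if not baselines:
        raise ValueError("at least one baseline score is required")
    best_name = max(baselines, key=baselines.get)
    best_score = baselines[best_name]
    if best_score <= 0:
        raise ValueError("best baseline score must be positive")
    absolute = ours - best_score
    relative = 100.0 * absolute / best_score
    return best_name, absolute, relative


# Placeholder accuracies (in percent), NOT results from the paper.
scores = {"baseline_a": 80.0, "baseline_b": 75.0}
name, abs_gain, rel_gain = gain_over_best_baseline(84.0, scores)
```

With these placeholder values the strongest baseline is `baseline_a`, the absolute gain is 4 points, and the relative gain is 5 percent, which shows why the two conventions should be labeled explicitly in a results table.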

We hope these clarifications demonstrate the novelty and value of MMQ-Net. Thank you again for your valuable feedback.

Meta-Review

Meta-review #1

Your recommendation

Invite for Rebuttal
If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.

N/A
After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

N/A

Meta-review #2

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Reject
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The paper received Weak Accept, Weak Reject, and Reject ratings pre-rebuttal. After the rebuttal, one reviewer changed the score from Weak Reject to Accept, one reviewer kept the Reject score, and the third reviewer did not update the final score. After reading the authors’ rebuttal and the reviewers’ comments, the AC noticed that the rebuttal addressed some major concerns by the reviewers, like: the differences of the MMQ-Net relative to prior multi-query and modal-masked transformer works (R1, R2), explanation of more details about feature splitting and feature extraction (R3), and justification of evaluation by comparison with baseline methods (R3). At the same time, several major concerns are not (fully) addressed: explanation about the effectiveness of noise separation (R2), fairness of benchmark comparisons (R3), and the absence of some key technical details (R3). Additionally, the novelty remains an issue as pointed out by R1, who gave a Weak Accept initial score. Considering the paper, the reviewers’ comments, and the authors’ rebuttal, the AC thinks the paper is not acceptable in its current form without a major revision.

Meta-review #3

After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.

Accept
Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’

The model shows good results and the author addresses the concerns in the rebuttal.
