Abstract
3D face reconstruction methods exhibit significant limitations when applied to pathological cases such as facial paralysis, due to inherent challenges including asymmetric motion and non-linear muscle dynamics. To address these gaps, we propose SFPFR, a self-supervised framework for 3D face reconstruction of facial paralysis that leverages 1-3 viewpoints. We first propose a self-supervised learning paradigm integrating a reconstruction loss, a multi-view consistency loss, and a Mamba-based temporal loss to reconstruct the 3D face without ground truth; second, a partitioned dynamic fusion module that adaptively weights multi-view features, ensuring precise geometric reconstruction and preservation of pathological detail; last, we introduce FPD-100, the first multi-view video dataset for facial paralysis, comprising 30,000 frames from 100 patients across 3 views. Extensive experiments validate SFPFR’s superiority, achieving state-of-the-art PSNR (27.74) and FID (37.13). It enables clinical applications in severity assessment, rehabilitation monitoring, and treatment planning, and the dataset and code will be open-sourced to catalyze research in pathological facial analysis.
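The abstract describes the training objective only at a high level, and the paper's actual loss formulations are not reproduced on this page. The snippet below is a minimal PyTorch sketch of how the three stated terms (reconstruction, multi-view consistency, Mamba-based temporal) might be combined; the input names, placeholder losses, and weights (verts_per_view, temporal_feats, w_rec, w_view, w_temp) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def total_loss(rendered, target, verts_per_view, temporal_feats,
               w_rec=1.0, w_view=0.5, w_temp=0.1):
    """Hypothetical combination of the three self-supervised terms
    named in the abstract; weights and inputs are illustrative."""
    # Photometric reconstruction loss against the input frame.
    l_rec = F.l1_loss(rendered, target)

    # Multi-view consistency: geometry predicted from each available view
    # (1-3 views) should agree; a simple pairwise vertex distance stands in
    # for the paper's (unspecified) formulation.
    l_view = torch.zeros((), device=rendered.device)
    n = len(verts_per_view)
    if n > 1:
        pairs = 0
        for i in range(n):
            for j in range(i + 1, n):
                l_view = l_view + F.mse_loss(verts_per_view[i], verts_per_view[j])
                pairs += 1
        l_view = l_view / pairs

    # Temporal term: here a plain feature-drift penalty on sequence features
    # from a state-space (Mamba-style) encoder; the paper's actual temporal
    # loss is not specified on this page.
    l_temp = F.mse_loss(temporal_feats[:, 1:], temporal_feats[:, :-1])

    return w_rec * l_rec + w_view * l_view + w_temp * l_temp
```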
Links to Paper and Supplementary Materials
Main Paper (Open Access Version): https://papers.miccai.org/miccai-2025/paper/2771_paper.pdf
SharedIt Link: Not yet available
SpringerLink (DOI): Not yet available
Supplementary Material: Not Submitted
Link to the Code Repository
https://github.com/SFPFR107/SFPFR
Link to the Dataset(s)
N/A
BibTex
@InProceedings{QiuYar_SFPFR_MICCAI2025,
author = { Qiu, Yaru and Wang, Xinru and Zhang, Jianfang and Liu, Bo and Bai, Peng and Sun, Yuanyuan},
title = { { SFPFR: Self-supervised Facial Paralysis Face Reconstruction Under Few Views } },
booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
year = {2025},
publisher = {Springer Nature Switzerland},
volume = {LNCS 15969},
month = {September},
pages = {488--497}
}
Reviews
Review #1
- Please describe the contribution of the paper
This paper presents a self-supervised 3D facial reconstruction framework specifically designed for facial paralysis cases. By integrating reconstruction loss, multi-view consistency loss, and a temporal action feature extraction module, the proposed method could achieve high-fidelity reconstruction of asymmetric facial paralysis from single or sparse multi-view inputs.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper proposes FPD-100, a large-scale multi-view video dataset comprising 30,000 frames, which serves as a valuable resource for pathological facial analysis. 2. This study presents a novel self-supervised learning framework for 3D facial reconstruction of paralyzed patients without ground-truth data. The introduction demonstrates good structural organization: the authors systematically analyze three critical challenges in this task and explain why existing methods cannot achieve high-fidelity facial reconstruction for patients with facial paralysis. This clear structure enables readers to fully grasp the research motivation. The experimental results are also comprehensive, verifying the effectiveness of the method from multiple perspectives.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
1. The novelty of this paper is limited; the overall framework appears to be a simple combination of existing strategies (multi-view consistency loss, dynamic fusion modules, and temporal loss) without novel theoretical or algorithmic advancements. 2. The paper emphasizes in the Introduction that existing methods fail to handle asymmetry and pathological features, but no specific mechanisms are proposed to solve these challenges. 3. The experiments rely solely on computer-vision metrics (PSNR, SSIM), which are insufficient for clinical applications; for a medical conference like MICCAI, more clinical evaluation should be provided. 4. The same content is written three times, in the abstract, contributions, and conclusion.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(3) Weak Reject — could be rejected, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
I hope to see the authors’ strategies for capturing the non-linear muscle dynamics of facial paralysis patients described in more detail.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
The authors have addressed my concerns regarding the methodology section. Although clinical validation is lacking, I believe the methodological contribution of this article is worthy of acceptance.
Review #2
- Please describe the contribution of the paper
This paper introduces an effort to build a 3D face reconstruction model for facial paralysis. A large dataset was collected and an SSL framework was built.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A novel framework tackling the patient face understanding problem that achieves both SSL with multiple losses for 3D reconstruction and view fusion with a dynamic number of views.
- A real patient video dataset (100 patients) was collected and used to build the model. The framework achieves good reconstruction performance.
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
- The framework uses the FLAME model, which is parametrized in a way that is not well suited to capturing unilateral facial droops or single-eye movements. This could significantly limit the real-world utility of the framework.
- Some key evaluations are missing: temporal stability on video and clinical evaluation of pathological feature preservation.
- The Fig. 2 results indicate that the model does not accurately capture facial muscular asymmetry or lip characteristics. Fig. 3 does not cover reconstruction results for cases with typical facial paralysis.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
N/A
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper presents solid research tackling existing issues in 3D face reconstruction for facial paralysis patients. Novel ideas have been experimented with and good performance is reported, but concerns remain regarding the choice of backbone model, clinical evaluation, and quality of output.
- Reviewer confidence
Very confident (4)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
I originally suggested a weak accept and would like to maintain my rating. The authors responded to my concerns, but it appears these were not resolvable in this publication (clinical evaluation is a shared concern). Still, the paper exhibits good novelty and value to the domain.
Review #3
- Please describe the contribution of the paper
The paper identifies a problem with existing 3D face reconstruction methods: they often do not perform well on pathological cases (e.g. facial paralysis). Therefore, a multi-view video dataset of 100 patients with facial paralysis is introduced and a 3D facial reconstruction approach is proposed. The reconstruction process includes a self-supervised framework (integrating reconstruction, view consistency, and temporal action components), as well as a dynamic fusion module. The authors promise to release both the dataset and the code.
- Please list the major strengths of the paper: you should highlight a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Both the code and dataset will be released which is very helpful for reproducing results and for facilitating further research. The dataset itself is a highly valuable contribution.
The paper addresses an important problem for 3D face-based medical applications. Very often, faces are, in some way or another, abnormal due to a medical condition. Many non-medical approaches and facial models are built using normative facial data and do not perform well on faces with significant dysmorphism. While dedicated 3D scanning systems can capture abnormal features, they are expensive, and many downstream methods for processing 3D scans still rely on normative 3D facial models. 3D reconstruction methods that work well on abnormal faces with sparse multi-view images or videos will allow 3D face-based diagnostic methods to be applied much more broadly. I appreciate the effort spent to address these problems.
The qualitative results (Figure 2) look very good. The quantitative results show modest improvements over previous work.
While the approach is similar to the Smirk approach, the methods have been extended to support multiple input views (fusion module), and to work with videos (temporal feature module).
- Please list the major weaknesses of the paper. Please provide details: for instance, if you state that a formulation, way of using data, demonstration of clinical feasibility, or application is not novel, then you must provide specific references to prior work.
In general, I found the explanation of the method lacking in some places. I would struggle to replicate the approach based on the description in the paper alone. I frequently had to reference other papers to understand this paper. It would have been nice if it was more self-contained. Here are some questions that I had during/after reading the paper:
In the description of the Pre-reconstruction Module: What does it mean that the FLAME algorithm is applied to extract facial parameters? Is a 3DMM being built here or is a pre-trained FLAME blend-shape model being used? What facial parameters are extracted in this step?
From the FPD-100 dataset, 2k frames were reserved for testing. How were these frames selected? Were there subjects whose images are included in both the train and test data? What about the temporal aspects of the split?
Additionally, the augmentation through expression modification, the action transformation loss, and the Mamba-based loss are each mentioned but not clearly explained.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.
The authors claimed to release the source code and/or dataset upon acceptance of the submission.
- Optional: If you have any additional comments to share with the authors, please provide them here. Please also refer to our Reviewer’s guide on what makes a good review and pay specific attention to the different assessment criteria for the different paper categories: https://conferences.miccai.org/2025/en/REVIEWER-GUIDELINES.html
I would consider adding LPIPS as an image similarity metric (https://richzhang.github.io/PerceptualSimilarity/); a brief usage sketch is included after these comments.
Genetic syndrome diagnosis using 3D facial features is another domain that could benefit from these methods and has similar challenges. I would consider including syndromic data in future studies.
In Fig. 2, the meshes are rendered from different angles. It would be better if the viewpoint were consistent across the different approaches and matched the image shown in the top row.
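The following is not part of the review; it is a brief usage sketch of the suggested LPIPS metric using the official lpips PyPI package, with inputs as RGB tensors scaled to [-1, 1].

```python
import lpips
import torch

# AlexNet backbone is the variant recommended by the LPIPS authors for use as a metric.
loss_fn = lpips.LPIPS(net='alex')

# Two RGB images, shape (N, 3, H, W), values scaled to [-1, 1] (random here for illustration).
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1

with torch.no_grad():
    distance = loss_fn(img0, img1)  # lower = perceptually more similar
print(distance.item())
```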
- Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making.
(4) Weak Accept — could be accepted, dependent on rebuttal
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Overall, the paper makes valuable contributions. The dataset and codebase release is greatly appreciated. The method is an interesting extension of the Smirk approach. The results look good.
I found the explanation of the method confusing and insufficient at times. Improving the writing so that the description of the method is more self-contained could make this a much stronger paper.
- Reviewer confidence
Confident but not absolutely certain (3)
- [Post rebuttal] After reading the authors’ rebuttal, please state your final opinion of the paper.
Accept
- [Post rebuttal] Please justify your final decision from above.
My main concern with this paper was the quality of the explanation for certain aspects of their method. The authors have agreed to clarify the aspects of the paper that I found under-explained. Given that the authors have also agreed to release the dataset and code, I believe this work will be a valuable contribution.
Author Feedback
R1 Q1 - Novelty and Framework Combination: Thank you for your feedback. Driven by the urgent clinical need for accurate facial paralysis reconstruction, SFPFR introduces a novel self-supervised framework tailored to this challenge. While building on existing techniques, its innovation lies in seamlessly integrating adaptive 1-3 view processing, a Partitioned Dynamic Fusion Module with dynamic weighting for asymmetric features, and pioneering Mamba-based temporal modeling to capture complex facial movements. Unlike prior methods such as Smirk, which struggle with unilateral deficits, SFPFR achieves superior reconstruction of features like cheek drooping and mouth deviation. We will revise the introduction and methods to clearly highlight these contributions.
R1 Q2 - Asymmetry Mechanisms: Thank you for your valuable feedback. To capture the nonlinear muscle dynamics and asymmetry of facial paralysis, SFPFR employs dynamic weighting within the Partitioned Dynamic Fusion Module, which adaptively fuses the three view models using a spatially varying weight function that prioritizes lateral views for cheek asymmetry while retaining frontal precision for the eye regions; this is complemented by a multi-view consistency loss that enforces geometric alignment across rendered views to ensure accurate reconstruction of asymmetric features such as mouth deviation. Additionally, the Mamba-based Temporal Action Feature Capture Module preserves non-smooth motion transitions in pathological sequences (such as sudden muscle jerks) while maintaining consistency in smoother movements, thus modeling the irregular and non-proportional muscle behaviors characteristic of facial paralysis. We will expand Section 2 to provide more detailed technical descriptions of these strategies and their roles in addressing asymmetry and nonlinear dynamics.
R1 Q3 - Lack of Clinical Validation: As a medical image computing paper, this study focuses on algorithm development, but clinicians participated in and guided the study’s motivation, data collection, and result analysis. We plan to validate the method in the future by correlating results with physician evaluation methods such as the House-Brackmann and Sunnybrook facial nerve grading systems.
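The rebuttal above describes the Partitioned Dynamic Fusion Module only in words. As a rough illustration (not the authors' implementation; the class name, the shared convolutional weight head, and the softmax fusion are all assumptions), spatially varying per-view weighting of 1-3 aligned feature maps could look like this in PyTorch:

```python
import torch
import torch.nn as nn

class PartitionedViewFusion(nn.Module):
    """Illustrative stand-in for the described fusion: predict a spatial weight
    map per available view and fuse features with a softmax over views, so
    different facial regions can favor different views."""

    def __init__(self, feat_dim: int, max_views: int = 3):
        super().__init__()
        self.max_views = max_views
        # One lightweight head, shared across views, scores each spatial location.
        self.weight_head = nn.Sequential(
            nn.Conv2d(feat_dim, feat_dim // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim // 2, 1, 1),
        )

    def forward(self, view_feats: list[torch.Tensor]) -> torch.Tensor:
        # view_feats: list of 1-3 tensors, each (B, C, H, W), assumed already
        # aligned to a common (e.g. frontal/UV) layout.
        scores = torch.stack([self.weight_head(f) for f in view_feats], dim=1)  # (B, V, 1, H, W)
        weights = torch.softmax(scores, dim=1)                                  # per-pixel view weights
        feats = torch.stack(view_feats, dim=1)                                  # (B, V, C, H, W)
        return (weights * feats).sum(dim=1)                                     # (B, C, H, W)
```

With a single input view the softmax over views reduces to the identity, so the same module handles 1-3 views; how the actual PDFM partitions facial regions and aligns views is specified only in the paper.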
R2 Q1 - FLAME Model Limitations: We acknowledge the reviewers’ point that FLAME, as a generic face model, does not account for asymmetric data, limiting unilateral facial modeling. To address this, we designed modules for multi-view weighting (PDFM), synthetic multi-view consistency (VMM), and nonlinear motion dynamics (the temporal module with Mamba). For future work, developing a detailed, flexible face model tailored to asymmetric structures is feasible.
R2 Q2 - Missing Evaluations: Owing to space limits, we plan to evaluate the video reconstruction stability highlighted by the reviewer in future work on video face reconstruction, and to conduct a systematic clinical evaluation of pathological feature preservation.
R2 Q3 - Visual Results: In Fig. 2, our method demonstrates superiority in reconstructing mouth asymmetry compared with single- and multi-view approaches, yet there is still room for improvement. Fig. 3 shows only selected authorized faces due to patient privacy.
R3 Q1 - Pre-reconstruction Module Clarity: To clarify, FLAME’s pre-trained encoder extracts shape (α), expression (φ), and pose (θ) parameters, with a focus on the eyelid, lower-face, and jaw parameters relevant to facial paralysis. We use the official pre-trained model and fine-tune the expression encoder in Stage 1 to adapt to pathological motion patterns.
R3 Q2 - Dataset Split Details: The test set comprises 2,000 images from 10 independent subjects that are distinct from the training set. With one frame extracted per second, it includes diverse severity levels of facial paralysis.
R3 Q3 - Clearer Explanations: We will revise the relevant sections to clarify the augmentation methods and losses in more detail, as per the feedback.
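To make the parameter roles mentioned in R3 Q1 concrete without reproducing FLAME itself, here is a toy linear blend-shape decode in PyTorch. The basis tensors are random placeholders and a single global rotation stands in for FLAME's full pose (jaw, neck, eyes), so this only illustrates what α, φ, and θ control; it is not the FLAME model, which additionally uses linear blend skinning, joint regressors, and pose-corrective blend shapes.

```python
import torch
import torch.nn as nn

class ToyBlendShapeDecoder(nn.Module):
    """Toy linear morphable-model decode: identity (alpha) and expression (phi)
    coefficients offset a template mesh, then a pose rotation (theta) is applied."""

    def __init__(self, n_verts=5023, n_shape=100, n_expr=50):
        super().__init__()
        # Random placeholder bases; a real model would load learned bases.
        self.register_buffer('template', torch.zeros(n_verts, 3))
        self.register_buffer('shape_basis', torch.randn(n_shape, n_verts, 3) * 1e-3)
        self.register_buffer('expr_basis', torch.randn(n_expr, n_verts, 3) * 1e-3)

    def forward(self, alpha, phi, theta):
        # alpha: (B, n_shape) identity, phi: (B, n_expr) expression,
        # theta: (B, 3) a single axis-angle rotation, for brevity.
        verts = (self.template
                 + torch.einsum('bs,svc->bvc', alpha, self.shape_basis)
                 + torch.einsum('be,evc->bvc', phi, self.expr_basis))
        R = axis_angle_to_matrix(theta)                  # (B, 3, 3)
        return torch.einsum('bij,bvj->bvi', R, verts)    # posed vertices (B, V, 3)

def axis_angle_to_matrix(theta):
    """Rodrigues' formula for a batch of axis-angle vectors (B, 3)."""
    angle = theta.norm(dim=-1, keepdim=True).clamp(min=1e-8)   # (B, 1)
    axis = theta / angle                                       # (B, 3)
    x, y, z = axis.unbind(-1)
    zero = torch.zeros_like(x)
    K = torch.stack([zero, -z, y, z, zero, -x, -y, x, zero], dim=-1).view(-1, 3, 3)
    I = torch.eye(3, device=theta.device).expand_as(K)
    s = torch.sin(angle)[..., None]
    c = torch.cos(angle)[..., None]
    return I + s * K + (1 - c) * (K @ K)
```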
Meta-Review
Meta-review #1
- Your recommendation
Invite for Rebuttal
- If your recommendation is “Provisional Reject”, then summarize the factors that went into this decision. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. You do not need to provide a justification for a recommendation of “Provisional Accept” or “Invite for Rebuttal”.
N/A
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #2
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A
Meta-review #3
- After you have reviewed the rebuttal and updated reviews, please provide your recommendation based on all reviews and the authors’ rebuttal.
Accept
- Please justify your recommendation. You may optionally write justifications for ‘accepts’, but are expected to write a justification for ‘rejects’
N/A