Abstract

Learned 3-dimensional face models have emerged as valuable tools for statistically modeling facial variations, facilitating a wide range of applications in computer graphics, computer vision, and medicine. While these models have been extensively developed for adult faces, research on infant face models remains sparse, limited to a few models trained on small datasets, none of which are publicly available. We propose a novel approach to address this gap by developing a large-scale 3D INfant FACE model (INFACE) using a diverse set of face scans. By harnessing uncontrolled and incomplete data, INFACE surpasses previous efforts in both scale and accessibility. Notably, it represents the first publicly available shape model of its kind, facilitating broader adoption and further advancements in the field. We showcase the versatility of our learned infant face model through multiple potential clinical applications, including shape and appearance completion for mesh cleaning and treatment planning, as well as 3D face reconstruction from images captured in uncontrolled environments. By disentangling expression and identity, we further enable the neutralization of facial features — a crucial capability given the unpredictable nature of infant scanning.

Links to Paper and Supplementary Materials

Main Paper (Open Access Version): https://papers.miccai.org/miccai-2024/paper/2260_paper.pdf

SharedIt Link: https://rdcu.be/dV1VO

SpringerLink (DOI): https://doi.org/10.1007/978-3-031-72384-1_21

Supplementary Material: https://papers.miccai.org/miccai-2024/supp/2260_supp.zip

Link to the Code Repository

N/A

Link to the Dataset(s)

N/A

BibTex

@InProceedings{Sch_LargeScale_MICCAI2024,
        author = { Schnabel, Till N. and Lill, Yoriko and Benitez, Benito K. and Nalabothu, Prasad and Metzler, Philipp and Mueller, Andreas A. and Gross, Markus and Gözcü, Baran and Solenthaler, Barbara},
        title = { { Large-Scale 3D Infant Face Model } },
        booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2024},
        year = {2024},
        publisher = {Springer Nature Switzerland},
        volume = {LNCS 15003},
        month = {October},
        pages = {217 -- 227}
}


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper compiles a large-scale dataset of uncontrolled and incomplete face scans and employs advanced machine learning techniques, including autoencoders, to handle the unique challenges of infant face modeling. Based on this dataset, the paper proposes a 3D face model for infants, named IFACE. The authors demonstrate the model’s applications in clinical scenarios, such as mesh repair, deformity correction, and 3D reconstruction from monocular images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The focus on infant facial models fills a significant gap in the research, as most existing models target adult faces.

    2. The writing is coherent and technically precise, with rich figure illustrations, making it easy to follow.

    3. IFACE’s development could have profound implications for pediatric medical imaging and treatment planning.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some choices, such as the specific architecture of the autoencoder and loss functions used, lack detailed justifications.

    2. There is little discussion on how the model performs under significantly varied real-world conditions, which is essential for clinical applications.

    3. The manuscript could improve by providing more detailed information on the experimental settings, such as hyperparameters and training details, to aid reproducibility.

    4. Providing a more detailed breakdown of the dataset’s composition in terms of ethnicity and geographic origin would help in understanding the model’s applicability across diverse populations.

    5. The paper should include more extensive comparisons with other non-linear models used in similar contexts to demonstrate the specific advantages of IFACE.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    see the weaknesses above.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some key technical details should be clarified. The competing method should include more recent models.

  • Reviewer confidence

    Very confident (4)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This is a nice topic of research: creating a large-scale infant face shape and appearance model. The proposed method registers a face template to manually landmarked scans, restricting the fit to trusted, masked regions. The modes of shape variation are then extracted by training an autoencoder, which is compared to conventional PCA. Appearance is modeled using separate pix2pix networks. The model is built from 2,394 diverse 3D scans after an intensive data curation and annotation step. A multi-nonlinear representation is also built to allow the model to disentangle facial expression and age. The models are evaluated in terms of reconstruction error.

    Other applications, such as 3D reconstruction from a single image and neutralization of facial expression, are also tested.
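    To make the PCA-versus-autoencoder comparison described above concrete, here is a minimal, self-contained sketch on random placeholder data. The dimensions, architecture, and training settings are illustrative assumptions, not the authors' pipeline:

```python
# Minimal sketch: comparing a linear PCA basis with a shallow autoencoder for
# modelling shape variation of registered meshes (placeholder data only).
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

N, V, latent_dim = 200, 2000, 32                     # scans, vertices, latent size (illustrative)
X = np.random.randn(N, 3 * V).astype(np.float32)     # stand-in for flattened registered meshes

# Linear baseline: PCA reconstruction error
pca = PCA(n_components=latent_dim)
recon_pca = pca.inverse_transform(pca.fit_transform(X))
err_pca = np.mean(np.linalg.norm((X - recon_pca).reshape(N, V, 3), axis=-1))

# Shallow fully-connected autoencoder
class ShapeAE(nn.Module):
    def __init__(self, dim, latent):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 512), nn.ReLU(), nn.Linear(512, dim))
    def forward(self, x):
        return self.dec(self.enc(x))

model = ShapeAE(3 * V, latent_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
xt = torch.from_numpy(X)
for _ in range(200):                                  # brief full-batch training, illustration only
    opt.zero_grad()
    loss = ((model(xt) - xt) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    err_ae = torch.norm((model(xt) - xt).reshape(N, V, 3), dim=-1).mean().item()
print(f"mean per-vertex error  PCA: {err_pca:.3f}  AE: {err_ae:.3f}")
```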

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The developed model opens a wide range of potential applications (3D reconstruction from a single view, automated infant scan cleaning and completion, homogenization of facial expression, ...) to enable digital twins for surgical planning or simulation.

    - The topic is of interest since this kind of study is rare (because it touches on data from babies) but fundamental for making progress.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - The formulation and description of the model construction could be improved; the required steps are a little difficult to follow (especially for the multi-nonlinear representation).

    - It is not clear whether disentangling expression and age is unsupervised or requires specific annotations. A means of assessing the modes found would be needed.

    - The 3D reconstruction from a single view reaches an impressive average accuracy of 1.7 mm; what is the distribution of errors across the face regions? It would be great to add an error heatmap here.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The authors claimed to release the source code and/or dataset upon acceptance of the submission.

  • Do you have any additional comments regarding the paper’s reproducibility?

    The developed IFACE model will be made publicly available

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    The paper touches on many topics and could be more concise and focused on the model-construction details. In its present form, it presents a wide range of applications, but specific expected details are missing.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Weak Accept — could be accepted, dependent on rebuttal (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents fundamental advances in how to build a relevant infant face model under the challenging scan-acquisition conditions inherent to infant motion and uncontrolled expressions.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors introduced a 3D infant face model, IFACE, which enables the disentanglement of expression and age, an important capability for more robust face reconstruction applications in the clinic. The authors also showed that the adult face model FLAME did not perform as well as IFACE.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    i. Detailed technical information is provided.
    ii. Clear illustrative figures are provided.
    iii. Potential clinical applications are proposed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    i. Another 3D infant face model, BabyNet (https://arxiv.org/abs/2203.05908), was published as a preprint in 2022.
    ii. Manual segmentation and landmarking could be the key limiting factor for scaling up the training dataset.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Please be aware that providing code and data is a plus, but not a requirement for acceptance.

    The submission does not mention open access to source code or data but provides a clear and detailed description of the algorithm to ensure reproducibility.

  • Do you have any additional comments regarding the paper’s reproducibility?

    i. Suggest providing an overview of the model architecture, including both the face model (autoencoder) and the multi-nonlinear model.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review. Pay specific attention to the different assessment criteria for the different paper categories (MIC, CAI, Clinical Translation of Methodology, Health Equity): https://conferences.miccai.org/2024/en/REVIEWER-GUIDELINES.html

    i. Suggest providing the data source, the IRB number, and the medical facility names.
    ii. The performance evaluation was largely restricted to 1-2 illustrative samples; it should be conducted on additional samples.

  • Rate the paper on a scale of 1-6, 6 being the strongest (6-4: accept; 3-1: reject). Please use the entire range of the distribution. Spreading the score helps create a distribution for decision-making

    Accept — should be accepted, independent of rebuttal (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the concept is not entirely novel, the authors showed promising performance of the IFACE model and hence its potential clinical significance.

  • Reviewer confidence

    Confident but not absolutely certain (3)

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Author Feedback

First of all, we would like to thank the reviewers of our manuscript for their detailed and valuable feedback. We have carefully considered their comments and will now address the key points.

Data publishing (R4): Unfortunately, patient privacy will not allow us to publish the dataset. However, we will make the model trained on the dataset publicly available to foster more research and hopefully improve clinical infant treatment in the near future.

Model architecture (R1): We have chosen this specific neural network architecture for its simplicity, which makes it easy for a wide range of users to adopt – even without a GPU, PyTorch, CUDA, etc. installed, our shallow fully-connected autoencoder can be downloaded quickly and easily integrated into other libraries. For instance, the few matrix multiplications required to encode and decode an infant mesh could easily be performed with numpy on the CPU. We hope that the published model will resolve any remaining questions about its architectural details.
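To illustrate this point, below is a minimal sketch of CPU-only encoding and decoding with numpy. The weight names, layer sizes, and the .npz export are hypothetical placeholders, not the released model's actual format:

```python
# Minimal sketch: encode/decode a registered infant mesh with plain numpy,
# assuming hypothetical exported weights of a shallow fully-connected autoencoder.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# w = np.load("infant_face_ae_weights.npz")   # hypothetical export of a released model
# For a self-contained example, random placeholder weights are used instead:
dim, hidden, latent = 3 * 5000, 512, 64        # illustrative layer sizes
rng = np.random.default_rng(0)
w = {
    "W_enc1": rng.standard_normal((dim, hidden)) * 0.01, "b_enc1": np.zeros(hidden),
    "W_enc2": rng.standard_normal((hidden, latent)) * 0.01, "b_enc2": np.zeros(latent),
    "W_dec1": rng.standard_normal((latent, hidden)) * 0.01, "b_dec1": np.zeros(hidden),
    "W_dec2": rng.standard_normal((hidden, dim)) * 0.01, "b_dec2": np.zeros(dim),
}

def encode(vertices):                          # vertices: (V, 3) mesh in template topology
    x = vertices.reshape(-1)
    h = relu(x @ w["W_enc1"] + w["b_enc1"])
    return h @ w["W_enc2"] + w["b_enc2"]       # latent code

def decode(z):
    h = relu(z @ w["W_dec1"] + w["b_dec1"])
    return (h @ w["W_dec2"] + w["b_dec2"]).reshape(-1, 3)

mesh = rng.standard_normal((5000, 3))          # placeholder mesh
code = encode(mesh)
recon = decode(code)
print(code.shape, recon.shape)                 # (64,) (5000, 3)
```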

Model and training details (R1, R5): In the camera-ready version of the paper, we will include more details about the architectural choices of our model, its hyperparameters, and the training procedure for reproducibility purposes. We also acknowledge that the description of the multi-nonlinear model can be confusing. The key to achieving the expression and age disentanglement lies in the training scheme, which is fully unsupervised and detailed in equations (2), (3), and (4). The scheme exploits the fact that the babies were often scanned multiple times on the same day and over multi-month intervals; the additional visuals provided in the supplementary video may help to clarify the specifics.
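For readers trying to picture how repeated scans of the same infant can drive unsupervised disentanglement, the following is a heavily simplified sketch of one possible latent-swapping scheme. It is not the authors' actual formulation in equations (2), (3), and (4); the latent partition, dimensions, and loss are illustrative assumptions only:

```python
# Minimal sketch of one way repeated scans of the same infant could drive
# identity/expression/age disentanglement via latent swapping (illustrative only).
import torch
import torch.nn as nn

dim, d_id, d_exp, d_age = 3 * 5000, 48, 12, 4

enc = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, d_id + d_exp + d_age))
dec = nn.Sequential(nn.Linear(d_id + d_exp + d_age, 512), nn.ReLU(), nn.Linear(512, dim))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

def split(z):
    return z[:, :d_id], z[:, d_id:d_id + d_exp], z[:, d_id + d_exp:]

def step(x_a, x_b, same_day):
    """x_a, x_b: two flattened scans of the same infant (same day or months apart)."""
    id_a, exp_a, age_a = split(enc(x_a))
    id_b, exp_b, age_b = split(enc(x_b))
    # Identity is shared across all scans of one infant; age is shared only
    # within the same day, so swap whichever factors should be interchangeable
    # and still require x_a to be reconstructed.
    if same_day:
        swapped = torch.cat([id_b, exp_a, age_b], dim=1)
    else:
        swapped = torch.cat([id_b, exp_a, age_a], dim=1)
    loss = ((dec(swapped) - x_a) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with placeholder data for a batch of same-day scan pairs:
x_a, x_b = torch.randn(8, dim), torch.randn(8, dim)
print(step(x_a, x_b, same_day=True))
```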

Robust model performance evaluation (R1, R4): There are no publicly available 3D datasets of infants on which to test our model. However, we have intentionally designed our test set to include diverse, high-quality infant data from both of our partner hospitals, and all our quantitative results were computed on this test set. Due to privacy constraints, the visual examples shown in the paper are indeed more restricted. Nevertheless, the paper features three different babies from our dataset exhibiting typical artifacts and two more patients with cleft lip (one of whom is shown only in the supplementary video). Additionally, in Figure 6, we have applied our model to the publicly available Infanface dataset, which features uncontrolled 2D images of infants. We thereby showcase the model’s applicability to varied real-world conditions to the best of our ability.

Dataset diversity and labeling (R1, R4): Unfortunately, our dataset features only a small number of non-Caucasian infants; detailed demographic information was not collected. Automated labeling methods could help streamline the process of diversifying the dataset to eliminate this racial bias in the future.

Error distribution of 3D reconstruction (R5): In Figure 6, we include a heatmap of the 3D reconstruction error from a single view, averaged over our test set. We cannot provide heatmaps for the visual examples, since there is no ground-truth 3D mesh associated with these 2D images.
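For context, computing such an averaged per-vertex error heatmap is straightforward once predicted and ground-truth meshes share the template topology; the following sketch uses placeholder arrays rather than the authors' data:

```python
# Minimal sketch: per-vertex reconstruction error averaged over a test set,
# which can then be rendered as a colour heatmap on the template mesh.
import numpy as np

num_test, V = 50, 5000
pred = np.random.randn(num_test, V, 3)                 # reconstructed meshes (placeholder)
gt = pred + 0.001 * np.random.randn(num_test, V, 3)    # corresponding ground truth (placeholder)

per_vertex_err = np.linalg.norm(pred - gt, axis=-1)    # (num_test, V) Euclidean distances
heatmap = per_vertex_err.mean(axis=0)                  # (V,) mean error per template vertex
print(f"mean error: {heatmap.mean():.4f}, max vertex error: {heatmap.max():.4f}")
# 'heatmap' can be mapped to vertex colours (e.g. via a matplotlib colormap)
# and visualised on the template mesh in any 3D viewer.
```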

Related infant models (R4): BabyNet is a work that focuses on 3D reconstruction of infant faces from monocular 2D images synthesized via BabyFM (“Spectral Correspondence Framework for Building a 3D Baby Face Model” by Morales et al.), the very first infant face model, introduced in 2020. Compared to BabyFM, which was never publicly released, IFACE is built on a dataset that is an order of magnitude larger, covering a vast range of different expressions. We demonstrate that this advancement allows IFACE to reconstruct 3D geometry from real-world monocular 2D images captured in uncontrolled settings, rather than relying on synthesized samples as BabyNet does.




Meta-Review

Meta-review not available, early accepted paper.


