Metric analysis

Max Planck researchers propose MICA (MetrIC face), a metric predictor of face shape.

3D reconstruction of human faces is an essential part of several augmented reality (AR) and virtual reality (VR) applications. Most state-of-the-art methods for facial reconstruction from a single RGB image are trained in a self-supervised manner on large 2D image datasets. However, when the face must be placed in a metric context (i.e., relative to a reference object of known size, as is typical in VR/AR applications), existing solutions cannot recover the correct scale or the correct human face shape. The reason is that, under a perspective camera, the scale of a face is ambiguous: the same image can be produced by a large face far from the camera or by a small face very close to it.
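This scale–depth ambiguity follows directly from the pinhole camera model, where the projected size of an object is proportional to its real size divided by its distance from the camera. A toy illustration (the focal length value is an assumption, not from the paper):

```python
# Under a pinhole camera, projected size = focal_length * real_size / depth,
# so different (real_size, depth) pairs produce identical images.
def projected_size(focal_length, real_size, depth):
    return focal_length * real_size / depth

f = 500.0  # focal length in pixels (illustrative value)

# A 0.20 m face at 1 m and a 0.40 m face at 2 m project to the same size:
a = projected_size(f, real_size=0.20, depth=1.0)
b = projected_size(f, real_size=0.40, depth=2.0)
print(a, b)  # both 100.0 pixels
```

Since both configurations yield the same 100-pixel face, no image-only loss can distinguish them, which is why self-supervised methods cannot learn metric scale.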

Handwritten notes by Luca (Marktechpost Research Staff)

For the aforementioned reasons, a group of researchers from the Max Planck Institute for Intelligent Systems in Tübingen (Germany) proposed to use supervised machine learning to better learn the real shape of human faces. Since no large-scale 3D dataset exists for this task, the authors also unified a set of existing small- and medium-scale datasets. This unified dataset contains RGB images and the corresponding 3D reconstructed faces.

As shown in Figure 1, given a single RGB image of a person, MICA (MetrIC face), the method proposed in this paper, generates a head geometry with a neutral expression. In the next part of the article, we will see how this process is carried out.

The upper part of Figure 2 (i.e., the general metric shape estimation) shows how MICA works. To predict the metric shape of a human face in a neutral expression, MICA relies on both 2D and 3D metric data to train a deep neural network. The authors used a state-of-the-art face recognition network called ArcFace, pre-trained on a large-scale 2D image dataset to obtain highly discriminative features for face recognition. This pre-trained network is robust to changes in facial expression, lighting, and camera parameters. The ArcFace architecture is extended with a mapping network whose purpose is to map ArcFace features to a latent space that a geometry decoder can then interpret. The last step of the process involves a 3D morphable model (3DMM) called FLAME, which generates the geometric shape from the mapped ArcFace features.

The described network was trained on the 2D and 3D data in a supervised way. During training, only the last three ResNet blocks of the ArcFace network were fine-tuned, to avoid overfitting and improve the generalization capabilities of MICA.
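Freezing most of a pre-trained backbone and updating only its final blocks is a common partial fine-tuning recipe. A minimal pure-Python sketch (the parameter names and block count are assumptions, not MICA's actual layer names):

```python
# Partial fine-tuning sketch: keep only the last encoder blocks and the
# newly added heads trainable; freeze everything else.
params = {f"encoder.block{i}.weight": {"requires_grad": True} for i in range(1, 9)}
params.update({"mapping.weight": {"requires_grad": True},
               "decoder.weight": {"requires_grad": True}})

def freeze_except(params, trainable_prefixes):
    # Mark a parameter trainable only if its name matches one of the prefixes.
    for name, p in params.items():
        p["requires_grad"] = any(name.startswith(pre) for pre in trainable_prefixes)

# Last 3 encoder blocks plus the new mapping network and decoder stay trainable:
freeze_except(params, ("encoder.block6", "encoder.block7", "encoder.block8",
                       "mapping", "decoder"))
trainable = sorted(n for n, p in params.items() if p["requires_grad"])
print(trainable)
```

In a real framework the same idea is expressed by toggling each parameter's gradient flag before building the optimizer.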

Finally, the lower part of Figure 2 shows an expression-tracking process implemented by the authors, based on RGB input sequences and the learned metric reconstruction of the face shape. To build this face tracker, the authors applied the analysis-by-synthesis principle: given a model that reproduces the appearance of a subject, the model's parameters are updated so that the synthesized images best fit the input images. The model is initialized with the parameters of MICA's 3DMM component and then adjusted to capture the deviation of each input frame from the neutral pose. In this way, a 3D motion tracker is obtained.
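The analysis-by-synthesis loop above can be illustrated on a toy problem: starting from a fixed neutral shape, optimize expression parameters by gradient descent until the synthesized geometry matches an observation. This is an assumption-laden sketch (random basis, geometry residual instead of a photometric one), not the authors' tracker:

```python
import numpy as np

# Toy analysis-by-synthesis: fit expression parameters so that the
# synthesized shape matches an observed target.
rng = np.random.default_rng(1)
n_vertices, n_expr = 100, 10

neutral = rng.normal(size=(n_vertices, 3))           # stand-in for MICA's output
expr_basis = rng.normal(size=(n_vertices, 3, n_expr)) * 0.1

def synthesize(expr_params):
    # Linear expression model on top of the fixed neutral shape.
    return neutral + expr_basis @ expr_params

target_params = rng.normal(size=n_expr)
observed = synthesize(target_params)                 # stand-in for an input frame

# Gradient descent on the squared reconstruction error.
expr = np.zeros(n_expr)
lr = 0.05
for _ in range(500):
    residual = synthesize(expr) - observed                # (100, 3)
    grad = np.einsum("vdk,vd->k", expr_basis, residual)   # d(error)/d(expr)
    expr -= lr * grad

print(np.abs(expr - target_params).max())  # close to 0 after convergence
```

The real tracker compares rendered and observed images rather than geometry, but the optimization structure, synthesize, measure the residual, update the parameters, is the same.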

This article is written as a research summary by Marktechpost Staff based on the research paper 'Towards Metrical Reconstruction of Human Faces'. All credit for this research goes to the researchers on this project. Check out the paper, project, and GitHub link.



Luca is a Ph.D. student at the Computer Science Department of the University of Milan. His interests are machine learning, data analytics, IoT, mobile programming, and indoor positioning. His current research focuses on pervasive computing, context awareness, explainable AI, and human activity recognition in intelligent environments.