Deep Convolutional Cascade
for Face Alignment In The Wild

Arnaud Dapogny & Kevin Bailly & Matthieu Cord

 

Face Alignment is an active computer vision domain, that consists in localizing a number of facial landmarks that vary across datasets. State-of-the-art face alignment methods either consist in end-to-end regression, or in refining the shape in a cascaded manner, starting from an initial guess. In this paper, we introduce DeCaFA, an end-to-end deep convolutional cascade architecture for face alignment. DeCaFA uses fully-convolutional stages to keep full spatial resolution throughout the cascade. Between each cascade stage, DeCaFA uses multiple chained transfer layers with spatial softmax to produce landmark-wise attention maps for each of several landmark alignment tasks. Weighted intermediate supervision, as well as efficient feature fusion between the stages allow to learn to progressively refine the attention maps in an end-to-end manner. We show experimentally that DeCaFA significantly outperforms existing approaches on 300W, CelebA and WFLW databases. In addition, we show that DeCaFA can learn fine alignment with reasonable accuracy from very few images using coarsely annotated data.

Deep Learning for Face Analysis

Arnaud Dapogny & Kevin Bailly & Matthieu Cord

 

Face alignment consists of aligning a shape model on a face image. It is an active domain in computer vision as it

is a preprocessing for a number of face analysis and synthesis applications. Current state-of-the-art methods already perform well on "easy" datasets, with moderate head pose variations, but may not be robust for "in-the-wild" data with poses up to 90. In order to increase robustness to an ensemble of factors of variations (e.g. head pose or occlusions), a given layer (e.g. a regressor or an upstream CNN layer) can be replaced by a Mixture of Experts (MoE) layer that uses an ensemble of experts instead of a single one. The weights of this mixture can be learned as gating functions to jointly learn the experts and the corresponding weights. In this paper, we propose to use tree-structured gates which allows a hierarchical weighting of the experts (Tree-MoE). We investigate the use of Tree-MoE layers in different contexts in the frame of face alignment with cascaded regression, firstly for emphasizing relevant, more specialized feature extractors depending of a high-level semantic information such as head pose (Pose-Tree-MoE), and secondly as an overall more robust regression layer. We perform extensive experiments on several challenging face alignment datasets, demonstrating that our approach outperforms the state-of-the-art methods.

 

The Missing Data Encoder: Cross-Channel Image Completion with Hide-And-Seek Adversarial Network

Arnaud Dapogny & Matthieu Cord & Patrick Perez

 

Image completion is the problem of generating whole images from fragments only. It encompasses inpainting (generating a patch given its surrounding), reverse inpainting/extrapolation (generating the periphery given the central patch) as well as colorization (generating one or several channels given other ones). In this paper, we employ a deep network to perform image completion, with adversarial training as well as perceptual and completion losses, and call it the “missing data encoder” (MDE). We consider several configurations based on how the seed fragments are chosen. We show that training MDE for “random extrapolation and colorization” (MDEREC), i.e. using random channel-independent fragments, allows a better capture of the image semantics and geometry. MDE training makes use of a novel “hide-and-seek” adversarial loss, where the discriminator seeks the original nonmasked regions, while the generator tries to hide them. We validate our models qualitatively and quantitatively on several datasets, showing their interest for image completion, representation learning as well as face occlusion handling.