Research & Development

Research and the development of new technologies are part of Datakalab's DNA. A genuine expert in image analysis, the team continuously optimizes existing algorithms and creates new ones.


Datakalab has partnered from the outset with Kevin Bailly and Arnaud Dapogny, two researchers at the Institut des Systèmes Intelligents et de Robotique (ISIR). The Datakalab team is recognized worldwide for its research in image analysis and regularly submits research publications:

Deep Entwined Learning Head Pose and Face Alignment Inside an Attentional Cascade with Doubly-Conditional Fusion

Arnaud Dapogny & Kevin Bailly & Matthieu Cord


Head pose estimation and face alignment constitute a backbone preprocessing for many applications relying on face analysis. While both are closely related tasks, they are generally addressed separately, e.g. by deducing the head pose from the landmark locations. In this paper, we propose to entwine face alignment and head pose tasks inside an attentional cascade. This cascade uses a geometry transfer network for integrating heterogeneous annotations to enhance landmark localization accuracy. Furthermore, we propose a doubly-conditional fusion scheme to select relevant feature maps, and regions thereof, based on a current head pose and landmark localization estimate. We empirically show the benefit of entwining head pose and landmark localization objectives inside our architecture, and that the proposed AC-DC model enhances the state-of-the-art accuracy on multiple databases for both face alignment and head pose estimation tasks.
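The "doubly-conditional" idea of selecting relevant feature maps, and regions thereof, from current pose and landmark estimates can be illustrated with a toy sketch. This is not the AC-DC implementation: `w_pose`, the block-shaped landmark masks, and all shapes are hypothetical placeholders chosen only to show the two gating conditions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def doubly_conditional_gating(features, pose, landmarks, w_pose):
    """Toy sketch: the pose estimate gates whole channels, the landmark
    estimates gate spatial regions; their product selects relevant
    feature maps and regions thereof.

    features:  (C, H, W) feature maps
    pose:      (3,) current head pose estimate (yaw, pitch, roll)
    landmarks: (L, 2) normalized (x, y) landmark estimates in [0, 1]
    w_pose:    (C, 3) placeholder channel-gating matrix
    """
    C, H, W = features.shape
    channel_gate = sigmoid(w_pose @ pose)          # (C,) per-channel weights
    region = np.zeros((H, W))
    for (u, v) in landmarks:                       # keep a small window around each landmark
        y, x = int(round(v * (H - 1))), int(round(u * (W - 1)))
        region[max(0, y - 2):y + 3, max(0, x - 2):x + 3] = 1.0
    return features * channel_gate[:, None, None] * region

# toy usage
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 32, 32))
gated = doubly_conditional_gating(feats, np.array([0.1, -0.2, 0.0]),
                                  np.array([[0.5, 0.5]]), rng.normal(size=(8, 3)))
```

In the sketch, feature responses far from any landmark estimate are zeroed out, while the pose estimate softly re-weights each channel.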


DeeSCo: Deep heterogeneous ensemble with Stochastic Combinatory loss for gaze estimation

Edouard Yvinec & Arnaud Dapogny & Kevin Bailly


From medical research to gaming applications, gaze estimation is becoming a valuable tool. While there exists a number of hardware-based solutions, recent deep learning-based approaches, coupled with the availability of large-scale databases, have allowed to provide a precise gaze estimate using only consumer sensors. However, there remains a number of questions, regarding the problem formulation, architectural choices and learning paradigms for designing gaze estimation systems in order to bridge the gap between geometry-based systems involving specific hardware and approaches using consumer sensors only. In this paper, we introduce a deep, end-to-end trainable ensemble of heatmap-based weak predictors for 2D/3D gaze estimation. We show that, through heterogeneous architectural design of these weak predictors, we can improve the decorrelation between the latter predictors to design more robust deep ensemble models. Furthermore, we propose a stochastic combinatory loss that consists in randomly sampling combinations of weak predictors at train time. This allows to train better individual weak predictors, with lower correlation between them. This, in turn, allows to significantly enhance the performance of the deep ensemble. We show that our Deep heterogeneous ensemble with Stochastic Combinatory loss (DeeSCo) outperforms state-of-the-art approaches for 2D/3D gaze estimation on multiple datasets.
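The core mechanism, randomly sampling combinations of weak predictors at train time and scoring the combined output, can be sketched in a few lines. This is an illustrative numpy version under assumed shapes, not the DeeSCo training code; the 0.5 inclusion probability and the L2 criterion are placeholders.

```python
import numpy as np

def stochastic_combinatory_loss(predictions, target, rng):
    """Toy sketch: draw a random non-empty subset of weak predictors,
    average their outputs, and score the combination against the target.

    predictions: (n_predictors, H, W) one heatmap per weak predictor
    target:      (H, W) ground-truth heatmap
    """
    n = predictions.shape[0]
    mask = rng.random(n) < 0.5            # each predictor in/out at random
    if not mask.any():                    # guarantee a non-empty combination
        mask[rng.integers(n)] = True
    combined = predictions[mask].mean(axis=0)
    return float(((combined - target) ** 2).mean())

# toy usage: 5 weak heatmap predictors on a 64x64 grid
rng = np.random.default_rng(0)
preds = rng.normal(size=(5, 64, 64))
target = np.zeros((64, 64))
loss = stochastic_combinatory_loss(preds, target, rng)
```

Because every training step scores a different subset, each weak predictor must be accurate on its own, which is what lowers the correlation between them.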

DeCaFA: Deep Convolutional Cascade for Face Alignment in the Wild

Arnaud Dapogny & Kevin Bailly & Matthieu Cord


Face alignment is an active computer vision domain that consists in localizing a number of facial landmarks that vary across datasets. State-of-the-art face alignment methods either consist in end-to-end regression, or in refining the shape in a cascaded manner, starting from an initial guess. In this paper, we introduce DeCaFA, an end-to-end deep convolutional cascade architecture for face alignment. DeCaFA uses fully-convolutional stages to keep full spatial resolution throughout the cascade. Between each cascade stage, DeCaFA uses multiple chained transfer layers with spatial softmax to produce landmark-wise attention maps for each of several landmark alignment tasks. Weighted intermediate supervision, as well as efficient feature fusion between the stages allow to learn to progressively refine the attention maps in an end-to-end manner. We show experimentally that DeCaFA significantly outperforms existing approaches on 300W, CelebA and WFLW databases. In addition, we show that DeCaFA can learn fine alignment with reasonable accuracy from very few images using coarsely annotated data.
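The spatial softmax that turns a score map into a landmark-wise attention map is a standard building block and can be sketched independently of DeCaFA's architecture. The toy score map and soft-argmax readout below are illustrative assumptions, not the paper's code.

```python
import numpy as np

def spatial_softmax(score_map):
    """Normalize a 2-D score map into a spatial probability (attention) map."""
    z = score_map - score_map.max()       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def soft_argmax(attention):
    """Expected (row, col) landmark location under the attention map."""
    h, w = attention.shape
    rows, cols = np.mgrid[0:h, 0:w]
    return float((attention * rows).sum()), float((attention * cols).sum())

# toy score map sharply peaked at (row=10, col=20)
scores = np.zeros((64, 64))
scores[10, 20] = 50.0
att = spatial_softmax(scores)
y, x = soft_argmax(att)
```

Because the attention map sums to one, the soft-argmax is differentiable, which is what lets a cascade refine landmark estimates end-to-end.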


Deep Learning for Face Analysis

Arnaud Dapogny & Kevin Bailly & Matthieu Cord


Face alignment consists of aligning a shape model on a face image. It is an active domain in computer vision as it is a preprocessing for a number of face analysis and synthesis applications. Current state-of-the-art methods already perform well on "easy" datasets, with moderate head pose variations, but may not be robust for "in-the-wild" data with poses up to 90°. In order to increase robustness to an ensemble of factors of variations (e.g. head pose or occlusions), a given layer (e.g. a regressor or an upstream CNN layer) can be replaced by a Mixture of Experts (MoE) layer that uses an ensemble of experts instead of a single one. The weights of this mixture can be learned as gating functions to jointly learn the experts and the corresponding weights. In this paper, we propose to use tree-structured gates which allows a hierarchical weighting of the experts (Tree-MoE). We investigate the use of Tree-MoE layers in different contexts in the frame of face alignment with cascaded regression, firstly for emphasizing relevant, more specialized feature extractors depending on a high-level semantic information such as head pose (Pose-Tree-MoE), and secondly as an overall more robust regression layer. We perform extensive experiments on several challenging face alignment datasets, demonstrating that our approach outperforms the state-of-the-art methods.
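Hierarchical gating can be illustrated with a minimal two-level Tree-MoE: a root gate weights groups of experts, a leaf gate weights experts inside each group, and each expert's contribution is scaled by the product of gate probabilities along its root-to-leaf path. This is a toy numpy sketch under assumed shapes, not the paper's layer; the linear gates and linear experts are placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def tree_moe(x, root_gate, leaf_gates, experts):
    """Toy two-level Tree-MoE layer.

    root_gate:  (n_groups, d) linear gate over expert groups
    leaf_gates: list of (n_experts_g, d) gates, one per group
    experts:    nested list of callables, experts[g][i](x) -> output
    The path weights p_group[g] * p_leaf[i] sum to 1 over all experts.
    """
    p_group = softmax(root_gate @ x)
    out = np.zeros_like(experts[0][0](x))
    for g, gate in enumerate(leaf_gates):
        p_leaf = softmax(gate @ x)
        for i, expert in enumerate(experts[g]):
            out += p_group[g] * p_leaf[i] * expert(x)
    return out

# toy usage: 2 groups of 2 linear experts on a 4-d input
rng = np.random.default_rng(0)
d = 4
make_expert = lambda: (lambda W: (lambda v: W @ v))(rng.normal(size=(2, d)))
experts = [[make_expert(), make_expert()] for _ in range(2)]
x = rng.normal(size=d)
y = tree_moe(x, rng.normal(size=(2, d)),
             [rng.normal(size=(2, d)) for _ in range(2)], experts)
```

In a pose-conditioned variant, the root gate would effectively route the input toward experts specialized for a pose range, which is the intuition behind Pose-Tree-MoE.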

The Missing Data Encoder: Cross-Channel Image Completion with Hide-And-Seek Adversarial Network

Arnaud Dapogny & Matthieu Cord & Patrick Perez


Image completion is the problem of generating whole images from fragments only. It encompasses inpainting (generating a patch given its surrounding), reverse inpainting/extrapolation (generating the periphery given the central patch) as well as colorization (generating one or several channels given other ones). In this paper, we employ a deep network to perform image completion, with adversarial training as well as perceptual and completion losses, and call it the “missing data encoder” (MDE). We consider several configurations based on how the seed fragments are chosen. We show that training MDE for “random extrapolation and colorization” (MDEREC), i.e. using random channel-independent fragments, allows a better capture of the image semantics and geometry. MDE training makes use of a novel “hide-and-seek” adversarial loss, where the discriminator seeks the original nonmasked regions, while the generator tries to hide them. We validate our models qualitatively and quantitatively on several datasets, showing their interest for image completion, representation learning as well as face occlusion handling.
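The "random extrapolation and colorization" seeding, keeping random channel-independent fragments and asking the generator to complete the rest, can be sketched as a masking function. This is an illustrative assumption about the fragment scheme (8×8 blocks, a `keep_prob` parameter), not the MDE implementation.

```python
import numpy as np

def random_channel_fragments(image, keep_prob, rng):
    """Toy sketch of channel-independent fragment seeding: keep random
    8x8 blocks of each channel (drawn independently per channel) and
    zero out the rest. Returns (seed, mask) with mask[c, y, x] = 1 on
    kept pixels; a completion model would reconstruct `image` from `seed`.
    """
    c, h, w = image.shape
    blocks = rng.random((c, h // 8, w // 8)) < keep_prob   # per-channel block draws
    mask = np.kron(blocks, np.ones((8, 8), dtype=image.dtype))  # upsample to pixels
    return image * mask, mask

# toy usage on a 3-channel 32x32 image
rng = np.random.default_rng(0)
img = np.ones((3, 32, 32))
seed, mask = random_channel_fragments(img, 0.5, rng)
```

Under the hide-and-seek loss described above, the discriminator would then try to locate the kept (non-masked) regions in the completed output, while the generator tries to make them indistinguishable.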