High-detail 3D capture and non-sequential alignment of facial performance

3DIMPVT, 2012

Martin Klaudiny, Adrian Hilton
Centre for Vision, Speech and Signal Processing, University of Surrey, UK
[email protected], [email protected]

Abstract

This paper presents a novel system for the 3D capture of facial performance using standard video and lighting equipment. The mesh of an actor's face is tracked non-sequentially throughout a performance using multi-view image sequences. A minimum spanning tree computed in expression dissimilarity space defines a traversal of the sequence that is optimal with respect to error accumulation. Robust patch-based frame-to-frame surface alignment combined with this optimal traversal significantly reduces drift compared to previous sequential techniques. Multi-path temporal fusion resolves inconsistencies between different alignment paths and yields a final mesh sequence that is temporally consistent. The surface tracking framework is coupled with photometric stereo using colour lights, which captures metrically correct skin geometry. High-detail UV normal maps, corrected for shadow and bias artefacts, augment the temporally consistent mesh sequence. Evaluation on challenging performances by several actors demonstrates the acquisition of subtle skin dynamics and minimal drift over long sequences. A quantitative comparison to a state-of-the-art system shows similar quality of temporal alignment.
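The non-sequential traversal can be illustrated with a short sketch. Assuming each frame is described by some expression feature vector (frame_features below is a hypothetical stand-in for the paper's actual dissimilarity measure), a minimum spanning tree over the pairwise dissimilarity matrix connects every frame through its most similar neighbours, and a breadth-first walk from a root frame gives the order in which frame-to-frame alignments are performed:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order

def traversal_order(frame_features, root=0):
    """Return (parent, child) frame pairs in the order they should be aligned."""
    # Pairwise expression dissimilarity between all frames.
    dissim = squareform(pdist(frame_features, metric='euclidean'))
    # The MST keeps only the cheapest edges connecting all frames, so errors
    # accumulate along short low-dissimilarity paths rather than in time order.
    mst = minimum_spanning_tree(dissim)
    # A breadth-first walk from the root frame gives the alignment schedule.
    order, pred = breadth_first_order(mst, root, directed=False,
                                      return_predecessors=True)
    return [(pred[f], f) for f in order if pred[f] >= 0]

# Example: 100 frames described by synthetic 32-dimensional feature vectors.
features = np.random.rand(100, 32)
for parent, child in traversal_order(features):
    pass  # align the mesh of frame `child` to the already-tracked `parent`

The colour photometric stereo component can likewise be sketched under Lambertian assumptions: three coloured lights from known directions give three measurements per pixel (the R, G and B channels), so the surface normal follows from inverting a single 3x3 matrix. The light directions below are illustrative placeholders rather than the paper's calibration, and the shadow and bias corrections mentioned above are omitted:

# Rows: one (unit) light direction per colour channel; unit channel gains assumed.
light_dirs = np.array([[ 0.5, 0.0, 0.866],   # red light
                       [-0.5, 0.0, 0.866],   # green light
                       [ 0.0, 0.5, 0.866]])  # blue light
M_inv = np.linalg.inv(light_dirs)

def normals_from_rgb(image):
    """image: (H, W, 3) float RGB -> (H, W, 3) unit surface normals."""
    n = image @ M_inv.T                    # solve c = M n for each pixel
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)   # normalise, avoiding divide-by-zero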

Keywords

facial performance capture, dense motion capture, non-sequential surface tracking, colour photometric stereo

Materials

Main paper: PDF
Videos: high quality (167 MB), low quality (30 MB), Actor1 - unregistered mesh sequence (7 MB), Actor1 - UV normal map sequence (6 MB)

The videos are in AVI format and are encoded with the H.264 codec from the ffmpeg package.
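If a player struggles with H.264 video inside an AVI container, the streams can be copied into an MP4 container without re-encoding, for example with ffmpeg -i input.avi -c copy output.mp4 (the file names here are placeholders).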

Acknowledgements

We would like to thank Thabo Beeler at ETH Zurich/Disney Research for releasing their dataset and results (http://graphics.ethz.ch/publications/papers/paperBee11.php).
We also appreciate the help of Alaleh Rashidnasab as a test actress.
This work was partly supported by the EU ICT project SCENE and the EPSRC Visual Media Platform Grant.