Abstract

A real-time motion capture system is presented which uses input from multiple standard video cameras and inertial measurement units (IMUs). The system is able to track multiple people simultaneously and requires no optical markers, specialized infra-red cameras or foreground/background segmentation, making it applicable to general indoor and outdoor scenarios with dynamic backgrounds and lighting. To overcome the limitations of prior video-only or IMU-only approaches, we propose flexible combinations of multi-view, calibrated video and IMU input, together with a pose prior, in an online optimization-based framework, which allows the full 6-DoF motion to be recovered, including axial rotation of limbs and drift-free global position. A method is presented for sorting and assigning raw 2D keypoint detections to the corresponding subjects, which facilitates multi-person tracking and rejection of bystanders in the scene. The approach is evaluated on data from several indoor and outdoor capture environments with one or more subjects, and the trade-off between input sparsity and tracking performance is discussed. State-of-the-art pose estimation performance is obtained on the Total Capture (multi-view video and IMU) and Human3.6M (multi-view video) datasets. Finally, a live demonstrator for the approach is presented, showing real-time capture, solving and character animation using a lightweight, commodity hardware setup.
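To give a concrete flavour of the online optimization described above, the sketch below sets up a toy per-frame solve whose residuals combine multi-view 2D keypoint reprojection error, an IMU-derived bone-direction term and a pose prior that pulls towards the previous-frame estimate. It is an illustrative, assumption-laden sketch, not the paper's implementation: the pinhole cameras, the simplified point-based skeleton, the bone-direction IMU model and all weights are invented for the example, and SciPy's generic least_squares solver stands in for the paper's real-time solver over full skeletal pose.

    # Toy online pose solve: video + IMU + prior residuals (illustrative only).
    import numpy as np
    from scipy.optimize import least_squares

    N_JOINTS = 4
    BONES = [(0, 1), (1, 2), (2, 3)]  # parent -> child joint index pairs

    def project(P, X):
        """Pinhole projection of Nx3 points X using a 3x4 camera matrix P."""
        Xh = np.hstack([X, np.ones((len(X), 1))])
        x = (P @ Xh.T).T
        return x[:, :2] / x[:, 2:3]

    def residuals(theta, cams, kp2d, imu_dirs, prev_pose,
                  w_2d=1.0, w_imu=0.5, w_prior=0.1):
        X = theta.reshape(N_JOINTS, 3)
        r = []
        # 1) Multi-view video term: 2D keypoint reprojection error per camera.
        for P, uv in zip(cams, kp2d):
            r.append(w_2d * (project(P, X) - uv).ravel())
        # 2) IMU term: measured vs. model bone directions (a simplification
        #    of the full segment orientations an IMU actually provides).
        for (a, b), d in zip(BONES, imu_dirs):
            bone = X[b] - X[a]
            r.append(w_imu * (bone / (np.linalg.norm(bone) + 1e-9) - d))
        # 3) Prior term: stay near the previous frame's pose (temporal prior).
        r.append(w_prior * (theta - prev_pose.ravel()))
        return np.concatenate(r)

    # Synthetic ground truth in front of two translated pinhole cameras.
    rng = np.random.default_rng(0)
    gt = np.cumsum(rng.normal(size=(N_JOINTS, 3)), axis=0) + [0.0, 0.0, 5.0]
    cams = [np.hstack([np.eye(3), np.zeros((3, 1))]),
            np.hstack([np.eye(3), np.array([[0.5], [0.0], [0.0]])])]
    kp2d = [project(P, gt) for P in cams]
    dirs = [(gt[b] - gt[a]) / np.linalg.norm(gt[b] - gt[a]) for a, b in BONES]

    prev = gt + rng.normal(scale=0.2, size=gt.shape)  # noisy previous frame
    sol = least_squares(residuals, prev.ravel(), args=(cams, kp2d, dirs, prev))
    print("mean joint error:", np.abs(sol.x.reshape(-1, 3) - gt).mean())

In the full system described in the abstract, the state would instead be a skeletal pose (joint angles plus global root position), the IMU residual would compare complete segment orientations (recovering axial limb rotation), and the 2D detections would first be sorted and assigned to subjects before entering the solve.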

Paper

Paper

Supplementary video (short)

Supplementary video (long)

Citation

    @Article{MallesonIJCV2019,
      author  = "Malleson, Charles and Collomosse, John and Hilton, Adrian",
      title   = "Real-Time Multi-person Motion Capture from Multi-view Video and IMUs",
      journal = "International Journal of Computer Vision (IJCV)",
      year    = "2019",
      month   = "Dec",
      day     = "17",
      issn    = "1573-1405",
      doi     = "10.1007/s11263-019-01270-5",
      url     = "https://doi.org/10.1007/s11263-019-01270-5"
    }

Data

The Total Capture dataset, which combines multi-view video, IMU and optical motion capture data, is available here. The multi-person outdoor datasets used in this paper will be made available for research use upon request.

Acknowledgments

This work was supported by the Innovate UK Total Capture project (grant 102685) and the European Union Horizon 2020 Visual Media project (grant 687800).