PhD Defence of Micha Andriluka

Title: "People Detection, Tracking and Pose Estimation in Challenging Real-World Scenes"

Time: Friday, 22nd of October 2010, 14:00

Location: Fraunhofer IGD (Fraunhoferstr. 5), Room 074

 

Abstract:

In this dissertation we consider three challenging and long-standing problems in computer vision: people detection, people tracking and articulated pose estimation. Generic solutions to these problems are essential building blocks for understanding images containing people, an important and challenging task with numerous applications in automotive safety, robotic navigation, human-computer interaction, and automatic image indexing and retrieval.

To deal with the large appearance variability of people in uncontrolled environments, we propose an approach based on the pictorial structures paradigm, in which we represent the human body as a flexible configuration of rigid body parts and model the appearance of each body part using local image descriptors and discriminative classifiers. We demonstrate the generality of our approach by successfully applying it to various human detection and pose estimation problems.

One of the important goals of our work is to demonstrate the advantages of a tight coupling of people detection, pose estimation and tracking. Tracking people in uncontrolled conditions is difficult not only because of complex appearance patterns, but also because of frequent full and partial occlusions, which often occur when multiple people are present in the scene. The presence of multiple people also severely complicates data association between frames of the sequence. To address this challenge, we propose a tracking-by-detection framework that combines evidence from single-frame detections over several subsequent frames using a dynamical model of body articulations. Finally, while recovery of 3D poses from monocular data appears to be highly ambiguous when considering just single images, many of these ambiguities vanish when we consider 2D evidence accumulated and refined over several image frames. We demonstrate this by applying our tracking-by-detection framework to the problem of monocular 3D pose estimation of people in uncontrolled street environments.