Human Pose Estimation


Human-robot interaction can take many forms. One of them is gesture: a person may point to a particular object or give an order through body movement. The robot should then be able to understand this body language and respond appropriately in a real-world context.

Key to this ability is a suitable model of the human body and its pose. Human pose estimation is an important and challenging problem in computer vision, and many algorithms have been proposed to solve it. Most current solutions rely on standard RGB (or RGB-D) cameras, which work well only under certain conditions. Fast motion is challenging for these sensors, because the speed at which an object can move and still be tracked is limited by the sensor's output rate (in general around 50-60 Hz). In this context, event cameras offer a potential solution for tracking fast-moving objects, and human poses in particular. This is due, among other things, to their high temporal resolution, which provides information during the “blind” time between the frames of a standard camera.
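
To make the "blind" time concrete, the short sketch below computes the gap between consecutive frames at the 50-60 Hz rates mentioned above, and how far a moving hand travels in that gap. The hand speed of 5 m/s is an assumed, illustrative value and does not come from any measurement; the function names are likewise hypothetical.

```python
def blind_time_ms(frame_rate_hz: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / frame_rate_hz


def displacement_cm(speed_m_per_s: float, frame_rate_hz: float) -> float:
    """Distance an object travels between two consecutive frames, in centimetres."""
    return speed_m_per_s * 100.0 / frame_rate_hz


if __name__ == "__main__":
    hand_speed = 5.0  # m/s, assumed illustrative value for a fast gesture
    for rate in (50.0, 60.0):
        gap = blind_time_ms(rate)
        moved = displacement_cm(hand_speed, rate)
        print(f"{rate:.0f} Hz: {gap:.1f} ms between frames, "
              f"hand moves about {moved:.1f} cm unobserved")
```

Under this assumption, a hand moves several centimetres between two frames, which is the interval an event camera could in principle fill with measurements.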

Deep Neural Networks, Spiking Neural Networks.