Pose Estimation


The posture of people and objects reveals a lot about their internal state. Humans adopt postures and gestures to convey information or intent non-verbally. Ergonomics teaches us that our posture can indicate possible health issues. Capturing a pose or posture is therefore a step towards automated understanding of the environment.




Nonverbal communication

Pose estimation is the task of accurately localizing the relevant parts of a human body, an object, or both. Various sensor types can be used to estimate pose; at Gestalt Robotics, however, we concentrate on solving this task using imagery alone. The advantage is that no additional equipment has to be worn. Gestalt Robotics develops tailored pose detection and tracking applications for your business.

Pose estimation identifies the location and orientation of the body’s joints and bones.


A good stance and posture reflect a proper state of mind.
— Morihei Ueshiba


Under the Hood of Pose Estimation

Various vision-based pose estimation techniques have been proposed in the last 10 years. Arguably the most famous of them has been integrated into the Microsoft Kinect sensor for Xbox, where it lets players control video game characters intuitively.

Bottom-up vs top-down

Under the hood, pose estimation problems could be solved using object detection approaches alone. However, the parts of a body are spatially related. For instance, the hand and the elbow are always the same distance apart. Such constraints are often incorporated into the estimation process. Roughly speaking, there are two ways to obtain the pose: bottom-up or top-down.
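As a minimal sketch of such a constraint, the following Python snippet checks whether two detected 3D joint positions are consistent with a known limb length. The function name, the 15 % tolerance and the forearm length of 0.26 m are illustrative assumptions, not values taken from a real system.

```python
import math

def limb_length_ok(joint_a, joint_b, expected_length, tolerance=0.15):
    """Return True if the distance between two 3D joints lies within a
    relative tolerance of the expected limb length (all values in metres)."""
    length = math.dist(joint_a, joint_b)
    return abs(length - expected_length) <= tolerance * expected_length

# Hypothetical detections (x, y, z) in metres.
FOREARM = 0.26
elbow = (0.30, 1.10, 2.00)
wrist_plausible = (0.30, 0.85, 2.05)    # roughly one forearm length away
wrist_implausible = (0.30, 0.40, 2.00)  # far too distant to belong to this elbow

print(limb_length_ok(elbow, wrist_plausible, FOREARM))
print(limb_length_ok(elbow, wrist_implausible, FOREARM))
```

A detector could use a check like this to reject part candidates that cannot belong to the same limb.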

Bottom-up approaches strive to solve the problem by first localizing the individual parts, e.g. by object detection. Spatial relationship constraints are then enforced in a second step. Top-down methods, in contrast, put stronger emphasis on the spatial relations by starting from a predefined model of the target that encodes these relations.

The model's posture is then adjusted so that the virtual parts overlap the real ones in the image. Mathematical optimization techniques are key to this approach. Since both methods have their advantages and disadvantages, successful systems normally combine the two.
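The top-down idea can be sketched in a few lines of Python under strong simplifying assumptions: a planar two-segment arm model with fixed, assumed segment lengths is adjusted until its elbow and wrist coincide with observed 2D positions. The model, the segment lengths and the simple pattern-search optimizer are illustrative choices; a production system would use a richer body model and a more capable optimizer.

```python
import math

UPPER_ARM, FOREARM = 0.30, 0.25  # assumed segment lengths in metres

def arm_model(shoulder_angle, elbow_angle):
    """Forward model: joint angles -> 2D elbow and wrist positions
    (shoulder fixed at the origin)."""
    elbow = (UPPER_ARM * math.cos(shoulder_angle),
             UPPER_ARM * math.sin(shoulder_angle))
    total = shoulder_angle + elbow_angle
    wrist = (elbow[0] + FOREARM * math.cos(total),
             elbow[1] + FOREARM * math.sin(total))
    return elbow, wrist

def fit_error(angles, observed_elbow, observed_wrist):
    """Sum of squared distances between model parts and observed parts."""
    elbow, wrist = arm_model(*angles)
    return (math.dist(elbow, observed_elbow) ** 2
            + math.dist(wrist, observed_wrist) ** 2)

def fit_arm(observed_elbow, observed_wrist, start=(0.0, 0.0),
            step=0.5, min_step=1e-6):
    """Pattern search: try +/- step on each angle, keep any improvement,
    and halve the step once no neighbour improves the fit."""
    angles = list(start)
    best = fit_error(angles, observed_elbow, observed_wrist)
    while step > min_step:
        improved = False
        for i in range(2):
            for delta in (step, -step):
                candidate = list(angles)
                candidate[i] += delta
                err = fit_error(candidate, observed_elbow, observed_wrist)
                if err < best:
                    angles, best, improved = candidate, err, True
        if not improved:
            step /= 2
    return tuple(angles), best

# Synthetic "observation" generated from known angles, then recovered by fitting.
true_angles = (0.6, 0.8)
observed_elbow, observed_wrist = arm_model(*true_angles)
fitted_angles, residual = fit_arm(observed_elbow, observed_wrist)
print(fitted_angles, residual)
```

Because the segment lengths are built into the model, every candidate pose automatically respects the spatial relations; the optimizer only searches over the joint angles.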

The fundamental difference between bottom-up…


…and top-down approaches.


The Necessity of Resolving Ambiguities

One major problem with pose estimation is that 3D objects are represented by a 2D image. Loss of information in the image formation process is inevitable. Distinct poses can result in similar images, which makes the solution ambiguous. With humans, parts of the body can also occlude each other, so we have to deal with incomplete information. To resolve these issues, we have to use prior information, such as which postures are more likely in reality.
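The following Python sketch illustrates the ambiguity with a deliberately simplified camera model. It assumes an orthographic projection, which simply drops the depth coordinate; real cameras use perspective projection, under which the two images would differ, though often only slightly for limbs far from the camera. All coordinates and the 40 degree tilt are made-up example values.

```python
import math

def project_orthographic(point_3d):
    """Simplified camera: drop the depth coordinate z to get the image position."""
    x, y, _z = point_3d
    return (x, y)

FOREARM = 0.25            # assumed forearm length in metres
tilt = math.radians(40)   # forearm tilted 40 degrees out of the image plane
elbow = (0.0, 0.0, 2.0)   # elbow 2 m in front of the camera

# Same forearm, once pointing towards the camera and once away from it.
wrist_towards = (FOREARM * math.cos(tilt), 0.0, elbow[2] - FOREARM * math.sin(tilt))
wrist_away = (FOREARM * math.cos(tilt), 0.0, elbow[2] + FOREARM * math.sin(tilt))

same_image = project_orthographic(wrist_towards) == project_orthographic(wrist_away)
same_pose = wrist_towards == wrist_away
print(same_image, same_pose)  # True False
```

Both poses produce the same image point, so the image alone cannot tell them apart; a prior about likely postures or additional cues is needed to choose between them.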

Which of these two arms is facing the camera?




Interaction using gestures

Gestures are essentially sequences of poses. Pose estimation can therefore also help in understanding gestural language. See Gesture-based Interaction for a detailed description.
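To make the idea of a gesture as a sequence of poses concrete, here is a small Python sketch of a rule-based wave detector. It assumes that each frame provides 2D wrist and elbow positions in normalized image coordinates with the y axis pointing down; the data layout, the thresholds and the rule itself are illustrative assumptions, not a description of a specific product.

```python
def count_direction_changes(values, min_delta=0.02):
    """Count how often a 1D signal reverses direction,
    ignoring movements smaller than min_delta."""
    changes = 0
    last_direction = 0
    anchor = values[0]
    for value in values[1:]:
        delta = value - anchor
        if abs(delta) < min_delta:
            continue  # treat tiny movements as jitter
        direction = 1 if delta > 0 else -1
        if last_direction and direction != last_direction:
            changes += 1
        last_direction = direction
        anchor = value
    return changes

def is_wave(pose_sequence, min_changes=2):
    """A wave: the wrist stays above the elbow while swinging left and right."""
    raised = all(p["wrist"][1] < p["elbow"][1] for p in pose_sequence)
    offsets = [p["wrist"][0] - p["elbow"][0] for p in pose_sequence]
    return raised and count_direction_changes(offsets) >= min_changes

def make_poses(wrist_offsets):
    """Build hypothetical per-frame poses with a fixed elbow and a raised wrist."""
    return [{"elbow": (0.5, 0.6), "wrist": (0.5 + dx, 0.4)} for dx in wrist_offsets]

waving = make_poses([-0.10, 0.0, 0.10, 0.0, -0.10, 0.0, 0.10])
resting = make_poses([0.05] * 7)
print(is_wave(waving), is_wave(resting))
```

Learned sequence models are a common alternative to hand-written rules like this one, but the input is the same: a time series of estimated poses.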

Pose estimation of the hand enables nuanced interpretation of gestures.



Ergonomic Assistance

Occupational ergonomics is a highly relevant issue. In the EU alone, billions of euros are spent on compensation for health problems caused by musculoskeletal disorders (MSDs). Tackling MSDs is similarly resource-intensive, as ergonomists need to manually analyse and redesign every process and workplace. Automating human-posture analysis can massively reduce costs while significantly improving people's health. Such a system requires accurate posture estimation components.
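One building block of automated posture analysis is computing joint angles from estimated keypoints. The Python sketch below derives the angle at a joint from three keypoint positions; the function name and the example coordinates are assumptions for illustration, and any threshold applied to such an angle would have to come from an ergonomic assessment method rather than from this example.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c.
    Works for 2D or 3D keypoints given as tuples."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cosine))

# Hypothetical 2D keypoints in metres.
shoulder = (0.0, 0.0)
elbow = (0.0, 0.3)
wrist_bent = (0.25, 0.3)      # forearm perpendicular to the upper arm
wrist_straight = (0.0, 0.55)  # arm fully extended

print(joint_angle(shoulder, elbow, wrist_bent))
print(joint_angle(shoulder, elbow, wrist_straight))
```

Tracking such angles over a work shift is one way to quantify how long a person remains in a strenuous posture.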


Robotic Grasping

Pose estimation is a mandatory step for a robot to be able to grasp arbitrarily located objects. Targets often have predefined areas that are suitable for a grasp, such as the handle of a cup. In the wild, the handle is not located at a fixed place; its location depends on the target's pose. Hence, the first step in grasping a target is determining its pose.
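The dependency between object pose and grasp location can be sketched as a coordinate transform. The Python example below assumes a simplified planar case in which the object pose is given as a position and a rotation about the vertical axis, and the handle is stored as a fixed point in the object's own frame; the names and numbers are hypothetical, and a real system would use a full 3D pose with six degrees of freedom.

```python
import math

def grasp_point_in_world(object_pose, grasp_in_object):
    """Transform a grasp point from the object frame into the world frame.

    object_pose: (x, y, yaw) with yaw in radians (planar simplification).
    grasp_in_object: (gx, gy) position of the grasp area in the object frame.
    """
    x, y, yaw = object_pose
    gx, gy = grasp_in_object
    return (x + math.cos(yaw) * gx - math.sin(yaw) * gy,
            y + math.sin(yaw) * gx + math.cos(yaw) * gy)

# Hypothetical cup whose handle sits 5 cm from its centre along the object's x axis.
HANDLE = (0.05, 0.0)
cup_pose = (1.0, 2.0, math.pi / 2)  # cup at (1, 2), rotated by 90 degrees

handle_world = grasp_point_in_world(cup_pose, HANDLE)
print(handle_world)
```

The same handle definition yields a different world position for every estimated pose, which is why the pose has to be known before the gripper can be positioned.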



Pose estimation serves as an essential component for understanding actors and objects in a scene. Localizing every single part enables a nuanced interpretation. Furthermore, pose estimation lets us understand the spatial relations between objects and humans. At Gestalt, we provide various methods for pose estimation, focusing on vision-based technologies. Get in touch with us to discuss your use case and how it can benefit from pose estimation and value-added functions built on top of it. Contact us at info@gestalt-robotics.com or give us a call at +49 30 616 515 60. We would love to hear from you.