Object Detection


Imagine the following situation: you are standing in a room and want to find out what has happened. You spot some cups, dishes and a refrigerator, and conclude that you are in a kitchen. A group of empty bottles and bags indicates that there has recently been a party. Just at that moment, a person wearing a tie comes in. This is probably the kitchen of a workplace that hosted the event.




What, where and how big?

Being able to interpret our environment comes naturally to us human beings. We are exceptionally good at recognizing objects even when their appearance varies. The task of object detection encompasses identifying objects, animals or people in a scene, localizing them, and estimating their spatial extent. It is a vital skill for any system that has to act autonomously.

We at Gestalt Robotics develop and customize object detection systems that work on video data. The algorithms we use can typically distinguish between more than 80 object types, including people, bottles, cups, refrigerators and cars. Object types that are not yet known can be learned from new data. Image processing can be optimized to run close to real time.


The combination of Semantic Segmentation and Asset Detection is the core service of our AI-Powered Environment Understanding for Autonomous Vehicle Systems in Industrial Settings.



The era before Deep Learning

In recent years, deep learning techniques have revolutionized the way people build intelligent systems. The methods developed have beaten traditional systems by a significant margin. Although high-performing systems have only appeared in the last five years, the idea behind them has existed for decades. The whole story started with the problem of image classification: assigning an image to one of a set of previously known classes. The traditional methods basically worked as follows: first, a set of hand-crafted features is extracted from the image, for example with edge or circle detectors or from color statistics. Afterwards, Machine Learning algorithms are applied to these features to identify the class. The problem is that designing the right features felt like black magic. Why is a particular type of feature the best representation for the data at hand, and why does it stop working on a slightly different dataset? Often, we had to make (educated) guesses.
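To make the traditional two-step pipeline concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the two features (`edge_strength`, `brightness`), the nearest-centroid classifier and the tiny toy images are hypothetical choices made for this example, not a description of any particular production system.

```python
from statistics import mean

# A grayscale image as rows of pixel intensities in [0, 1] (toy representation).
Image = list[list[float]]


def edge_strength(img: Image) -> float:
    """Hand-crafted feature 1: mean absolute difference between horizontal neighbors."""
    return mean(abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1))


def brightness(img: Image) -> float:
    """Hand-crafted feature 2: mean pixel intensity (a very simple image statistic)."""
    return mean(p for row in img for p in row)


def extract_features(img: Image) -> tuple[float, float]:
    """Step 1: turn the image into a fixed, hand-designed feature vector."""
    return (edge_strength(img), brightness(img))


def train_centroids(samples: list[tuple[Image, str]]) -> dict[str, tuple[float, float]]:
    """Step 2 (training): average the feature vectors of each class."""
    by_class: dict[str, list[tuple[float, float]]] = {}
    for img, label in samples:
        by_class.setdefault(label, []).append(extract_features(img))
    return {
        label: (mean(f[0] for f in feats), mean(f[1] for f in feats))
        for label, feats in by_class.items()
    }


def classify(img: Image, centroids: dict[str, tuple[float, float]]) -> str:
    """Step 2 (prediction): pick the class whose centroid is closest in feature space."""
    e, b = extract_features(img)
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - e) ** 2 + (centroids[c][1] - b) ** 2,
    )


# Toy training data: one striped and one flat image (assumed for illustration).
STRIPED: Image = [[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
FLAT: Image = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
CENTROIDS = train_centroids([(STRIPED, "striped"), (FLAT, "flat")])

print(classify([[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]], CENTROIDS))
```

The weak point described above is visible in the code: everything the classifier can know about an image is fixed in `extract_features`, so a dataset whose classes differ in some other property would need a newly designed feature.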

Learning feature hierarchies

Enter Deep Learning: instead of relying on heuristics to design the features, we learn them from the image data. As a result, features suited to the given dataset are derived automatically. The methods are inspired by the way the human brain works: a series of image transformations is applied to the input image one after another, each extracting specific features. The features form a hierarchy. In an image containing a face, for example, the earlier transformations extract edges and curves. The later ones use the detected edges to look for particular facial attributes, such as the eyes or the mouth. Finally, we can assume the presence of a face when some of its attributes have been recognized.
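The idea of stacked transformations can be sketched with a toy example. This is an analogy under strong assumptions, not a trained network: the two edge kernels are set by hand only to keep the example runnable (in Deep Learning they would be learned from data), and the second stage, which combines both edge maps into a "corner" feature, stands in for the later layers that combine edges into facial attributes. All names are hypothetical.

```python
Grid = list[list[float]]


def conv2d(img: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(fmap: Grid) -> Grid:
    """Keep positive responses only, as is common after a convolution."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Stage 1 kernels: fixed by hand here; a real model would learn them.
VERTICAL_EDGE: Grid = [[-1.0, 1.0], [-1.0, 1.0]]    # dark-to-bright, left to right
HORIZONTAL_EDGE: Grid = [[-1.0, -1.0], [1.0, 1.0]]  # dark-to-bright, top to bottom


def stage1_edges(img: Grid) -> tuple[Grid, Grid]:
    """Low-level features: where are vertical and horizontal edges?"""
    return relu(conv2d(img, VERTICAL_EDGE)), relu(conv2d(img, HORIZONTAL_EDGE))


def stage2_corners(v_edges: Grid, h_edges: Grid) -> Grid:
    """Higher-level feature: a 'corner' where both edge types respond at the same place."""
    return [[min(v, h) for v, h in zip(vr, hr)] for vr, hr in zip(v_edges, h_edges)]


def contains_corner(img: Grid, threshold: float = 1.0) -> bool:
    """Final decision: is the higher-level feature present anywhere in the image?"""
    v_edges, h_edges = stage1_edges(img)
    corners = stage2_corners(v_edges, h_edges)
    return max(max(row) for row in corners) >= threshold


# A bright square in the lower-right part of a dark image has a corner.
SQUARE: Grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
# A plain vertical edge triggers stage 1 but not the stage 2 corner feature.
EDGE_ONLY: Grid = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

print(contains_corner(SQUARE), contains_corner(EDGE_ONLY))
```

The point of the sketch is the layering: the second stage never looks at pixels, only at the output of the first stage, which is how later layers can respond to increasingly abstract patterns.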

Deep Learning conquers the world with GPUs

There were various attempts before the methods achieved their unmatched success. Only once huge datasets and the computational power of GPUs became available did the methods start to shine. After Deep Learning had conquered the task of image classification, it was successfully applied to object detection by simply dividing the image into smaller parts and running the classifier on each of them. The algorithm outputs bounding boxes representing the extent of each detected object, together with the class it belongs to.
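The "divide the image and classify each part" approach can be sketched as a sliding window. This is a minimal illustration, assuming a hypothetical `toy_classifier` that merely thresholds the mean brightness of a patch. In a real system a trained image classifier would take its place, and practical detectors typically also scan several window sizes and merge overlapping boxes.

```python
from dataclasses import dataclass
from typing import Callable

Grid = list[list[float]]


@dataclass
class Detection:
    """A bounding box (top-left corner plus size) with its class and score."""
    x: int
    y: int
    width: int
    height: int
    label: str
    score: float


def toy_classifier(patch: Grid) -> tuple[str, float]:
    """Hypothetical stand-in for a trained classifier: bright patches count as 'object'."""
    pixels = [p for row in patch for p in row]
    score = sum(pixels) / len(pixels)
    return ("object" if score > 0.5 else "background", score)


def sliding_window_detect(
    img: Grid,
    window: int = 2,
    stride: int = 1,
    classify: Callable[[Grid], tuple[str, float]] = toy_classifier,
) -> list[Detection]:
    """Classify every window-sized patch and report the non-background ones as boxes."""
    detections = []
    for y in range(0, len(img) - window + 1, stride):
        for x in range(0, len(img[0]) - window + 1, stride):
            patch = [row[x:x + window] for row in img[y:y + window]]
            label, score = classify(patch)
            if label != "background":
                detections.append(Detection(x, y, window, window, label, score))
    return detections


# Toy image: a bright 2x2 object in the lower-right corner of a dark 4x4 image.
IMAGE: Grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

for det in sliding_window_detect(IMAGE):
    print(det)
```

Each `Detection` answers the three questions from the beginning of this page: what (`label`), where (`x`, `y`) and how big (`width`, `height`).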


A Deep Learning model learns a hierarchy of features

First, it learns to extract edges & simple curves

Using detected geometries, it can recognize parts of the face

Finally, the model can deduce a face is present in the image




Autonomous Driving & Automated Guided Vehicles (AGVs)

Object detection is a vital component that allows autonomous vehicles to understand their environment. Self-driving cars have to spot every pedestrian reliably in order to meet global safety standards. Object detection can also serve as the first step towards collision avoidance. Finally, it can enhance the navigation of autonomous systems: unique objects in the environment serve as landmarks that help the system to localize itself.

On-demand pick & delivery with mobile robot systems


Social Robots

Service robots are created to communicate with people and assist them. The precondition for this service is the ability to detect humans. Knowing where the conversation partner is located enables the robot to face the person, making the interaction more human-like. More nuanced detection systems can also recognize a person's physical state and emotions, which subtly reveal what that person needs. All in all, object detection makes social robots behave more like humans.

Meet Sanuki in this short episode to get a feel for how we imagine human-robot interaction at home and witness some of our fancy perception tech.



Object detection is a vital component of today's AI systems. Often it is the first step towards a sophisticated understanding of the environment. We at GESTALT Robotics employ state-of-the-art techniques to tackle this task. For more specialized use cases, such as Semantic Segmentation and Pose Estimation, we also offer specific approaches. Please visit the corresponding pages for more in-depth information. We are happy to discuss your project with you and the possible impact of object detection on your line of business. Contact us by email at info@gestalt-robotics.com or give us a call at +49 30 616 515 60.