Computers that can see may sound like science fiction to many. After all, “seeing” here does not mean filming with a webcam, but rather understanding visual material. In fact, such technologies have long been in use behind the scenes of many everyday services. Social networks have been recognizing friends and acquaintances in photos for years, and modern smartphones can be unlocked with your face instead of a PIN code. Beyond these small everyday conveniences, the rapidly growing field of computer vision holds far greater potential for industrial use. The specialized processing of image material promises to facilitate and automate many repetitive processes, and to relieve experts and skilled personnel and support them in their decisions.
The foundations for image recognition and computer vision were already laid down in the 1970s. However, it is only in recent years that the field has found increasing application outside research. This article presents five selected and particularly promising use cases from different industries, which are either already in production or promise significant changes in their respective fields in the coming years.
1. Retail: Customer Behavior Tracking
Online stores like Amazon have long been able to take advantage of the analysis capabilities of their digital platforms: customer behavior can be analyzed in detail and the user experience optimized accordingly. Brick-and-mortar retailers are trying to improve their customers’ experience as well, but until now they have lacked the tools to automatically capture how people interact with displayed items. Computer vision is now able to close this gap for the retail industry.
In combination with existing security cameras, algorithms can automatically evaluate video material and study customer behavior. For example, the current number of people in the store can be counted at any time, which is useful during the COVID-19 pandemic, with its restrictions on the maximum number of visitors allowed in stores. More interesting still are analyses at the individual level, such as the route a customer takes through the store and its departments. This allows the design, structure, and placement of products to be optimized, congestion in busy departments to be avoided, and the customers’ overall experience to be improved. A particularly significant capability is tracking the attention that individual shelves and products receive from customers. Specialized algorithms can detect the direction of people’s gaze and thus measure how long passers-by look at any given object.
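To make the idea of attention tracking more concrete, the following is a minimal sketch of how viewing time per shelf could be aggregated once a gaze-estimation model has done its work. The input format, the function names `dwell_time_per_shelf` and `people_in_store`, and the sample data are assumptions made for illustration; they do not come from any particular product.

```python
from collections import defaultdict


def dwell_time_per_shelf(gaze_frames, fps):
    """Sum up how long each shelf was looked at, in person-seconds.

    gaze_frames: one entry per video frame; each entry is a list of
                 (person_id, shelf_id) pairs, where shelf_id is None if
                 the person is not looking at any shelf in that frame.
    fps:         frames per second of the camera (assumed constant).
    """
    seconds = defaultdict(float)
    for detections in gaze_frames:
        for _person_id, shelf_id in detections:
            if shelf_id is not None:
                seconds[shelf_id] += 1.0 / fps
    return dict(seconds)


def people_in_store(detections):
    """Count distinct people visible in a single frame."""
    return len({person_id for person_id, _ in detections})


# Hypothetical output of an upstream gaze model: 4 frames at 2 fps.
frames = [
    [("p1", "shelf_a"), ("p2", None)],
    [("p1", "shelf_a"), ("p2", "shelf_b")],
    [("p1", None), ("p2", "shelf_b")],
    [("p2", "shelf_b")],
]
dwell = dwell_time_per_shelf(frames, fps=2)
print(dwell)
print(people_in_store(frames[0]))
```

The hard part, detecting people and estimating where they look, is left to the upstream model; the sketch only shows that the business metric itself is a simple aggregation over its output.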
With the help of this technology, retailers now have the opportunity to catch up with online trading and to evaluate customer behavior within their stores in detail. This can help to increase sales, reduce the time customers need to spend in the store, and improve the distribution of customers within the store.
Figure 1: Customer Behavior Tracking
(https://www.youtube.com/watch?v=jiaNA1hln5I)
2. Agriculture: Detection of Wheat Rust
Modern technologies enable farmers to cultivate ever-larger fields efficiently. At the same time, this means that ever-larger areas must be checked for pests and plant diseases, because overlooked infections can lead to painful harvest losses and crop failures.
Machine learning provides a remedy, because large amounts of data can be generated using drones, satellite images, and remote sensors. Modern technology facilitates the collection of various measured values, parameters, and statistics, which can be monitored automatically. Farmers thus have an around-the-clock overview of soil conditions, irrigation levels, plant health, and local temperatures, even across very large fields. Machine learning algorithms evaluate this data so that the farmer can react to potential problem areas at an early stage and distribute available resources efficiently.
Computer vision is of particular interest to agriculture, as the analysis of image material allows plant diseases to be detected at an early stage. Just a few years ago, plant diseases were often noticed only after they had already spread. Early warning systems based on computer vision can now detect an outbreak before it spreads widely, so that it can be contained. This means that farms lose less of their crop and save on countermeasures such as pesticides, since comparatively small areas need to be treated.
The automated detection of wheat rust in particular has received much attention within the computer vision community. Various types of this aggressive fungus infest cereals in East Africa, around the Mediterranean Sea, and in Central Europe, and they cause large losses in wheat crops. Since the fungus is clearly visible on the stems and leaves of cereals, trained image recognition algorithms can detect it early on, so that it can be prevented from spreading further.
Figure 2: Wheat rust detection with computer vision
(https://www.kdnuggets.com/2020/06/crop-disease-detection-computer-vision.html)
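As a rough illustration of how such an early warning system could narrow down the area to be treated, the sketch below assumes that a drone image of the field has been split into tiles and that a trained classifier has already assigned each tile an infection probability. The scores, the threshold of 0.5, and the function names are hypothetical; the classifier itself is not shown.

```python
def flag_infected_tiles(scores, threshold=0.5):
    """Return (row, col) positions of tiles whose score meets the threshold.

    scores: 2D list of per-tile infection probabilities in [0, 1], as a
            hypothetical classifier might produce for a gridded image.
    """
    return [
        (r, c)
        for r, row in enumerate(scores)
        for c, score in enumerate(row)
        if score >= threshold
    ]


def treated_fraction(scores, threshold=0.5):
    """Fraction of all tiles that would need treatment."""
    total = sum(len(row) for row in scores)
    return len(flag_infected_tiles(scores, threshold)) / total


# Made-up scores for a field split into 3 x 4 tiles.
field_scores = [
    [0.05, 0.10, 0.02, 0.01],
    [0.20, 0.85, 0.91, 0.03],
    [0.04, 0.60, 0.12, 0.02],
]
hotspots = flag_infected_tiles(field_scores)
print(hotspots)
print(treated_fraction(field_scores))
```

In this made-up example only a quarter of the tiles are flagged, which is the kind of saving on countermeasures that the text describes. In practice the threshold would have to be tuned against the cost of missing an infection.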
3. Public Health: Image Segmentation of Scans
The potential of computer vision in healthcare is huge, and the possible applications are countless. Medical diagnostics relies heavily on the study of images, scans, and photographs. The analysis of ultrasound images, MRI scans, and CT scans is part of the standard repertoire of modern medicine, and computer vision technologies promise not only to simplify this process but also to prevent false diagnoses and reduce treatment costs. Computer vision is not intended to replace medical professionals but to facilitate their work and support them in making decisions. Image segmentation helps in diagnostics by identifying relevant areas on 2D or 3D scans and colorizing them, which makes black-and-white images easier to study.
Figure 3: Image Segmentation of Lung CT-Scans
(https://syncedreview.com/2020/03/18/ai-ct-scan-analysis-for-covid-19-detection-and-patient-monitoring/)
A recent use case for this technology is the COVID-19 pandemic. Image segmentation can help physicians and scientists identify COVID-19 and analyze and quantify the infection and the course of the disease. The trained image recognition algorithm identifies suspicious areas on CT scans of the lungs and determines their size and volume, so that the course of the disease in affected patients can be tracked clearly.
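Once a segmentation model has marked the suspicious voxels, determining the volume is mostly a matter of counting. The following minimal sketch assumes the model output is available as a binary 3D mask and that the physical voxel size is known from the scan metadata. The function name `region_volume_ml`, the tiny masks, and the deliberately coarse voxel size are illustrative assumptions.

```python
def region_volume_ml(mask, voxel_size_mm):
    """Volume of the segmented region in millilitres.

    mask:          3D nested list (slices x rows x columns) of 0/1 labels,
                   where 1 marks a voxel the model flagged as suspicious.
    voxel_size_mm: (depth, height, width) of one voxel in millimetres.
    """
    voxel_count = sum(v for slice_ in mask for row in slice_ for v in row)
    depth, height, width = voxel_size_mm
    voxel_volume_mm3 = depth * height * width
    return voxel_count * voxel_volume_mm3 / 1000.0  # 1 ml = 1000 mm^3


# Two tiny made-up masks of the same patient on two different days.
scan_day_1 = [[[0, 1], [1, 1]], [[0, 0], [1, 0]]]
scan_day_7 = [[[0, 0], [1, 0]], [[0, 0], [0, 0]]]
spacing = (5.0, 10.0, 10.0)  # mm; hypothetical and unrealistically coarse

v1 = region_volume_ml(scan_day_1, spacing)
v7 = region_volume_ml(scan_day_7, spacing)
print(v1, v7)
```

Comparing the two values over time is what allows the course of the disease to be quantified rather than judged by eye. Real scans contain millions of voxels, so an array library would normally replace the nested lists used here.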
The benefits of monitoring a new disease are immense. Computer vision not only makes it easier for physicians to diagnose the condition and monitor it during therapy, but the technology also generates valuable data for studying the disease and its course. Researchers benefit from the collected data and the generated images as well, which allows them to spend more time on experiments and tests and less on data collection.
4. Automotive Industry: Object Recognition and Classification in Traffic
Self-driving cars are certainly among the artificial intelligence use cases that have received the most media attention in recent years. This probably has more to do with the futuristic appeal of autonomous driving than with the actual consequences of the technology. Autonomous driving bundles several machine learning problems, and computer vision is a core element in solving them. For example, the algorithm that controls the car (the so-called “agent”) must be aware of the car’s environment at all times. To adapt continually to a changing environment, the agent needs to know the course of the road, where other vehicles are in the vicinity, how far away potential obstacles and objects are, and how fast these objects are moving. For this purpose, autonomous vehicles are equipped with numerous cameras that film their surroundings over a wide area. The resulting footage is then evaluated in real time by an image recognition algorithm. As with Customer Behavior Tracking, this requires that the algorithm can search for and classify relevant objects not only in static images but in a constant flow of images.
Figure 4: Object recognition and classification in road traffic
(https://miro.medium.com/max/1000/1*Ivhk4q4u8gCvsX7sFy3FsQ.png)
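Working on a constant flow of images means that detections from one frame have to be linked to the detections in the next. One common and simple way to do this is to match bounding boxes by their intersection over union (IoU). The sketch below is a minimal version of that idea under simplifying assumptions: the boxes come from some upstream detector that is not shown, the matching is greedy, and the coordinates and the `min_iou` threshold are made up for illustration. It is not meant to represent how any production driving system works.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_boxes(previous, current, min_iou=0.3):
    """Greedily pair each previous box with its best unused current box.

    Returns a dict {index_in_previous: index_in_current}.
    """
    matches, used = {}, set()
    for i, prev_box in enumerate(previous):
        best_j, best_score = None, min_iou
        for j, cur_box in enumerate(current):
            score = iou(prev_box, cur_box)
            if j not in used and score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches


def center_shift(box_a, box_b):
    """Pixel displacement of the box center between two frames."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return (bx - ax, by - ay)


# Made-up detections of a car and a pedestrian in two consecutive frames.
frame_1 = [(100, 100, 200, 160), (400, 120, 430, 200)]
frame_2 = [(402, 120, 432, 200), (120, 100, 220, 160)]

pairs = match_boxes(frame_1, frame_2)
print(pairs)
print(center_shift(frame_1[0], frame_2[pairs[0]]))
```

The displacement between matched boxes, combined with the frame rate and a camera calibration, is one possible starting point for estimating how fast an object is moving. Converting pixels into metres is a separate problem that this sketch does not address.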
This technology already exists and is also used industrially. The problem in road traffic stems from its complexity and volatility, and from the difficulty of training an algorithm so thoroughly that failure of the agent can be ruled out even in complex, exceptional situations. This exposes computer vision’s Achilles’ heel: the need for large amounts of training data, which are costly to generate in road traffic.
5. Fitness: Human Pose Estimation
The fitness industry has been undergoing a digital transformation for years. New training programs and trends are presented to an audience of millions via YouTube, training progress is tracked and evaluated using apps, and virtual training and home workouts have enjoyed massive popularity, at the latest since the start of the COVID-19 crisis. Particularly in weight training, where the risk of injury is high, fitness trainers in studios have so far been vital for support. While it is already common practice today to check one’s own posture and position during training via video, computer vision makes it possible to evaluate and assess video material in this field more accurately than the human eye.
The technology used here is similar to the Attention Tracking already introduced for the retail industry. Human Pose Estimation enables an algorithm to recognize and estimate the posture and pose of people on video. For this purpose, the positions of the joints, both absolute and relative to each other, are determined. Since the algorithm has learned what the ideal and safe execution of a fitness exercise should look like, deviations from it can be detected and highlighted automatically. Implemented in a smartphone app, this can happen in real time, so that an immediate warning signal alerts the user to dangerous errors as they occur instead of movements being analyzed only afterward. This promises to significantly reduce the risk of injury during strength training, make training without a fitness trainer safer, and reduce the cost of safe strength training.
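The step from joint positions to a warning can be sketched in a few lines. The example below assumes that a pose estimation model has already produced 2D keypoints for the hip, knee, and ankle, and it checks the knee angle against an allowed range. The range of 70 to 180 degrees, the function names, and the keypoints are purely illustrative assumptions and not training or medical guidance; a real app would learn or configure such limits per exercise.

```python
import math


def joint_angle(a, b, c):
    """Angle at point b (in degrees) formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


def check_knee(hip, knee, ankle, allowed=(70.0, 180.0)):
    """Return (angle, ok); ok is False if the knee angle leaves the range."""
    angle = joint_angle(hip, knee, ankle)
    return angle, allowed[0] <= angle <= allowed[1]


# Made-up 2D keypoints (x, y) in pixels from one video frame.
hip, knee, ankle = (0, 0), (0, 100), (100, 100)
angle, ok = check_knee(hip, knee, ankle)
print(round(angle), ok)
```

Running this check on every frame and raising a signal as soon as `ok` becomes false is one simple way to obtain the immediate warning described above.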
Human Pose Estimation is a further step towards digital fitness training. Smartphones are already well established in fitness training, and apps that make training safer are likely to be well received by the broad user base.
Conclusion
Computer vision is a versatile and promising field of machine learning that has the potential to solve a wide range of problems in a variety of industries and sectors. Processing image and video material in real time makes it possible to solve far more complex problems than conventional data formats allow, which brings machine learning ever closer to “intelligent” systems. Today, standard interfaces to computer vision are becoming increasingly common, a trend that seems set to accelerate in the coming years.
The examples presented here are only the tip of the iceberg. There are significant efforts in each of the mentioned industries to make existing processes more efficient with computer vision technology. Currently, much work is going into lifting computer vision into the third dimension, processing 3D models instead of photos and scans. The demand for industrial image processing in 3D is growing in surveying, medicine, and robotics alike. The processing of 3D image material will continue to receive attention in the coming years because many problems can only be solved efficiently in 3D.