Should we let AI drive?
The fascinating performance of AI in various fields of application is undeniable. One of the first truly inspiring applications of AI could be assisted and automated driving. Based on the development of the field in the last few years, one tends to think that the way forward is to copy how humans see the world, drive, and think. While the temptation to copy nature is indeed huge, one must bear in mind that the very purpose of automation is to replace the human, not to replicate them. As a matter of fact, driving capabilities were not part of the "reward functions" of human evolution; humans are not optimized for driving. Rather, it is the other way around: driving rules have been loosely tailored to average human capabilities. As a result, with a healthy balance of human-like, learning/data-based AI solutions and other technologies, one can create a solution that is safer than human drivers.

In the following sections, the issues of using AI in automated driving solutions are examined. First, the possibilities of using AI for perception and classification tasks are detailed. This is followed by an explanation of the limitations and promising areas of development for the use of AI in decision making and motion planning. Finally, aspects of how simulation can be used efficiently for the development of AI-based automated driving systems are detailed.

The issues of AI in perception

The use of AI in perception tasks has proven to be extremely powerful and has been found to be much more robust and scalable than the so-called computer vision (CV) algorithms used previously. However, the use of AI technologies raises many questions and concerns, especially in safety-critical applications. What are these?

CV versus AI

In the case of a conventional CV algorithm, one can formulate a statement similar to the following: "If the input image/object possesses 'such and such' properties, then my algorithm will detect it." Such statements cannot (yet) be made for AI-based recognition solutions. Many think that this is the problem. However, the real problem is a different one entirely: one cannot guarantee that the input space will indeed have the required properties. For example, one cannot prescribe what kind of clothes a pedestrian should wear, and a traffic sign does not always look the same: it can be dirty, aged, modified, bent, or covered in rain. If an object loses its required properties, even classical algorithms are unable to deliver any detection or classification guarantees. Therefore, the real problem is that the appearance of the world cannot be formalized. What follows is that the only way to generically capture the richness and diversity of the world is through data-based techniques. Consequently, AI is not an option but a must in perception. (Naturally, special use-case exceptions might apply.) So, the right question is not whether to use CV or AI, but rather: how can the right set of data to train the AI be collected? Collecting the right amount and diversity of data is unavoidable, but it is not the only element of the process.

The data collection problem

Depending on the definition of "optimal performance", there are different answers and approaches to addressing this problem. Unfortunately, in-depth insight into the internal representations of a neural network (NN) is unavailable, as is an exact measure of its so-called parametric instability. In other words, what image "similarity" means from a NN's point of view cannot be defined in advance. Thus, one is currently unable to predict whether a network will work on a specific image without actually executing it on that image.
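Although no formal definition of network-level similarity exists, one pragmatic proxy used in practice is the distance between the feature embeddings a network computes for two images. The following is a minimal sketch of that idea, assuming a pretrained torchvision ResNet-18 as a stand-in for a real perception backbone; the model choice and the example file names are illustrative assumptions, not a validated similarity metric.

```python
# Minimal sketch: probing "similarity from the network's point of view"
# by comparing penultimate-layer embeddings of two images.
# Assumptions: torchvision's pretrained ResNet-18 stands in for a real
# perception backbone; file names are hypothetical.
import torch
import torchvision.models as models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()  # expose the 512-d penultimate features
backbone.eval()

preprocess = weights.transforms()  # the normalization these weights expect

def embed(path: str) -> torch.Tensor:
    """Return the L2-normalized feature embedding of one image."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = backbone(img).squeeze(0)
    return feat / feat.norm()

# Hypothetical file names, for illustration only.
sim = torch.dot(embed("street_day.jpg"), embed("street_rain.jpg")).item()
print(f"feature-space cosine similarity: {sim:.3f}")
# A high value suggests the two scenes look alike *to this network*,
# but it is a heuristic proxy, not a guarantee of identical behavior.
```

In practice, embedding distances of this kind are sometimes used to mine unusual samples during data collection, though they remain a heuristic rather than a formal similarity measure.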
In the absence of an exact understanding of these two properties (internal representations and parametric instability), the generalization capabilities of a network cannot be defined formally, only through extensive statistical testing. Therefore, to ensure correct training data composition, it is currently necessary (but not necessarily sufficient) to collect data from all geographical locations and all operating conditions (day/night/rain/fog/etc.) and to limit the validity of the network's inference to these use cases. Only in this case can the confidence levels output by the NN be transformed correctly into a meaningful probability space, which is essential for reliable sensor fusion.

[Image: Hungary, Japan and the US are the locations AImotive is currently testing its technologies in. Diverse data is needed to train and validate AI-based solutions. (Images: Unsplash)]

Data collection of this type can only be performed for static, "permanent" obstacles: buildings, road markings, signs, etc. Completing similar levels of data collection for all possible objects that could move through a specific operation domain is another question entirely. This is exactly why one must clearly differentiate between existence hypothesis and classification performance in the perception pipeline. Existence hypotheses should come with orders of magnitude higher confidence than classification and should not be influenced by operating conditions. As long as a precise understanding of AI and high enough confidence in AI-based image processing for generic object recognition elude engineers, active sensors and the laws of physics are the only way to achieve the confidence level required for self-driving.

Determinism and parametric instabilities

It is often claimed that neural networks are not deterministic. This is not true in the pure mathematical sense of determinism: a neural network is deterministic, i.e. for the same input it gives the same output. Phrased correctly, a large percentage of the neural networks used have extremely high and unknown parametric instability. This means that for a small perturbation of the input, or of the weights, the network can produce a very different output. Why is this a problem? Because a street never looks exactly the same. Furthermore, there is no current measure of how similar a street should be on two consecutive days in order for a perception network to give reliable, constant and correct results.
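The distinction between determinism and instability is easy to demonstrate empirically. Below is a toy sketch in which an untrained classifier stands in for a real perception network: evaluating the same input twice is bit-for-bit reproducible, while a barely visible perturbation can already shift the output. The network architecture, noise level, and random input are all illustrative assumptions.

```python
# Minimal sketch: a neural network is deterministic (same input, same
# output) yet can be parametrically unstable (tiny input changes can
# move the output noticeably). Toy model and noise level are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(  # stand-in for a perception network
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
).eval()

image = torch.rand(1, 3, 64, 64)  # stand-in for a camera frame

with torch.no_grad():
    out1 = net(image)
    out2 = net(image)                                   # same input again
    out3 = net(image + 1e-3 * torch.randn_like(image))  # tiny perturbation

print("deterministic:", torch.equal(out1, out2))  # True: identical outputs
print("output shift under perturbation:", (out1 - out3).abs().max().item())
# For a well-behaved network this shift should be negligible; for an
# unstable one it can be large enough to flip the predicted class.
```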
Motion planning, decision making

While there are clear gains to be achieved from using AI in perception and classification, there is another use case: utilizing AI in motion planning and decision making. The traffic rules that define the behavior of vehicles, priorities, etc. are generally clear. However, many elements of the current traffic system are not based on formalized prescriptions. For this reason, it is tempting to try a different kind of AI, so-called reinforcement learning (RL) techniques, in motion planning and decision making.

Understanding system limitations

While the benefits of applying AI to these tasks are, to a certain extent, unquestionable, the limits of utilizing these techniques can be easily determined. Namely, there is one key requirement: we must understand system limitations. What does this mean? It means that whenever a scenario fails or a disengagement happens, probably as the result of a wrong decision (let us assume perfect perception for the sake of argument), development teams must be able to understand the underlying causes in order to correct them. Current insight into RL agents is limited in three aspects, each of which restricts the understanding of why a decision has been made:

- the precise internal operation of an RL agent;
- the input phase space topology;
- the input phase space coverage/exploration success of the AI (RL) agents in the offline training process.

Therefore, an AI agent should not be allowed to directly control a vehicle. Supervision by on-the-fly computable algorithms that consider at least the obvious kinematic, dynamic, and traffic-rule-based (or other known) constraints of the situation is required. Nevertheless, AI can support motion planning and decision making in some cases, such as:

- predicting the possible moves of other participants in traffic;
- providing an initial guess for the ego vehicle's trajectory;
- on-the-fly tuning of the corresponding parameters of the "classical" trajectory planning and decision-making algorithm.

Offline training versus on-the-fly computing

There are mathematical frameworks that, with good approximations, are formally equivalent to offline reinforcement learning but can be computed on-the-fly, using cost functions relevant to the situation instead of an offline-collected ensemble of situational data. The usage of these algorithms is currently prohibited by the amount of computing power they require and by the absence of efficient online phase space exploration strategies. However, the aggressive increase of computing power available for AD and ADAS will enable the execution of more and more powerful on-the-fly calculations in the future. Despite their disadvantageous power consumption (which will not really be an issue on a time scale of a few years), these approaches have many advantages over offline-trained algorithms:

- given their input space, the algorithms are explainable;
- the method is equivalent to traditional "offline-trained" reinforcement learning;
- they can easily be integrated with classical RL or other AI-based methods;
- they are optimal for the exact situation instead of being optimal for an offline ensemble of situations;
- they can easily handle a variable number of input arguments;
- they can handle a varying prediction time horizon;
- they can adjust the compute resources used depending on the situation.

In summary, it can be said that intelligence is not math, intelligence is computing. And this is even more valid when it comes to trajectory planning and decision making. Of course, the algorithms described above also have disadvantages. As a result, final solutions will most probably rely on a hybrid of AI-based initial condition determination and some amount of online computing.
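To make the on-the-fly idea concrete, here is a minimal sketch of a sampling-based trajectory selector that scores candidate trajectories against a situational cost function instead of consulting an offline-trained policy. The point-mass trajectory model, the cost terms, their weights, and the obstacle position are all simplifying assumptions for illustration; a real planner would use proper vehicle dynamics and far richer constraints.

```python
# Minimal sketch: on-the-fly trajectory selection with a situational
# cost function. Point-mass kinematics, cost weights, and the obstacle
# are illustrative assumptions, not a production planner.
import numpy as np

DT, HORIZON = 0.1, 30             # 3-second prediction horizon
obstacle = np.array([25.0, 0.0])  # hypothetical static obstacle (x, y)

def rollout(lat_offset: float, speed: float) -> np.ndarray:
    """Candidate trajectory: drift linearly to a given lateral offset."""
    t = np.arange(1, HORIZON + 1) * DT
    x = speed * t
    y = lat_offset * (t / t[-1])
    return np.stack([x, y], axis=1)

def cost(traj: np.ndarray) -> float:
    """Situational cost: collision risk + lane deviation + smoothness."""
    clearance = np.linalg.norm(traj - obstacle, axis=1).min()
    collision = 1e3 if clearance < 2.0 else 1.0 / clearance
    deviation = abs(traj[-1, 1])                        # stay near the lane
    jerkiness = np.abs(np.diff(traj[:, 1], n=2)).sum()  # comfort proxy
    return 10.0 * collision + 1.0 * deviation + 5.0 * jerkiness

# An AI module could propose the initial guess; here we sample uniformly.
candidates = [rollout(off, speed=10.0) for off in np.linspace(-3, 3, 25)]
best = min(candidates, key=cost)
print("chosen lateral end offset: %.2f m" % best[-1, 1])
```

Because every term in the cost function is explicit, a decision made this way can be inspected after a failure, which is exactly the explainability property argued for above.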
Simulation and artificial intelligence

Alongside artificial intelligence, one of the most touted technologies connected to self-driving is simulation. However, due to the parametric instability of neural networks, there are several challenges to efficiently deploying simulation for testing automated driving solutions. Even the most realistic simulation is a simulation and not the real world.

The visual fidelity of simulated sensory data

As of today, no widely accepted metrics exist which describe the fidelity of simulated sensor (camera) images compared to real-world ones. As such, one cannot strictly determine how relevant simulation-based perception testing is to the real-world use case. Even minor differences in input can cause major changes in the output of the NN, because no one really knows which features and representations are important from the NN's point of view. Currently, the most efficient way to gain insight into this question is a continuous, iterative comparison of the output of a perception network on real-world footage and on its simulated counterpart. If the distribution of the AI's output is similar in both sequences, then the virtual scene can be considered a statistically representative simulation of the real world. This comparison must be made for each neural network used in the software, and only when all of them match can one say that the simulation is realistic enough from the point of view of testing AI. For the reasons explained above, there is no measure to define how similar two simulated locations are to each other or to the real world. Currently, the safest method is to simulate most of the designated operation domain, in the sense of both geographical location and driving conditions (time of day, weather, etc.).

Testing on traffic scenarios

Simulation is increasingly becoming one of the most vital methods of validating and verifying automated driving systems. However, as these systems contain a wide range of neural networks, the question of how AI should be tested in simulators should be raised. When testing the behavior of the system, the question is: which scenarios should be executed? Simulated traffic scenarios are a fundamental testing methodology: they provide a formulated, repeatable test to measure the performance of the system. Scenarios can be collected from or defined by various sources. They can be handcrafted based on functional safety requirements; these are operational-domain-specific. They can also be taken from regulatory requirements, such as those set out by EuroNCAP. Historical accident statistics and the underlying causes of road accidents can also be used to define new scenarios, as can dangerous situations collected during real-world road testing. Finally, randomized testing with parametric Monte Carlo simulation is also possible (a minimal sketch of this approach closes this section). All these methods are extremely helpful in collecting interesting corner cases, but there are three issues:

- The sampling problem: the distribution of interesting corner-case scenarios is a so-called long-tail one, and a many-dimensional one at that. Sampling a statistically significant amount of these is a non-trivial challenge.
- The false alarm problem: executing interesting corner cases does not support the estimation or measurement of the false alarm rate in simulation. For false alarm estimation and suppression, a combination of boring miles and interesting corner cases is needed.
- The interaction problem: real-world scenarios include several actors reacting to each other's decisions and communicating in ways not understood by automated driving systems (eye contact, facial expressions, waving, flashing lights, etc.). The possibility of simulating these interactions is limited.

The most viable solution is to consider all these problems in parallel and utilize all possibilities of scenario creation. However, the only way to collect enough data with the right sampling methodology in an economically feasible way is to release AD features as ADAS offerings and put them through a loop of benchmarking, shadow-mode data collection, interactive-mode data collection, and upgrades to improve their quality.
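As a concrete illustration of the parametric Monte Carlo approach mentioned above, the sketch below randomizes the parameters of a hypothetical cut-in scenario and uses a crude time-to-collision surrogate to flag candidate corner cases for full simulation. The scenario parameterization, the distributions, and the thresholds are all illustrative assumptions, and the surrogate check is no substitute for actually executing the scenario in a simulator.

```python
# Minimal sketch: parametric Monte Carlo sampling of a cut-in scenario.
# Parameter ranges, distributions and the time-to-collision (TTC)
# surrogate are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(42)
N = 100_000  # cheap surrogate checks are what make large N affordable

# Randomized scenario parameters (hypothetical cut-in situation).
ego_speed = rng.uniform(15.0, 35.0, N)  # m/s, ego vehicle
cut_speed = rng.uniform(10.0, 30.0, N)  # m/s, vehicle cutting in
gap = rng.uniform(5.0, 60.0, N)         # m, longitudinal gap at cut-in

# Surrogate severity measure: TTC assuming neither vehicle reacts.
closing = np.maximum(ego_speed - cut_speed, 1e-6)
ttc = gap / closing

corner_cases = ttc < 2.0  # flag scenarios worth full-fidelity simulation
print(f"{corner_cases.mean():.2%} of sampled scenarios flagged as critical")
# Long-tail reality in miniature: most samples are boring, and it is the
# boring ones that are needed to estimate the false alarm rate.
```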
Conclusions

As seen above, artificial intelligence is the only viable method for solving the automated driving challenge. However, the general understanding of artificial neural networks remains limited; as a result, systems utilizing the technology must undergo robust testing procedures. Emerging safety standards, such as SOTIF, increasingly take AI into consideration. Vitally, no AI-based solution should be allowed to directly control a vehicle until the operation of neural networks is properly understood and well documented. Emerging solutions mitigate these risks but are currently limited by the processing power of available hardware platforms.

Simulation remains the most efficient tool for developing automated driving solutions and gaining a better understanding of how AI-based systems work. However, simulators are fundamentally different environments from the real world, and this must always be taken into consideration. This challenge is compounded by the difficulty of ensuring proper diversity when testing over a statistically relevant number of kilometers. Finally, the inability of simulators to recreate the behavior of human actors in scenarios should always be considered.

As a result, the most viable approach to creating robust and safe automated driving systems lies in deploying solutions as ADAS functionalities. Benchmarking deployed systems and utilizing the data they collect is the most efficient way of advancing our understanding of AI-based self-driving systems.
Posted by Gergely Debreczeni