As discussed in a previous post, we have given a computer control of a physical-world machine. Now we're about to give it Intelligence. What if the Artificial Intelligence won't stop at learning to drive but goes on to learn to solder, order parts online, and start duplicating itself and... well, yeah, about that.
Despite how awesome that would be, we have no such Intelligence.
The default Donkeycar AI is a convolutional neural network taught with supervised learning. Supervised learning is probably the most common type of deep learning currently used in practical applications like image recognition.
It means the neural network is trained with data comprising inputs and the corresponding correct outputs. In this context, it means we train the software to predict two outputs for a single input: throttle and angle for a given image.
Under the hood it is built with the high-level deep learning library Keras, which in turn runs on Tensorflow.
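As a sketch of what such a two-output network can look like in Keras (the layer sizes and names here are illustrative, not the exact Donkeycar architecture):

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.models import Model

# One camera frame in, two values out: steering angle and throttle.
img_in = Input(shape=(120, 160, 3), name='img_in')
x = Conv2D(24, (5, 5), strides=(2, 2), activation='relu')(img_in)
x = Conv2D(32, (5, 5), strides=(2, 2), activation='relu')(x)
x = Conv2D(64, (3, 3), strides=(2, 2), activation='relu')(x)
x = Flatten()(x)
x = Dense(100, activation='relu')(x)

angle_out = Dense(1, name='angle_out')(x)        # steering, e.g. -1..1
throttle_out = Dense(1, name='throttle_out')(x)  # throttle, e.g. 0..1

model = Model(inputs=[img_in], outputs=[angle_out, throttle_out])
model.compile(optimizer='adam', loss='mse')

# A single frame yields one angle and one throttle prediction.
angle, throttle = model.predict(np.zeros((1, 120, 160, 3)))
```

Training such a model against the recorded data then simply minimizes the combined prediction error on both outputs.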
It can be highly beneficial if you’re having a wild west party while recording the training data
The first step in teaching the AI is gathering the data. The Donkeycar software has a built-in feature to record both the camera image and the controller data. We have mapped everything to the Bluetooth controller, so recording is behind a single button. Recording mode is announced by the Knight Rider theme playing from the car's speakers, so the driver can be sure the car is ready and recording.
When recording is on, a human driver drives the way we want the car to drive. Usually, that means quick and clean laps, but we have also experimented with other styles. We usually use a throttle limit in the car configuration to cap the maximum throttle at something like 50%. This is not only because we want to prevent rapid unplanned disassembly, but because the AI does not drive exactly the same way a human does. A low but steady throttle in the training data leads to the AI driving faster than if it were trained with harder acceleration and coasting towards corners.
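In Donkeycar this kind of limit is a plain configuration value. The exact setting name varies between Donkeycar versions, so treat this as an assumption about the config rather than a verbatim excerpt:

```python
# In the Donkeycar config (e.g. myconfig.py): cap the controller throttle
# while recording. The setting name varies by version; this is an assumption.
JOYSTICK_MAX_THROTTLE = 0.5  # 50% of full throttle
```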
As a result, we get around 1,200 records for every minute we've driven. A single record means one frame from the vehicle camera and one JSON file containing the values saved at that exact moment. By default, those values are the steering angle and throttle, but the set can be extended with other data, like acceleration measurements from an IMU.
Recorded data contains the video frames and corresponding measurement data
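A single record can be pictured roughly like this. The field names approximate the Donkeycar "tub" format and may differ between versions, so take them as an illustration:

```python
import json

# One record: a reference to the saved camera frame plus the control
# values captured at the same moment.
record = {
    "cam/image_array": "record_1_cam.jpg",  # the saved camera frame
    "user/angle": 0.25,                     # steering angle from the driver
    "user/throttle": 0.4,                   # throttle from the driver
}

with open("record_1.json", "w") as f:
    json.dump(record, f)

# Training later reads the pairs back: image in, (angle, throttle) out.
with open("record_1.json") as f:
    loaded = json.load(f)
```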
After enough training data has been gathered, it's time to step away from the physical world and do some computing. In the default Donkeycar environment, the data is used directly to train the neural network: it is handed to Keras for training and validating the model, and once that starts, we just wait and watch.
This training step is quite magical, as it is not something you really have to understand, yet it can still generate a good model for you. It doesn't even take long to train a model on a decent amount of training data, as long as you have a recent GPU in use. Even on just a CPU you can get your models trained in a few hours.
Just watching the model train can be entertaining, but you can also monitor the output Keras gives to figure out what's happening. You can follow it on the command line, but there is also a graphical tool called Tensorboard shipped with Tensorflow. It starts a web server, so you can open the UI in your browser for a nicer view of what's happening.
The main output to look for is the loss value. It tells you how accurate the model's predictions are. In theory, if it reaches zero, your model is perfect. In practice, a too-small loss value means your model has overfitted. Overfitting in this context can mean something as simple as "use throttle 0.5 and angle 0.8 when that shoe is in the upper corner of the image". So if you move the shoe, or something casts a shadow on the track, or some other really minor detail changes, the model gets confused. Instead of zero loss, we want to aim for generalization of the network.
For evaluating this generalization, there is another loss value: val_loss, short for validation loss. The validation set is a part of your dataset that is automatically set aside before training starts. It is never used for training the model, so it can be thought of as the "test track" Keras takes the model to, to check whether it actually knows how to drive and is not just memorizing irrelevant details. As a rough analogy, the validation set is a different track where you test whether your car really knows how to drive. So as the val_loss value goes down, your model genuinely gets better at coping with the real world.
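The split itself is nothing fancy; conceptually, something like this happens before training starts (a simplified sketch of the idea, not Keras internals):

```python
import random

records = list(range(1000))  # stand-ins for (image, angle, throttle) records

random.shuffle(records)      # shuffle so validation isn't just one lap
split = int(len(records) * 0.8)
train_set = records[:split]  # 80%: used to fit the network weights
val_set = records[split:]    # 20%: only ever used to compute val_loss
```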
Training can be stopped by hand or by an automatic stopping algorithm that detects when the accuracy is no longer improving enough over time. After that, you have an artificial intelligence packaged into a single file.
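The stopping rule itself is simple: stop once val_loss has gone a number of epochs (the "patience") without improving. Keras offers this as the EarlyStopping callback; the underlying logic is roughly this sketch:

```python
def should_stop(val_losses, patience=5, min_delta=0.0):
    """Return True when the last `patience` epochs brought no improvement
    of more than `min_delta` over the best val_loss seen before them."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return best_before - recent_best <= min_delta

# Loss keeps improving -> keep training.
improving = [1.0, 0.8, 0.6, 0.5, 0.4, 0.35, 0.3]
# Loss plateaus for the last 5 epochs -> stop.
plateau = [1.0, 0.5, 0.51, 0.52, 0.5, 0.53, 0.5]
```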
Tensorboard makes it easier to figure out what’s happening inside Tensorflow
Now that school’s out for summer, it’s time to move the model file to the car’s computer and start the driving software. That’s pretty much all this step requires. Time to let your AI fasten its miniature seat belt and step out of the way.
What happens inside the driving software was explained in the previous post, but in short: there is a fast loop that takes a frame from the camera, drops it into the AI, and takes the steering angle and throttle values out the other end. Those values are then used to control the steering servo and motor.
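One pass of that loop boils down to something like the following. The `camera`, `model`, `steering`, and `motor` objects are hypothetical stand-ins for the real Donkeycar parts:

```python
def drive_step(camera, model, steering, motor):
    """One pass of the drive loop: frame in, control values out to the
    hardware. All four arguments are hypothetical interfaces."""
    frame = camera.read()                   # grab one camera frame
    angle, throttle = model.predict(frame)  # AI predicts both values
    steering.set_angle(angle)               # drive the steering servo
    motor.set_throttle(throttle)            # drive the motor
    return angle, throttle

# The driving software just repeats this at a fixed rate:
#   while True:
#       drive_step(camera, model, steering, motor)
#       time.sleep(1.0 / 20)  # ~20 Hz
```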
At this point, it is usually a surprise how well the trained AI model works. Or doesn’t. As this whole project is about giving control to a black box neural network, you usually can only guess why the AI works as it does. Of course, with time you learn to reason why something might happen but it’s a rare occasion if you can really be sure.
One nice thing about this simple architecture is that it doesn’t tie you to the RC car. Early in the project, we made some utilities to record controller inputs and screen images, and trained an AI to play a driving game on a PC with an emulated controller. It worked far better than expected, though I wouldn’t get into the passenger seat just yet.
This simplest form of AI mimicking a human may work pretty nicely, but still, it’s just a start. We have a long list of improvement ideas and will get back to those later. Some of them we have even tried already.
Data augmentation means multiplying the amount of data by copying it with small variations. It reduces overfitting and thus makes the AI work in a more dynamic environment. We have tried it and will share both the results and the implementation later. It is something that really makes a difference.
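The simplest example of such a variation is a horizontal flip: mirror the frame and negate the steering angle, and every record becomes two. This is a sketch of the idea, not our actual implementation (which we'll cover later):

```python
def flip_record(image_rows, angle, throttle):
    """Mirror a camera frame left-right and negate the steering angle.
    `image_rows` is the frame as a list of pixel rows."""
    flipped = [list(reversed(row)) for row in image_rows]
    return flipped, -angle, throttle

# A 2x3 toy "frame": flipping doubles the data without any new driving.
frame = [[1, 2, 3],
         [4, 5, 6]]
mirrored, angle, throttle = flip_record(frame, 0.3, 0.5)
```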
We have also built a test bench for replaying the recorded data to analyze it by hand. It can be used both for spotting problems in the recordings and for looking at the AI’s predictions before moving the model into the car.
And finally, the current neural network architecture is quite limited, considering it uses only one image and tries to reproduce the exact driving style of the human behind the training data. Our team, with limited experience driving RC cars, has been a bit surprised at how hard it actually is to drive fast and clean laps. On the other hand, we have been quite amazed at how well the trained AI works. Meanwhile, it seems everyone is talking about Reinforcement Learning, where the AI trains itself and ultimately beats humans in skill.