Pacific Connection (English)

Autonomous Vehicles: Stanford’s Michael Montemerlo Reflects on the DARPA Urban Challenge

Last November, at an abandoned Southern California Air Force base, 11 extraordinary cars did some rather mundane things. They changed lanes, merged into traffic, stopped for stop signs and clogged traffic, passed other cars, and negotiated for the right-of-way through intersections. Occasionally, the negotiations failed, driving habits became unpredictable, and cars collided. But no driver was to blame, because there was none: no person behind the steering wheel; no remote pilot at a terminal.

These autonomous vehicles were competing in the third installment of the DARPA Grand Challenge, which has become a showcase for the marriage of artificial intelligence and robotics. As you might suspect from a race sponsored by the U.S.’s Defense Advanced Research Projects Agency, the research goals are ostensibly military. But the payoff for commercial vehicles is potentially huge.

Last year’s race was a big leap from the previous two DARPA-sponsored events. No entry even completed the first race, held on a 142-mile desert course between Barstow, California and Primm, Nevada. Stanford University’s Stanley won the second event and the $2 million prize, competing on a 132-mile course in southern Nevada. But as its name suggested, the DARPA Urban Challenge was a far more difficult race, designed to simulate city driving. Stanford’s Junior, a modified Volkswagen Passat, finished 19 minutes (calculated overnight from a complex formula involving timings and penalties) behind first-place Tartan Racing, an alliance headed by Carnegie Mellon University’s Red Whittaker. Junior was in some ways Silicon Valley’s unofficial entry; its local sponsors included Google, Intel, and even a venture capital firm, Mohr Davidow Ventures.

A couple of months later, I sat down with Michael Montemerlo, co-team leader of the 40-person Stanford Racing Team, at his office at the Stanford Artificial Intelligence Laboratory to talk about the art of programming autonomous cars. Junior was housed in a garage some miles away, but the robot’s view of the race could be seen on a monitor, which showed a continuously changing monochrome 3D model composed of laser scans.

This is a small research community. Montemerlo earned his PhD at Carnegie Mellon under Whittaker and Sebastian Thrun, then followed Thrun to Stanford for postdoctoral work. A week after we spoke, the participants would converge on Whistler, British Columbia to compare notes.

No robotic vehicle completed the first race, while four vehicles completed the second. What made the difference?
One of the big lessons people learned was that driving is fundamentally a software problem. DARPA said from the beginning that a four-wheel-drive pickup truck was perfectly capable of completing the entire course. So there’s no reason to build a crazy, custom robot, because none of us is going to do it better than Volkswagen or GM. We now start with a rugged, reliable car and then spend as much time as possible on the software to make it drive safely and reliably.
For the third race, DARPA stipulated entries had to be normal cars with established safety records. Tartan used a Chevy Tahoe. Terramax used a giant military truck from the Oshkosh Truck Corporation. North Carolina State University used a Lotus Elise. So while there was a wide variety of commercial vehicles, everybody viewed this as a difficult software problem. I don’t want to discount all the work the teams did on the hardware-if you get a flat tire in the desert, you are out of the race. But software has become the determining factor.
How difficult a robotics/AI problem is it?
In the desert, the problem was pretty straightforward: stay on the road and drive fast. DARPA provided a GPS track-a set of electronic “bread crumbs” that you can follow. You buy a fancy GPS, but it’s still a noisy sensor; in some places a GPS works well, in other places it doesn’t. And the track may not be perfectly accurate, meaning that you should use GPS as a sensor, but not follow it blindly. You need other sensors-a camera, perhaps, or, in our case, laser range finders. As it turned out, just being able to center your vehicle gives you a huge advantage because usually the rocks and boulders are on the sides of the road, not sitting in the middle. So if you center yourself in the road, you end up avoiding 90 percent of the obstacles.
But that all changed with the Urban Challenge.
Urban driving is a super-hard problem, much less explored, and it requires a fundamentally deeper understanding of the world. In the desert, Stanley understood the world as one of three things: there’s stuff I can drive on, stuff I can’t drive on, and stuff I don’t know if I can drive on. That’s all we thought of, with pixels that went from red to white to gray. But if you are driving in an urban setting and you see some blob of obstacles in front of you, it’s not that simple. If that blob is a car stopped at a stop sign, I can’t just drive around it and pass it-I have to obey the law by queuing up and waiting my turn. But if that blob is a curb or a telephone pole, I can’t sit there and wait for the curb to drive away. If the blob is a car, I expect it to follow the rules of the road, approximately. If it’s a pedestrian or a bicyclist, it will be much more unpredictable.
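To make that three-way world model concrete, here is a minimal C++ sketch, assuming a simple grid of cells with an accumulated evidence value per cell; the names and threshold are illustrative, not Stanford’s actual code.

    #include <cstdio>

    // Illustrative three-way drivability labels from the desert-era model:
    // drivable (white), not drivable (red), unknown (gray).
    enum class Drivability { Drivable, NotDrivable, Unknown };

    struct GridCell {
        bool  observed = false;          // has any laser return covered this cell?
        float obstacle_evidence = 0.0f;  // accumulated evidence of an obstacle (0..1)
    };

    // Threshold and field names are assumptions for illustration only.
    Drivability classify(const GridCell& cell) {
        if (!cell.observed)                return Drivability::Unknown;
        if (cell.obstacle_evidence > 0.5f) return Drivability::NotDrivable;
        return Drivability::Drivable;
    }

    int main() {
        GridCell road{true, 0.05f}, boulder{true, 0.9f}, unseen{};
        std::printf("road=%d boulder=%d unseen=%d\n",
                    static_cast<int>(classify(road)),
                    static_cast<int>(classify(boulder)),
                    static_cast<int>(classify(unseen)));
        return 0;
    }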
What did DARPA tell contestants about the race course ahead of time?
Twenty-four hours in advance, we were all given an RNDF, a route network definition file, which defines a network of roads and the lane markings that determine where you can and cannot pass. In some places, the roadmap was detailed; in other places it was very sparse. Then, five minutes ahead of the race, they gave us a USB memory key containing a mission description file, which contains two pieces of information: a sequence of checkpoints you have to visit and speed limits for all of the roads.
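As a rough illustration of what the two files carry, here is a simplified C++ sketch with invented field names; the real RNDF and MDF are DARPA-specified text formats with far more detail than this.

    #include <cstdio>
    #include <string>
    #include <vector>

    // Highly simplified stand-ins for the two files described above.
    // Field names are illustrative assumptions, not the actual specification.
    struct SpeedLimit {
        std::string segment_id;      // which road segment the limit applies to
        double      min_mph, max_mph;
    };

    struct RouteNetwork {            // "RNDF", handed out 24 hours in advance
        // road segments, lanes, lane markings, intersections ... omitted here
    };

    struct MissionFile {             // "MDF", handed out five minutes before the start
        std::vector<std::string> checkpoints;   // waypoint IDs to visit, in order
        std::vector<SpeedLimit>  speed_limits;  // limits for all of the roads
    };

    int main() {
        MissionFile mdf{{"cp_12", "cp_47", "cp_3"}, {{"seg_7", 5.0, 25.0}}};
        std::printf("checkpoints to visit: %zu\n", mdf.checkpoints.size());
        return 0;
    }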
And you had to adhere to the California Motor Vehicle Code.
The rules of the race were like the handout you get at the Department of Motor Vehicles. Most of these rules are not things the robot would care about-how fast you can drive in a parade, for example. But the basic rules governed behavior at stop signs, intersection precedence, merging into moving traffic, traffic circles, and rights of way. Vehicles that broke the rules were penalized in time.
Who built Junior?
We’re based out of Stanford. Our sponsors include Volkswagen of America’s Electronics Research Laboratory, which is about four miles down the street and works on advanced electronic technologies. They gave us the car and built the “drive-by-wire” (DBW) system. That means there are no mechanical connections between the steering wheel, accelerator pedal, gear shift, turn signals and the components they control. In practice, this means we can electronically control the car’s actuators-gas, brake, steering, gear shifter and turn signals. Those are the ones that we care about.
We put sensors on the roof and all around the car. We put computers in the back, and then we write software that tries to understand the sensor data, makes a plan of what to do, and then executes that plan using the car. With five minutes to go, we insert the memory stick with the route file and then get out of the car. There is no further communication. DARPA does follow each car with a chase vehicle containing two override buttons: pause and kill. Pause is like a yellow flag in racing; it temporarily stops the car so they can sort things out. Kill is the end of the road. Both of them cause the car to come to a prompt stop, but kill also shuts down the engine.
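The pipeline Montemerlo describes-understand the sensor data, make a plan, execute it through the drive-by-wire actuators-can be sketched as a simple sense-plan-act loop. The structures, channels, and values below are illustrative assumptions, not Junior’s actual interfaces.

    #include <cstdio>

    // The five actuator channels Montemerlo lists; value ranges are assumed.
    struct ActuatorCommand {
        double throttle;     // 0..1
        double brake;        // 0..1
        double steering_rad; // steering angle, radians
        int    gear;         // -1 reverse, 0 neutral, 1 forward
        int    turn_signal;  // -1 left, 0 off, 1 right
    };

    struct SensorSnapshot { /* laser scans, GPS, odometry ... omitted */ };
    struct WorldModel     { /* obstacles, lanes, other cars ... omitted */ };

    WorldModel      perceive(const SensorSnapshot&) { return {}; }          // understand the data
    ActuatorCommand plan(const WorldModel&) { return {0.2, 0.0, 0.0, 1, 0}; } // decide what to do
    void            actuate(const ActuatorCommand& c) {                     // execute via drive-by-wire
        std::printf("throttle=%.2f steer=%.2f gear=%d\n", c.throttle, c.steering_rad, c.gear);
    }

    int main() {
        for (int cycle = 0; cycle < 3; ++cycle) {   // runs continuously on the real car
            SensorSnapshot s;                       // read sensors
            actuate(plan(perceive(s)));             // sense -> plan -> act
        }
        return 0;
    }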
Where were you during the race?
The robots each go on three missions. So we waited at the pit area and had no idea what Junior was doing until it returned. The webcast provided a much better view. The course is still much too dangerous for pedestrians, who are not only more vulnerable than cars, but less predictable. That’s a challenge for a future race.
Talk a bit about the software and hardware.
We do everything in C and C++. We used two Intel quad-core machines, but only five of the eight available cores, so we certainly weren’t limited by CPU processing. We were also fine on memory-we just used two gigabytes apiece. We recorded log data from the sensors onto hard drives for later playback, but the whole race represented only about 75GB of data.
So the only fancy part of this setup was the processors?
Other teams had other approaches. MIT, for example, had about 40 cores. One big difference from last year was our decision to substitute lasers for a camera. Computer vision is a very computationally intensive task, with a huge amount of data that can be very hard to reliably interpret. By contrast, laser data is much less ambiguous.
Lasers usually only generate range data, that is, they measure the time it takes for the laser light to hit the object and come back to the sensor. But this only results in a 3D data structure, not an image, per se. We take advantage of the fact that some lasers can also measure the intensity of the light returned. This intensity data looks kind of like an image, except that it is in the infrared spectrum-for our lasers, at least. As it turns out, objects that reflect well in the visible spectrum also happen to reflect well in infrared: for example, road lines show up very well.
So by applying computer vision algorithms to the laser intensity data, we can still do computer vision-like things. We texture the intensity data onto the 3D model, which is kind of like stretching a piece of fabric over a mold. Then we analyze the texture of the flat parts of the model-the road part-and find the lane lines. The cool thing about this approach is that it works as well at night as it does during the day.
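A toy version of the idea might look like the following: treat intensity returns near the ground plane as road-surface “pixels” and keep the bright ones, the way painted lines reflect. The height band and intensity threshold are assumptions for illustration, not the team’s actual algorithm.

    #include <cstdio>
    #include <vector>

    // One laser return: 3D position plus reflected intensity, as described above.
    struct LaserReturn {
        double x, y, z;     // position, meters
        double intensity;   // reflected infrared intensity, normalized 0..1 here
    };

    // Toy lane-marking detector: keep returns that lie on the (roughly) flat road
    // surface and reflect brightly, the way painted lines do. Thresholds assumed.
    std::vector<LaserReturn> find_lane_marks(const std::vector<LaserReturn>& scan) {
        std::vector<LaserReturn> marks;
        for (const auto& r : scan) {
            bool on_road_surface = r.z > -0.2 && r.z < 0.2;  // near the ground plane
            bool bright          = r.intensity > 0.8;        // paint reflects strongly
            if (on_road_surface && bright) marks.push_back(r);
        }
        return marks;
    }

    int main() {
        std::vector<LaserReturn> scan = {
            {1.0, 0.0, 0.0, 0.9},   // painted line
            {2.0, 0.5, 0.0, 0.2},   // bare asphalt
            {3.0, 1.0, 1.1, 0.9},   // bright but off the road surface (e.g., a sign)
        };
        std::printf("lane-mark candidates: %zu\n", find_lane_marks(scan).size());
        return 0;
    }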
How much simulation do you do ahead of the race?
Simulations are important because there are some things that are too complicated or too dangerous to test in real life. Simulation also lets us try something out as a first step. If a new algorithm doesn’t work in simulation, it’s certainly not going to work in real life, though the converse is not necessarily true. We try to use actual, rather than simulated, sensor data because it introduces more complexity. But we simulate the higher level perceptions-a blob here, a car moving there. We can’t simulate the exact trajectory the robot would take, but we can simulate high level behaviors, like the car trying to pass another car, or robots arriving at the intersection at the same time. So as long as we treat the output of the simulation appropriately, it can be a very useful first step.
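A minimal sketch of what simulating “higher level perceptions” can mean: the planner is fed synthetic tracks (a car moving there) rather than raw sensor data. Everything below, from the Track structure to the update rate, is an illustrative assumption.

    #include <cstdio>
    #include <vector>

    // A high-level "perception": not raw laser points, just a tracked object
    // with a position and velocity, which is what the planner consumes.
    struct Track {
        int    id;
        double x, y;      // meters
        double vx, vy;    // meters per second
    };

    // Toy simulator: advance each simulated car along a straight line. Real test
    // scenarios (passing, simultaneous arrival at an intersection) would script
    // richer behaviors, but the planner sees the same kind of Track either way.
    void step(std::vector<Track>& tracks, double dt) {
        for (auto& t : tracks) { t.x += t.vx * dt; t.y += t.vy * dt; }
    }

    int main() {
        std::vector<Track> cars = {{1, 0.0, 0.0, 4.0, 0.0}, {2, 30.0, 3.5, -4.0, 0.0}};
        for (int i = 0; i < 5; ++i) {
            step(cars, 0.1);   // 10 Hz, an assumed update rate
            std::printf("t=%.1fs car1.x=%.1f car2.x=%.1f\n", 0.1 * (i + 1), cars[0].x, cars[1].x);
        }
        return 0;
    }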
What about real-world testing? Where do you go to run an experimental autonomous car?
We’ve used a variety of places. We did a lot of it in an empty gravel parking lot over by Google at the Shoreline Amphitheater [a performance venue in Silicon Valley]. There weren’t any lines on the pavement, so we “imagined” them-the robot knew where the roads were even if the people didn’t. We also set up cones and tried out all kinds of crazy intersections and courses. But we always assumed the race would be held at an abandoned military base, because you can’t do this in a real city and you can’t build an urban course just for the race. So it has to be some place that already exists, that nobody uses, and that the government can use without a problem.
Which turned out to be Victorville.
Right, an abandoned military base. So we went to the Alameda Naval Air Station, which is closed. Parts of it are now residential and commercial, but there were still a few places that were safe for us to use. We also went to Fort Ord [an abandoned army base near Monterey] for a couple of days. They have a residential section that is completely closed, which we cleaned up and used for a while.
Are the applications of the DARPA challenge strictly military?
There are also commercial possibilities, and I’m sure that DARPA is interested in those, as well. Every year, cars come out with more sensors-sonar, radar, cameras-and they have more built-in computing power. And despite the fact that humans are still behind the wheel, they are also increasingly drive-by-wire: more electronic control through actuators. So you have sensors, computers, and actuators: to me, that’s not a car, that’s a robot. What you have is a fantastic platform that millions of people have parked in their garage. There are other robots I could imagine, cool ones that people might like to own and use, but I’d have to convince people to buy them. But with cars, if we can improve the driving process, we immediately impact millions of people. That’s a big deal for robotics.
AI researchers now realize that people won’t put up with, say, an artificial doctor, but will accept an AI version of a doctor’s assistant-who helps the doctor. Is there an analogy here?
Yes. The car companies aren’t thinking in terms of autonomous driving, at least for now. They are thinking in terms of driver assistance systems. An example already out on high-end cars is adaptive cruise control, which uses radar to determine the distance to a car in front and then, if necessary, slows down. A fully autonomous car would be the ultimate driver assistance system, and that’s not going to happen overnight. Rather, your car will become more autonomous over time, especially when it comes to avoiding accidents.
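A toy sketch of the adaptive-cruise-control idea he describes: hold a set speed, but back off as the radar-reported gap to the car ahead shrinks. The gains and thresholds below are invented for illustration and do not reflect any production system.

    #include <algorithm>
    #include <cstdio>

    // Toy adaptive cruise control: hold a set speed, but slow down when the
    // radar-reported gap shrinks below a safe following distance.
    // Gains and thresholds are illustrative assumptions.
    double desired_speed(double set_speed_mps, double gap_m, double gap_closing_mps) {
        const double safe_gap_m = 30.0;
        if (gap_m >= safe_gap_m) return set_speed_mps;            // road is clear enough
        double correction = 0.5 * (safe_gap_m - gap_m)            // too close: back off
                          + 1.0 * std::max(0.0, gap_closing_mps); // closing fast: back off more
        return std::max(0.0, set_speed_mps - correction);
    }

    int main() {
        std::printf("clear road:   %.1f m/s\n", desired_speed(30.0, 80.0, 0.0));
        std::printf("close & slow: %.1f m/s\n", desired_speed(30.0, 20.0, 3.0));
        return 0;
    }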
Stanford researcher Gary Bradski thought he could see different personalities at the DARPA race. He thought Boss was more aggressive, Junior more Zen-like.
It’s easy to attribute too much personality to these robots. They do have quirks to their driving style, which seem like personality, though I think that’s giving them too much credit. But it is true that if you let me write the driving planner, the robot will drive like I do-more conservatively. And if [co-leader] Sebastian [Thrun] had written the planner, it perhaps would be a more aggressive driver: a German driver.

Sidebar: Inside Junior’s “Brain”: How Stanford’s Autonomous Car Makes Decisions

At the core of Junior’s “brain” is a piece of Stanford-designed software called the Planner, which factors in the street layout, rules of the road, and the race’s goal, then decides what course of action to take. This decision-making process takes place at three levels. The top two, global and tactical, are combined to create a continuous trajectory-the actual route Junior intends to follow.

The global level is familiar to anyone with a GPS or Google Maps. It uses a routing algorithm to determine how Junior is to go from point A to point B: go to the next intersection, turn right, and then make an immediate left. The tactical level determines what lane changes and other driving tactics should be employed to follow this route, factoring in speed and safety.
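
The article does not say which routing algorithm the global level uses; a standard shortest-path search over the road network, such as the Dijkstra sketch below, simply illustrates the kind of computation involved.

    #include <cstdio>
    #include <queue>
    #include <utility>
    #include <vector>

    // Global-level routing as a shortest-path problem over the road network.
    // Dijkstra's algorithm is used here purely as an illustration; the sidebar
    // does not name the routing algorithm Junior's global level actually runs.
    std::vector<double> shortest_costs(
        const std::vector<std::vector<std::pair<int, double>>>& graph, int start) {
        std::vector<double> cost(graph.size(), 1e18);
        using Item = std::pair<double, int>;   // (cost so far, node)
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> open;
        cost[start] = 0.0;
        open.push({0.0, start});
        while (!open.empty()) {
            auto [c, u] = open.top(); open.pop();
            if (c > cost[u]) continue;         // stale entry
            for (auto [v, w] : graph[u])
                if (c + w < cost[v]) { cost[v] = c + w; open.push({cost[v], v}); }
        }
        return cost;
    }

    int main() {
        // Tiny road graph: intersections 0..3, edge weights = travel time in seconds.
        std::vector<std::vector<std::pair<int, double>>> roads = {
            {{1, 10.0}, {2, 25.0}}, {{3, 10.0}}, {{3, 5.0}}, {}};
        std::printf("cost to intersection 3: %.1f s\n", shortest_costs(roads, 0)[3]);
        return 0;
    }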

The usual method for this sort of calculation uses the A* (pronounced “A-star”) tree search algorithm. But the software behind Junior uses a more computationally intensive method known as dynamic programming, in which every path to the goal is considered, including both turns and lane changes. The advantage of dynamic programming is its flexibility in determining whether a desired action is worth the “cost” in terms of time saved, given the probability of success-even before the sensor data informs that decision.

For example, consider a situation in which Junior is preparing for a right-hand turn, but is impeded by a slower car immediately in front. Should it try to pass? The reasoning might go something like this:
“The car in front of me is going five miles an hour slower than I want to go, so changing lanes could be advantageous. But I need to turn right at the upcoming intersection and the traffic is quite dense, so that at any given attempt, I have (according to my calculations) only a 10 percent chance of safely getting back to the right lane. And because my upcoming turn is just up the block, I have fewer opportunities to safely return to my lane than if the intersection were a kilometer away. And if I don’t make the turn, I must travel an extra 500 meters to get back on course. Adding all these factors together-the chances of a safe lane change, the distance to the intersection, the cost of missing my turn-I think it is better in this situation not to pass.”

Montemerlo says that dynamic programming is well suited to this kind of on-the-fly calculation, whereas A* would require a more rigid, heuristic rule that might not account for every possibility. “The beauty of dynamic programming is that it combines the global and tactical planning to come up with a much better situational comparison.”
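
The trade-off in the example above can be written as a small expected-cost comparison. The 10 percent merge probability and 500-meter detour come from the example; the speeds, distances, and the cost model itself are invented simplifications, not Junior’s actual planner.

    #include <cstdio>

    // Toy expected-cost comparison for the pass-or-stay decision described above.
    int main() {
        const double mph = 0.44704;              // meters per second per mph
        double v_desired    = 30.0 * mph;        // how fast we'd like to go (assumed)
        double v_lead       = 25.0 * mph;        // car ahead is 5 mph slower
        double dist_to_turn = 200.0;             // meters to the upcoming right turn (assumed)
        double detour       = 500.0;             // extra meters if we miss the turn
        int    merge_chances = 2;                // attempts to get back to the right lane (assumed)
        double p_merge_each  = 0.10;             // 10 percent chance per attempt

        // Option 1: stay behind the slow car until the turn.
        double cost_stay = dist_to_turn / v_lead - dist_to_turn / v_desired;

        // Option 2: pass, then try to merge back before the turn.
        double p_miss = 1.0;
        for (int i = 0; i < merge_chances; ++i) p_miss *= (1.0 - p_merge_each);
        double cost_pass = p_miss * (detour / v_desired);    // expected detour penalty

        std::printf("expected extra time: stay %.1f s, pass %.1f s -> %s\n",
                    cost_stay, cost_pass, cost_stay < cost_pass ? "stay" : "pass");
        return 0;
    }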

The decision becomes a “macro plan”-a rough cut of what Junior intends to do-which in turn becomes the basis for the continuous trajectory: the precise route Junior actually tries to follow. “Imagine that the macro plan says that over the next 100 meters, I need to change lanes and drive through the intersection. To determine the trajectory, Junior generates a series of what we call nudges and swerves.” A nudge is a small change, such as centering the car in the lane. A swerve is a more dramatic change, such as one made to avoid a collision. Each nudge and swerve results in a track that is offset somewhat from the original path. The candidate track that best fulfills the macro plan becomes the continuous trajectory.
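
A toy rendering of the nudge-and-swerve idea: generate candidate tracks as lateral offsets from the planned path and keep the cheapest one. The offsets, costs, and obstacle check are illustrative assumptions, not Junior’s scoring function.

    #include <cmath>
    #include <cstdio>

    // Candidate track, reduced here to a single lateral offset from the plan.
    struct Candidate { double lateral_offset_m; double cost; };

    // Toy cost: heavily penalize hitting the obstacle, mildly penalize deviation.
    double score(double offset, double nearest_obstacle_offset) {
        double collision_penalty =
            std::fabs(offset - nearest_obstacle_offset) < 1.0 ? 1000.0 : 0.0;
        double deviation_penalty = std::fabs(offset);   // prefer staying on the plan
        return collision_penalty + deviation_penalty;
    }

    int main() {
        double obstacle_at = 0.0;                 // something sitting on the planned path
        Candidate best{0.0, 1e18};
        for (double off = -2.0; off <= 2.0; off += 0.5) {   // small nudges, larger swerves
            double c = score(off, obstacle_at);
            if (c < best.cost) best = {off, c};
        }
        std::printf("chosen lateral offset: %.1f m (cost %.1f)\n",
                    best.lateral_offset_m, best.cost);
        return 0;
    }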

Complicating these calculations is the fact that driving involves different states: the decisions are different when approaching a stop sign than when moving through the intersection. Junior considers about a dozen such states-including normal forward driving, reverse, stopped at a stop sign, and U-turns. The states are mapped in a finite state machine schema. For example, when stopped at a four-way intersection, the next state may be to proceed, or it may be to wait to make certain that other cars don’t jump out ahead (a state the Stanford team calls “stop for cheaters”).
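
A few of those states, wired into a minimal finite state machine sketch; the transition conditions are simplified guesses for illustration, not Junior’s actual logic, which has about a dozen states and richer conditions.

    #include <cstdio>

    // A handful of the driving states mentioned above.
    enum class State { ForwardDriving, StoppedAtSign, StopForCheaters, ProceedThroughIntersection };

    struct Observation {
        bool at_stop_line;        // reached the stop line of a stop sign
        bool have_precedence;     // it is our turn at the intersection
        bool cross_traffic_clear; // nobody is "jumping out ahead"
    };

    State next_state(State s, const Observation& o) {
        switch (s) {
            case State::ForwardDriving:
                return o.at_stop_line ? State::StoppedAtSign : State::ForwardDriving;
            case State::StoppedAtSign:
                return o.have_precedence ? State::StopForCheaters : State::StoppedAtSign;
            case State::StopForCheaters:   // wait until no one cuts in front
                return o.cross_traffic_clear ? State::ProceedThroughIntersection
                                             : State::StopForCheaters;
            case State::ProceedThroughIntersection:
                return State::ForwardDriving;
        }
        return s;
    }

    int main() {
        State s = State::ForwardDriving;
        Observation o{true, true, true};
        for (int i = 0; i < 4; ++i) {
            s = next_state(s, o);
            std::printf("step %d -> state %d\n", i, static_cast<int>(s));
        }
        return 0;
    }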

The Planner works the same whether the run is actual or simulated. “The software is completely ambivalent as to where the data is coming from,” Montemerlo said. “We do this through a modular design. There are modules that communicate with the sensors using the hardware’s own protocol, and then publish that data for the rest of the software to see. Other modules interpret the sensor data they subscribe to-but publishers don’t care who is listening and the subscribers don’t care who is publishing.” Modularity also frees the software from requiring a specific hardware architecture. Programs can run on a single computer or be spread across a network. New sensors can replace older ones, or be simulated, with a minimum of reprogramming.
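
The publish/subscribe decoupling can be illustrated with a minimal in-process message bus; Junior’s real inter-process messaging layer is more involved, and the class below is only a sketch of the idea.

    #include <cstdio>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    // Minimal publish/subscribe bus: publishers don't know who is listening,
    // subscribers don't know who published.
    class MessageBus {
    public:
        using Handler = std::function<void(const std::string&)>;
        void subscribe(const std::string& topic, Handler h) { subs_[topic].push_back(h); }
        void publish(const std::string& topic, const std::string& payload) {
            for (auto& h : subs_[topic]) h(payload);
        }
    private:
        std::map<std::string, std::vector<Handler>> subs_;
    };

    int main() {
        MessageBus bus;
        // A perception module subscribes to laser data; it has no idea whether
        // the publisher is real hardware or a simulator.
        bus.subscribe("laser_scan", [](const std::string& msg) {
            std::printf("perception got: %s\n", msg.c_str());
        });
        // A driver module (or a simulator) publishes on the same topic.
        bus.publish("laser_scan", "range/intensity readings (placeholder)");
        return 0;
    }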
