Self-Driving Data Deluge

Detection of pedestrians and a person holding a "stop" sign

Teaching a neural network to drive requires immense quantities of real-world sensor data. Now developers have a mother lode to mine.

What’s new: Two autonomous vehicle companies are unleashing a flood of sensor data:

  • Waymo’s Open Dataset contains output from vehicles equipped with five lidars, five cameras, and a number of radars. The data set, available starting in July, includes roughly 600,000 frames annotated with 25 million 3D bounding boxes and 22 million 2D bounding boxes.
  • ArgoAI’s Argoverse includes 3D tracking annotations for 113 scenes plus nearly 200 miles of mapped lanes labeled with traffic signals and connecting routes.

Rising tide: Waymo and ArgoAI aren't the only companies filling the public pool. In March, Aptiv released nuScenes, which includes lidar, radar, accelerometer, and GPS data for 1,000 annotated urban driving scenes. Last year, Chinese tech giant Baidu released ApolloScape, which includes 3D point clouds for over 20 driving sites and 100 hours of stereoscopic video. Prior to that, the go-to data sets were Cityscapes and KITTI, which are tiny by comparison.

Why it matters: Autonomous driving is proving harder than many technologists expected. Many companies (including Waymo) have eased up on their earlier optimism as they've come to appreciate what it will take to train autonomous vehicles to steer — and brake! — through all possible road conditions.

Our take: Companies donating their data sets to the public sphere seem to be betting that any resulting breakthroughs will benefit them, rather than giving rivals a game-changing advantage. A wider road speeds all drivers, so to speak. Now the lanes are opening to researchers or companies that otherwise couldn’t afford to gather sufficient data — potential partners for Aptiv, ArgoAI, Baidu, and Waymo.

