Waymo is crowdsourcing AV research by publishing a dataset (GOOGL)


  • This is an excerpt from a story delivered exclusively to Business Insider Intelligence Transportation & Logistics Briefing subscribers.

Waymo, Alphabet’s autonomous vehicle (AV) subsidiary, announced on Wednesday that it would release an AV dataset to the public.

A Jaguar I-PACE self-driving car is pictured during its unveiling by Waymo in the Manhattan borough of New York

The data comes from sensors on Waymo’s self-driving vehicles, collected in a variety of driving conditions across California, Washington, and Arizona. By putting data in the hands of researchers, Waymo hopes to facilitate advances in “2D and 3D perception, and progress on areas such as domain adaptation, scene understanding, and behavior prediction.”

A Waymo principal scientist told Axios that “the more smart brains you can get on the problem, whether inside or outside the company, the better.” The dataset will come prelabeled with 2 million 3D labels and 1.2 million 2D labels — this should be particularly appealing to researchers, as the labeling process can otherwise be time- and labor-intensive.

AV companies have turned to outsourcing research to expedite the development of their technology, which relies on AI. By releasing a dataset, Waymo is hoping to glean insights from sources beyond its internal research pool. Other companies with AV investments — including Argo and Lyft — have already released datasets to the public.

The datasets only represent a fraction of the data collected by the companies’ respective AV fleets, but they should give researchers enough data to develop insights for publication in academic journals, thus helping to advance AV technology across the board.

Releasing datasets is just one method of expediting AV research. Here are two other examples:

  • Crowdsourced labeling. AV technology relies on AI to interpret surroundings and make maneuvering decisions. To train AI models, humans often label datasets manually. For instance, for an AV to recognize a traffic light, its model must be trained on thousands of human-labeled traffic lights from previous encounters. Manual labeling is extremely labor-intensive: it accounts for an estimated 80% of the labor time associated with developing AI technology. AV companies have turned to crowdsourced labor to reduce labeling costs. Amazon Mechanical Turk, the most prominent of such services, pays people to label data, often for pennies per task. In 2018, The Atlantic cited an estimate that the median wage on the service hovered around $2 per hour.
  • Self-taught AI. Advanced AI systems can validate their own test data, to some extent. This is because they maintain multiple models, so a new model can be trained and verified against previously tested models. If the new model performs better than the older models, it replaces them, improving the quality of the system as a whole. These sorts of systems have been implemented by Google’s DeepMind, for instance, to generate boxes for Waymo AVs around pedestrians, bicyclists, and motorcyclists. While the system reduces the need for human verification, it does not eliminate it entirely, as DeepMind noted that “the best training regimen … is commonly achieved through an engineer’s experience and intuition.”
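
The replace-if-better idea in the self-taught AI example can be illustrated with a minimal Python sketch. It is not DeepMind's or Waymo's actual method: the threshold "models", the tiny validation set, and the function names (make_threshold_model, accuracy, select_champion) are all hypothetical, chosen only to show a new model displacing an older one when it scores higher on held-out data.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a trained model: maps one feature to a class label.
Model = Callable[[float], int]
Dataset = List[Tuple[float, int]]


def make_threshold_model(threshold: float) -> Model:
    """Build a toy classifier that predicts 1 when the feature >= threshold."""
    return lambda x: 1 if x >= threshold else 0


def accuracy(model: Model, data: Dataset) -> float:
    """Fraction of examples in `data` that the model labels correctly."""
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)


def select_champion(champion: Model, challengers: List[Model],
                    validation: Dataset) -> Tuple[Model, float]:
    """Keep the current model unless a challenger scores strictly higher."""
    best, best_score = champion, accuracy(champion, validation)
    for challenger in challengers:
        score = accuracy(challenger, validation)
        if score > best_score:
            best, best_score = challenger, score
    return best, best_score


# Illustrative held-out data (assumed, not from the article).
validation = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
champion = make_threshold_model(0.95)
challengers = [make_threshold_model(0.5), make_threshold_model(0.2)]

best, score = select_champion(champion, challengers, validation)
print(f"validation accuracy of selected model: {score:.2f}")
```

In a real system the validation set and the choice of challengers would still need human oversight, which is consistent with the caveat quoted above.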

