Analyst-Developer for Autonomous Vehicles

We are the data mining group for autonomous vehicles: we measure the quality of our technologies. Our main task is to collect datasets for every vehicle component, from localization to motion planning. So why isn't this just a simple train_test_split?

Two main reasons:

The vast majority of driving scenarios are "repetitive" and "boring." As the technology improves, we need to detect ever finer changes: each next step is harder to achieve but carries more weight. One way to address this is to artificially increase the share of rare scenarios without losing the ability to examine random intervals. The idea is similar to Active Learning, which we also plan to adopt actively. Defining "repetitiveness" and "interestingness" is itself one of our tasks.
An autonomous vehicle has five components responsible for autonomy. There are also many adjacent tasks that do not drive the vehicle at runtime but whose quality still needs to be measured, for example the quality of driving simulation. To set the right direction of development and help teams adjust it, we collect datasets separately for each major task.
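The first reason (boosting rare scenarios while keeping random coverage) can be illustrated with a minimal sketch. It assumes each segment already carries a binary rare/common label (the `is_rare` field is hypothetical, standing in for whatever "interestingness" detector is used) and swaps in a standard technique, disproportionate stratified sampling: the rare stratum is oversampled, each stratum is still sampled at random, and every picked segment keeps a weight so metrics can be reweighted back to the natural distribution. `Segment`, `stratified_sample`, and `weighted_mean` are illustrative names, not the team's actual tooling.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    segment_id: str
    is_rare: bool  # hypothetical label from an upstream "interestingness" detector


def stratified_sample(segments, budget, rare_share, seed=0):
    """Disproportionate stratified sampling that oversamples the rare stratum.

    Returns (segment, weight) pairs. weight = stratum size / sampled count,
    so weighted averages estimate the natural (unboosted) distribution.
    Assumes 0 <= rare_share <= 1.
    """
    rng = random.Random(seed)
    rare = [s for s in segments if s.is_rare]
    common = [s for s in segments if not s.is_rare]
    n_rare = min(len(rare), round(budget * rare_share))
    n_common = min(len(common), budget - n_rare)
    sample = []
    for stratum, n in ((rare, n_rare), (common, n_common)):
        if n == 0:
            continue
        weight = len(stratum) / n  # each picked segment stands for this many segments
        sample.extend((s, weight) for s in rng.sample(stratum, n))
    return sample


def weighted_mean(sample, metric):
    """Estimate the mean of `metric` over the full, unboosted segment population."""
    total = sum(w for _, w in sample)
    return sum(w * metric(s) for s, w in sample) / total


# Synthetic example: 1000 segments, 2% of them rare.
segments = [Segment(f"seg{i}", is_rare=i < 20) for i in range(1000)]
sample = stratified_sample(segments, budget=100, rare_share=0.2)
print(sum(s.is_rare for s, _ in sample))  # 20: rare segments make up 20% of the sample
print(weighted_mean(sample, lambda s: float(s.is_rare)))  # 0.02: the natural rare share
```

The weights are what preserve the "random intervals" view: the common stratum is still a uniform random draw, and reweighting recovers population-level metrics even though rare cases are ten times more frequent in the sample.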

What tasks await you

Data Mining

Our group's main task is to help teams improve the quality of their technology. We collect datasets for training and quality measurement, look for growth points and devise ways to scale, and make heavy use of LLMs, VLMs, and both open-source and in-house models. This is a creative role that requires a love of digging into data and good communication skills: understanding what adjacent teams want and explaining what they actually need.

Determining the "interestingness" and "repetitiveness" of segments from real (and simulated) drives

Not all segments are equally useful, and running simulations on identical trips is a complete waste of resources: similar inputs produce similar outputs and yield no new information. By deduplicating and weighting trips correctly, we save compute and time, which means we can simulate more data.

An important nuance: "interestingness" and "repetitiveness" depend on the component. For example, localization pays attention to the location and objects around the autonomous vehicle, while perception focuses on obstacles and agents, trying to predict their behavior and impact on the vehicle.
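A minimal sketch of component-specific deduplication and weighting follows. It assumes each segment has already been reduced to a few summary features (every field below is hypothetical) and uses simple signature bucketing as a stand-in for real similarity models: segments that map to the same quantized signature count as repeats, one representative is kept for simulation, and it is weighted by how many segments it stands for. The two key functions are illustrative only; they show how the same segments can be "repetitive" for one component and distinct for another.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    segment_id: str
    lat: float
    lon: float
    static_objects: frozenset  # hypothetical: map objects around the vehicle
    n_agents: int              # hypothetical: dynamic agents in view
    min_agent_dist_m: float    # hypothetical: distance to the closest agent


# Hypothetical signatures: what counts as "the same" depends on the component.
def localization_key(s):
    # Coarse location cell (3 decimal places of lat/lon) plus surrounding objects.
    return (round(s.lat, 3), round(s.lon, 3), s.static_objects)


def perception_key(s):
    # Capped agent count plus closest-agent distance in 5 m buckets.
    return (min(s.n_agents, 10), int(s.min_agent_dist_m // 5))


def dedup_with_weights(segments, key):
    """Keep one representative per signature, weighted by the group size."""
    groups = defaultdict(list)
    for s in segments:
        groups[key(s)].append(s)
    return [(members[0], len(members)) for members in groups.values()]


segments = [
    Segment("a", 55.7512, 37.6184, frozenset({"traffic_light"}), 3, 12.0),
    Segment("b", 55.7514, 37.6181, frozenset({"traffic_light"}), 0, 50.0),
    Segment("c", 55.8000, 37.5000, frozenset({"crosswalk"}), 3, 13.0),
]
print([(s.segment_id, w) for s, w in dedup_with_weights(segments, localization_key)])
# [('a', 2), ('c', 1)]: a and b share a place and surroundings
print([(s.segment_id, w) for s, w in dedup_with_weights(segments, perception_key)])
# [('a', 2), ('b', 1)]: a and c have a similar agent picture
```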

More about analytics at Yandex: the Yandex for Analytics channel

We expect you to

Have strong proficiency in Python and SQL
Have a solid algorithmic background. We don't solve LeetCode-style problems every day at work (though it happens), but algorithmic thinking will help you write and test tricky code correctly
Know mathematical statistics and probability theory
Be able to find common ground and shared solutions, communicate ideas, and build informative visualizations

Nice to have

Respect for PEP 8 and clean code
Familiarity with Airflow, Hadoop, Spark, and ClickHouse (CH): all of these will help you get up to speed with Yandex's technology stack faster

Similar vacancies

Analyst-Developer for Autonomous Transport

Analyst-Developer for the Vehicles and Fleets Analytics Team

Analyst-Developer for the Yandex Crowd Product Analytics Team

Analyst-Developer for the Elektro Team

Analyst-Developer for the Yandex AI team

Analyst-Developer of Offline Metrics at Yandex Pictures

Analyst for the Batch Technology Development Team in Delivery

Analyst-Developer for the Perception Annotation Team

Tech Lead of the Analytics Group for the Alisa Team

Lead of Offline Metrics Analytics at Yandex Pictures

Analyst-Developer for the Delivery Robot Product Analytics Team

Data Analyst for the Payment Analytics Group

Analyst-Developer for Autonomous Vehicles
