Experience: 3 years
Employment: Full-time
Work Format: Hybrid
Grade: Senior
Specialization: Data Science & ML
Industry: Robotics
Company Type: Corporation
Senior ML Researcher/Engineer (World Models & RL) for the Delivery Robot Team
Yandex delivery robots are more than a bold R&D project: they are a functioning business. Our robots deliver thousands of orders daily while navigating complex, unstructured urban environments. We are growing and plan to scale the fleet to 20,000 robots by 2028.
We are now moving from a classic modular pipeline, with separate perception, prediction, and planning modules tightly coupled to HD maps, to a full End-to-End (E2E) architecture based on World Models.
Our goal is to build strong embodied AI. To plan complex maneuvers in the real world, an RL agent needs a deep understanding of physics, causality, and object permanence. Training a policy directly from raw pixels is extremely sample-inefficient, so we are building a system in which a 3D/video tokenizer compresses the world and a large-scale World Model learns to predict its latent dynamics. We will then train our planning policy with RL inside this generated simulation.
We are looking for a Senior ML Engineer/Researcher to join the core WM + E2E team and focus on building a fast, interactive world model and training MBRL agents at scale. Your research ideas will guide thousands of physical agents on city streets every day. If you are ready to solve fundamental robotics problems at the intersection of generative video models and RL, join us!
Development and Scaling of World Models: You will design and train massive 3D/video tokenizers and backbones based on Diffusion Transformers (DiT), Flow Matching, and related approaches. The goal is to accurately predict, in latent space, how the physical world evolves in response to the agent's actions.
Distributed Training: You will build pipelines for distributed training of heavy foundation models on our computing cluster. You will work with data, tensor, and pipeline parallelism, orchestrate multi-node training, and squeeze maximum performance out of the hardware.
Model-Based RL (MBRL) & Planning: You will train pure RL and IL + RL policies inside the frozen latent simulation of the World Model, using dense self-supervised representations to train a reward model with high sample efficiency.
Representation Shaping: You will integrate auxiliary losses for perception tasks such as 3D detection, segmentation, and tracking to give important scene objects explicit semantic grounding.
Safety & Inference: You will build a reliable safety framework on top of the model outputs and prepare the entire stack for real-time inference directly on the robot's edge devices.
Learn more about ML at Yandex in the Yandex for ML channel.