Software Engineer (Data Pipeline)
About the team and project
We are developing a control system for humanoid robots. It must work here and now, solving real-world tasks that range from warehouse logistics to complex service operations. We use cutting-edge technologies but are not limited to them. Where solutions don't exist, we create them from scratch. Where they do exist, we bring them to practical application.
For robots to move stably, interact with the environment, and perform useful tasks in real-world conditions, a lot of data is needed. For this data to be useful, a stable and reproducible processing pipeline is crucial. You will build reliable pipelines for delivering raw data from robots to storage and transforming it into ready-to-use datasets for model training.
Build reliable data delivery and processing pipelines
You will design and implement the flow of data into storage and its processing for use by machine learning teams. The system must withstand volume growth, protect against data loss, and be fault-tolerant.
Create a pipeline for converting data into target formats for ML
You will create converters that transform raw robot data into formats suitable for model training. You will describe schemas and transformation rules, bring heterogeneous sources to a uniform representation, and set up repeatable dataset assembly procedures.
Ensure the reliability and transparency of the process
You will make data processing stable and reproducible, and resilient to growth in volume and load. You will ensure transparency at every stage, from data ingestion to the assembly of final datasets, so that status and correctness are clear to the team at any moment.
Integrate processes with validation and labeling tools
You will build automated quality checks and route data for manual review and labeling, keeping artifacts consistent along the entire data flow path.
More about backend at Yandex: see the Yandex for Backend channel.
Experience: 3-5 years
Employment: Full-time
Work Format: Hybrid, Onsite
Grade: Middle
Specialization: Data Engineering
Industry: Robotics
Company Type: Corporation