Experience: 5 years
Employment: Full-time
Work Format: Onsite, Hybrid
Grade: Middle
Specialization: Data Engineering
Industry: IT & Tech
Company Type: Corporation
Logos Data Platform Developer
Our team develops Logos, Yandex's internal data management platform. Yandex's ML engineers, data engineers, and analysts use it to build ETL pipelines, maintain data warehouses for analytics and reporting, assemble datasets for machine learning, and train models. The platform manages about 10,000 recurring processes that together handle tens to hundreds of petabytes of data every day. Our users include Yandex Advertising, Market, Kinopoisk, Music, Plus, and many other Yandex services.
We help our users focus on the content of their data and the value they can extract from it, while we take care of the infrastructure, pipeline orchestration, and operational reliability. We are looking for a colleague to help us develop the platform by making full use of Yandex's data storage and processing systems.
You can learn more about us from these articles and videos:
* How we test data pipelines in Yandex Advertising (video)
* Experience building a DMP in Yandex Advertising
* How to regularly build more and more ML pools on MapReduce, while being on call less and less
Development of an ETL framework
Our users write their data processing pipelines in Python using the framework we develop. You will extend its capabilities, make it more flexible, and open it up to new use cases. You will also work on simplifying basic scenarios and reducing the boilerplate code users have to write, so that developing and maintaining data pipelines takes less effort.
Development of the data processing platform
Our users include major Yandex services such as Advertising, Market, Plus, and Fintech. They all run on Yandex's shared infrastructure, yet each service's data and processes have their own specifics. Our task is to make common solutions cheaper and faster to adopt and to spread best practices, while staying flexible enough to accommodate each service's needs. You will help build shared data management tools, such as data quality tools, and work on supporting a range of data storage and processing systems within the platform.
Improving service reliability
Our platform manages tens of petabytes of data and thousands of processes owned by dozens of teams across more than a dozen services. At this scale, users need tools to test their pipelines in CI/CD and keep their processes running without interruption in production. Your tasks will include building tools for detecting and responding to production issues and applying Yandex's best practices for running reliable high-load services, so that users can maintain their pipelines on their own.
Learn more about backend development at Yandex in the Yandex for Backend channel.