Logos Data Platform Developer

Our team develops Logos, Yandex's internal data management platform. ML engineers, data engineers, and analysts at Yandex use it to build ETL pipelines and data warehouses for analytics and reporting, assemble datasets for machine learning, and train models. The platform manages about 10 thousand regular processes that handle tens to hundreds of petabytes of data every day. Our users include Advertising, Market, Kinopoisk, Music, Plus, and many other Yandex services.

We help our users focus on the content of their data and extract value from it, while we take care of the infrastructure, pipeline orchestration, and reliability. We are looking for a colleague who will help us develop the platform, using the full power of Yandex's data storage and processing systems.

You can learn more about us from these articles and videos:
* How we test data pipelines in Yandex Advertising (video here)
* Experience building a DMP in Yandex Advertising
* How to regularly build more and more ML pools on MapReduce, while being on call less and less

What tasks await you

Development of an ETL framework

Our users create their own data processing pipelines in Python using the framework we develop. You will expand its capabilities, make it more flexible, and open up new use cases. You will also simplify basic scenarios and reduce the amount of boilerplate code users have to write, so that developing and maintaining data pipelines takes less effort.

Development of the data processing platform

Our users include major Yandex services such as Advertising, Market, Plus, and Fantech. They all share Yandex's common infrastructure, but the data and processes of each service have their own specifics. Our task is to reduce the cost and implementation time of common solutions and to spread best practices, while staying flexible enough to account for the specifics of each service. You will take part in developing common data management tools, such as data quality tools, and work on supporting various data storage and processing systems within the data platform.

Improving service reliability

Our platform manages tens of petabytes of data and thousands of processes belonging to dozens of teams from more than a dozen different services. At this scale, users need tools that let them test their pipelines within CI/CD and keep processes running without interruption in production. Your tasks will include developing tools for monitoring and responding to production problems and implementing Yandex's best practices for the reliability of high-load services, so that our users can maintain their pipelines independently.

More about backend at Yandex: see the Yandex for Backend channel

We expect you to

Have excellent knowledge of Python
Know the basic principles and patterns of software design
Strive to write code that is easy to read and maintain
Have worked with relational and non-relational databases and understand their structure

It will be a plus if you

Have written code in C++ and Golang
Have developed and maintained libraries in Python
Understand how fault-tolerant distributed systems are built
Have been involved in big data processing and worked with data warehouses (DWH)
Are familiar with ETL frameworks such as Airflow, Luigi, Dagster, and others
