ML Developer for Inference Acceleration Team
YandexGPT is increasingly integrated into the company's services and solves a wide variety of tasks, providing value to people. Each integration presents developers with its own challenges related to model quality and speed. However, one thing remains constant for every deployment: model inference in production is very expensive. Depending on the audience and load, a service may require anywhere from tens to thousands of the most modern GPUs. At this scale, cutting resource usage by even tens of percent delivers significant value.
You can read more about the general approach to inference acceleration, as well as the methods used, in the post on Habr "Accelerating LLM Inference".
We are looking for a research engineer with experience in reading and implementing research papers, ready to experiment and apply inference acceleration methods for modern and rapidly evolving LLM architectures.
Continuous analysis of research papers
First and foremost, you will need to deeply familiarize yourself with a series of articles on the topic (more than 20 publications), systematize them, and identify the most promising ones.
Applying methods for YandexGPT
You will need to conduct numerous iterations of experiments to test hypotheses for YandexGPT in order to move on to generating and implementing new approaches. You will also need to confirm the practical applicability of the methods: measure quality and acceleration.
Developing universal tools
Finally, you will need to create a common solution that will be reused by ML engineers across all of Yandex.
Experience: 3-5 years
Employment: Full-time
Work Format: Hybrid, Remote, Onsite
Specialization: Data Science & ML
Industry: IT & Tech
Company Type: Corporation