Reach out directly about this role
5 years
Experience
Full-time
Employment
Hybrid, Remote, Onsite
Work Format
Middle
Grade
System Engineering
Specialization
IT & Tech
Industry
Corporation
Company Type
GPU Performance Engineer
We manage one of the company's scarcest and most expensive resources — graphics processing units (GPUs). Their efficient use is critical for the operation of Yandex's key services: Search, Advertising, Alice, Taxi, Music, and other AI-based products. Our mission is to ensure maximum output and effect from every GPU card. This is not just resource administration, but a strategic role at the intersection of technology and business.
We are looking for a GPU Performance Engineer who will help grow GPU utilization efficiency, squeeze maximum performance from GPU computations, and make our systems fast, scalable, and resilient under high load.
The team works with 150+ products where GPUs are the foundation for AI models. You will become the link between engineering teams and top management, turning technical solutions into direct financial benefit.
You will join a team that directly influences the efficiency of Yandex's key products. We have no bureaucracy — decisions are made quickly, and initiatives are welcomed. Ideas on how to improve GPU usage efficiency are especially valued now.
We combine technical expertise with business orientation. For example, we recently launched a system for redistributing GPUs between teams, taking into account the development strategy of each individual service and the overall company strategy. This initiative saved the company hundreds of millions of rubles and provided a boost for focus areas.
Plans include creating a single standard for GPU usage for all Yandex services with a focus on increasing usage efficiency and maximizing the volume of profit obtained.
Improving GPU utilization efficiency You will formulate hypotheses and research ways to improve GPU utilization efficiency, participate in implementing and deploying the most profitable solutions. It will be necessary to formulate recommendations and best practices for improving performance to squeeze the maximum out of the GPU infrastructure.
Optimization and profiling Your responsibilities will include finding performance bottlenecks and eliminating them using profilers, as well as optimizing memory access, kernels, latency, and throughput.
Developing diagnostic tools You will create and improve tools for quickly identifying and eliminating infrastructure problems that affect the efficiency of utilization, stability, and speed of GPU computations (for both training and inference).
Research and implementation of modern solutions You will explore the latest approaches to organizing infrastructure for training and inference, evaluate their effectiveness, and implement them in real projects.
Architecture analysis, testing, integration You will work closely with developers, ML engineers, and system architects. You will participate in evaluating hardware solutions and propose improvements for future GPU generations, as well as develop testing plans, form benchmarks, and conduct performance regression analysis.
More about ML at Yandex — in the channel Yandex for ML