Description

We develop and implement advanced optimization methods for training and inference of extremely large neural networks (tens of billions of parameters) used in multimodal generative models. The focus is on compilation, quantization, distillation, sparsity, and other acceleration techniques that preserve model quality.

Responsibilities

research and implementation of training optimization methods (gradient checkpointing, activation recomputation, mixed-precision, computational graph optimization)
development and integration of inference acceleration techniques: quantization (INT8, FP8), pruning, structured sparsity, knowledge distillation
use and modification of ML compilers (TorchDynamo, TorchInductor, TensorRT, and others) for optimizing computational graphs
collaboration with the CUDA Operators and Distributed Learning teams to maximize GPU performance
design and execution of model compression experiments and comparative analysis of speed/quality trade-offs

Requirements

expert-level Python, PyTorch
experience with ML compilers and optimization of inference and training
deep understanding of quantization, distillation, and sparsification methods
skills in performance profiling and optimization (PyTorch Profiler, Nsight Systems, perf)
understanding of modern LLM and diffusion model architectures

Bonus: Experience in optimization for CPU/ASIC/FPGA, publications at NeurIPS/ICML/MLSys, knowledge of C++.

Conditions

comfortable modern office near Kutuzovskaya metro station
hybrid work format
annual salary review, quarterly and annual bonus
corporate gym and recreation areas
more than 400 educational programs from SberUniversity for professional and career development
onboarding program and support from your manager when you start
extended private health insurance, preferential insurance for family, and corporate pension program
mortgage rates reduced by up to 7% for every employee
free SberPrime+ subscription, discounts on partner company products
referral bonus for recommending friends to the Sber team
