Description

We develop and implement advanced methods for optimizing the training and inference of ultra-large neural networks (tens of billions of parameters) for multimodal generative models. The focus is on compilation, quantization, distillation, sparsity, and other acceleration techniques that preserve model quality.

Responsibilities

research and implementation of training optimization methods (gradient checkpointing, activation recomputation, mixed-precision, computational graph optimization)
development and integration of inference acceleration techniques: quantization (INT8, FP8), pruning, structured sparsity, knowledge distillation
use and modification of ML compilers (TorchDynamo, TorchInductor, TensorRT, and others) to optimize computational graphs
collaboration with the CUDA operators and Distributed Learning teams to maximize GPU performance
design and execution of model compression experiments, with comparative analysis of speed/quality trade-offs.

Requirements

expert-level Python and PyTorch
experience with ML compilers and with optimizing inference and training
deep understanding of quantization, distillation, and sparsification methods
skills in performance profiling and optimization (PyTorch Profiler, Nsight Systems, perf)
understanding of modern LLM and diffusion model architectures

Bonus: experience with CPU/ASIC/FPGA optimization, publications at NeurIPS/ICML/MLSys, knowledge of C++.

Conditions

comfortable modern office near Kutuzovskaya metro station
hybrid work format
annual salary review, plus quarterly and annual bonuses
corporate gym and recreation areas
access to over 400 educational programs from SberUniversity for professional and career development
onboarding program and manager support at the start
extended voluntary health insurance (VHI), discounted insurance for family members, and a corporate pension program
preferential mortgage terms for every employee, up to 7% more favorable
free SberPrime+ subscription, discounts on products from partner companies
referral bonus for recommending friends to the Sber team.
