Description
We develop and implement advanced methods for optimizing the training and inference of very large neural networks (tens of billions of parameters) that power multimodal generative models. The focus is on compilation, quantization, distillation, sparsity, and other acceleration techniques that do not compromise model quality.
Responsibilities
- research and implementation of training optimization methods (gradient checkpointing, activation recomputation, mixed-precision, computational graph optimization)
- development and integration of inference acceleration techniques: quantization (INT8, FP8), pruning, structured sparsity, knowledge distillation
- use and modification of ML compilers (TorchDynamo, TorchInductor, TensorRT, and others) to optimize computational graphs
- collaboration with the CUDA Operators and Distributed Training teams to ensure maximum GPU performance
- design and execution of model compression experiments, and comparative analysis of speed/quality trade-offs.
Requirements
- expert-level Python and PyTorch
- experience with ML compilers and optimization of inference and training
- deep understanding of quantization, distillation, and sparsification methods
- skills in performance profiling and optimization (PyTorch Profiler, Nsight Systems, perf)
- understanding of modern LLM and diffusion model architectures
Bonus: experience with CPU/ASIC/FPGA optimization, publications at NeurIPS/ICML/MLSys, knowledge of C++.
Conditions
- comfortable modern office near Kutuzovskaya metro station
- hybrid work format
- annual salary review, quarterly and annual bonuses
- corporate gym and recreation areas
- access to over 400 educational programs from SberUniversity for professional and career development
- onboarding program and manager support at the start
- extended voluntary health insurance (VHI), discounted insurance for family members, and a corporate pension program
- mortgage on terms up to 7% more favorable, available to every employee
- free SberPrime+ subscription, discounts on products from partner companies
- referral bonus for recommending friends to the Sber team.