Description

We develop high-performance CUDA operators for PyTorch that enable training and inference of multimodal models with maximum GPU utilization. The focus is on low-level optimization, custom kernels, memory management, and making efficient use of new GPU architectures.

Responsibilities

Development and optimization of custom CUDA operators and extensions for PyTorch (C++/CUDA).
Profiling and elimination of bottlenecks in computational kernels (Nsight Compute, nvprof).
Optimization of memory usage (shared memory, registers, coalesced access, persistent kernels).
Implementation of parallel computing algorithms that account for the architectural features of modern GPUs (Ampere, Hopper, and newer).
Integration of CUDA optimizations into distributed training and inference pipelines.
Close collaboration with Research and Distributed Learning teams to support custom models and operators.

Requirements

Expert-level C++ and CUDA.
Experience in performance optimization for NVIDIA GPUs.
Knowledge of PyTorch internals (ATen, dispatcher, TensorIterator).
Skill in GPU profiling and in identifying and eliminating bottlenecks in neural network operator implementations.
Experience with mixed precision and custom kernels.

Bonus: Experience with Triton, CUTLASS, cuBLASLt, NCCL; participation in PyTorch open-source projects.

Conditions

Comfortable modern office near Kutuzovskaya metro station
Hybrid work format
Annual salary review, quarterly and annual bonuses
Corporate gym and recreation areas
More than 400 educational programs from SberUniversity for professional and career development
Onboarding program and support from your manager at the start
Extended voluntary health insurance, preferential insurance for family, and corporate pension program
Mortgage with benefits up to 7% for every employee
Free SberPrime+ subscription, discounts on products from partner companies
Referral bonus for recommending friends to join the Sber team

Similar vacancies

Senior CUDA Engineer (Kandinsky)

C++ Developer (VLLM, SGlang, TensorRT)

Senior DL/LLM Engineer (Pretrain/RL Efficiency)

GPU Performance Engineer

Senior Deep Learning Research Engineer

LLM Platform Engineer (ML Engineer)

Senior Deep Learning Research Engineer (Diffusion Models)

Senior NLP Engineer (GigaChat)

NLP Engineer (GigaChat Pretrain)

Senior Developer for the GPU Infrastructure Team

LLM Infrastructure Developer

C++ Inference Server Developer for the ML Infrastructure Department
