Description
This vacancy is part of a pilot for using AI in hiring.
After applying, you will receive an email invitation to complete an initial interview with GigRecruiter on Telegram. The conversation takes about 10 minutes. Its goal is to clarify missing details and speed up the review of your application.
GigRecruiter is still new, so we ask for your understanding. Your experience and participation will help make it more convenient and useful.
Give it a try, and you will be among the first to meet Sber's GigRecruiter!
At GigaChat, we are developing the core technology behind a generative language model: it writes texts, generates images, writes code, answers questions, and holds dialogues.
Last fall, we released GigaChat MAX, a powerful Russian-language model at the GPT-4 level (metrics are in the article on Habr). In winter, we made one of our MoE models publicly available. And we haven't stopped there: we keep moving forward!
We are currently focused on several areas:
Research and Experiments.
- Stabilization and development of architectures (new types of attention, improving the test bench used to evaluate changes).
- Methods for improving training (optimization, losses and training objectives such as FIM/MTP, MoE balancing).
- Scaling laws (for hyperparameters, quality, and cost).
- Continuous review of recent industry papers and ideas.
Infrastructure and Parallel Training.
- 5-D parallelism, acceleration of multimodal and giant MoE models on large clusters.
Distributed Training Framework.
- Development of GigaFSDP, experiments with FP8 and mixed precision, stability and efficiency of training at large scale.
Low-Level Optimizations.
- Optimization of operations at the CUDA/Triton kernel level, improving NCCL performance, profiling, and eliminating bottlenecks.
Quality and Metrics.
- Development of GigaChat evaluation: from international olympiad tasks to metrics specific to the Russian language.
We are looking for an NLP Engineer to help us make GigaChat smarter. For experiments, we have a cluster with a large number of A100/H100 GPUs.
Responsibilities
- Bring Russian-language quality to the level of ChatGPT and beyond.
- Propose and test ideas that deliver practical gains.
- Help solve tasks for Sber's internal clients, with an eye toward external users.
- Follow the industry: read articles, quickly test hypotheses, share results.
Requirements
- Strong Python and PyTorch skills.
- Strong foundation in algorithms and mathematics (linear algebra, optimization, probability).
- Experience in training DL models, from ordinary models to large ones.
- Theoretical understanding of distributed training algorithms.
- Understanding of the current LLM landscape and trends.
Nice to have:
- Experience with distributed training (DDP/FSDP/parallelism), CUDA/NCCL/profiling, MoE/FP8, multimodal models, building quality metrics.
Even if you have no experience with LLMs but have worked extensively on NLP research or engineering optimizations, don't hesitate to apply!
Conditions
- Remote work within Russia.
- Option of employment with an accredited IT company.
- Annual performance bonus of up to six months' salary.
- Regular salary reviews.
- Corporate gym and relaxation areas.
- More than 400 SberUniversity programs for growth.
- Onboarding program and manager support at the start.
- The largest DS&AI community: over 600 DS professionals from the bank; regular exchange of knowledge, experience, and best practices; interactive lectures and master classes from leading universities and tech-company experts; a digest of the latest developments in DS&AI and reports from the world's largest conferences; regular internal meetups.
- Extended health insurance, preferential family insurance, corporate pension program.
- Discounted mortgage program for employees.
- SberPrime+ and discounts with partners.
- Referral bonus for bringing talent to the team.