Description
We are the core team responsible for machine learning on audio data across Sber. Last year, we open-sourced GigaAM (https://arxiv.org/abs/2506.01192), a SOTA model for Russian speech recognition, and this spring we were the first in Russia to launch native audio understanding in an LLM: GigaChat Audio (https://habr.com/ru/companies/sberdevices/articles/904894/). We are now working on GigaChat's multimodal capabilities: improving quality on complex audio and image contexts, and understanding video through its frames as well as its audio stream.
Responsibilities
- creating a pipeline for generating synthetic Audio+Vision+Text data from internal and open models
- creating benchmarks: LLM-as-a-judge, automatic metrics
- running LLM training experiments: evaluating data, training stages, and modality-mixing methods
Requirements
- Python: modular code, OOP, concurrency, PEP 8, tests
- understanding of training stages and modern LLM architectures
- understanding of methods for evaluating the quality of ML systems
- deep theoretical knowledge of deep learning
- experience debugging and training in multi-GPU setups
Nice to have
- experience in Computer Vision / Audio
Conditions
- comfortable modern office near Kutuzovskaya metro station
- choice of work format: office or hybrid (Moscow / Saint Petersburg offices)
- annual salary review and annual bonus
- corporate gym and recreation areas
- more than 400 educational programs from SberUniversity for professional and career development
- extended voluntary health insurance, preferential insurance for family members
- flexible mortgage discount equal to 1/3 of the Central Bank key rate
- free subscription to SberPrime+, discounts on products from partner companies
- referral bonus for recommending friends to the Sber team