Description
We are Sber's core team for machine learning in audio understanding. Last year, we open-sourced GigaAM (https://arxiv.org/abs/2506.01192), the state-of-the-art model for Russian speech recognition, and this spring we became the first in Russia to launch native audio understanding in an LLM: GigaChat Audio (https://habr.com/ru/companies/sberdevices/articles/904894/). We are now actively improving GigaChat's multimodal capabilities. This includes raising quality on complex audio and image contexts, and understanding video through its frames as well as its audio track.
Responsibilities
- building a pipeline that generates synthetic Audio+Vision+Text data using internal and open models
- building benchmarks: LLM-as-a-judge, automatic metrics
- running LLM training experiments: evaluating data and training stages, and testing modality-mixing methods
Requirements
- Python: modular code, OOP, concurrency, PEP conventions, tests
- understanding of LLM training stages and modern architectures
- understanding of methods for evaluating ML system quality
- deep theoretical knowledge of deep learning
- experience debugging and training models in multi-GPU setups
Will be a plus
- experience in computer vision or audio
Conditions
- comfortable modern office near Kutuzovskaya metro station
- choice of work format: office or hybrid (offices in Moscow and St. Petersburg)
- annual salary review and annual bonus
- corporate gym and recreation areas
- more than 400 educational programs from SberUniversity for professional and career development
- extended voluntary health insurance and discounted insurance for family members
- mortgage rate discount equal to 1/3 of the Central Bank's key rate
- free SberPrime+ subscription, discounts on products from partner companies
- referral bonus for recommending friends to join Sber's team