Description

We are looking for a Middle ML Engineer/Researcher to join a research team working on the development of omnimodal solutions in the audio domain, as part of a large-scale project to create next-generation artificial intelligence systems.

Responsibilities

● Researching technologies for working with different audio modalities: speech, noise, music, sound effects

● Integrating audio, text, and visual modalities into a unified architecture

● Working on multimodal reasoning and stream synchronization (audio–text–vision)

● Researching and implementing state-of-the-art approaches (end-to-end models, transformers, multimodal LLMs, diffusion models)

Requirements

Excellent command of Python 3; experience with PyTorch, bash, git, Docker, DVC, and HF Transformers
Good understanding of ASR, TTS, DSP, ML, and speech and audio processing
Understanding of transformers, attention mechanisms, KV-cache, and diffusion
Experience working with large audio datasets
Understanding of MLOps practices: model monitoring, data drift, CI/CD

Nice to have:

Experience in the speech and music domains and with voice assistants
Experience with diffusion and autoregressive architectures for audio/music
Experience with streaming / real-time systems
Knowledge of multimodal LLM / VLM / Audio-LM
Publications or research background in relevant fields

Conditions

Comfortable modern office near Kutuzovskaya metro station
Hybrid work format
Annual salary review and an annual bonus starting at three monthly salaries
Large gym and recreation areas
Training system for professional and career development
Extended voluntary health insurance from the first day of work, plus insurance for family members
Employee mortgage program with a discount of 1/3 off the current rate
Free SberPrime+ subscription, discounts on products from partner companies
Referral bonus for recommending friends to the Sber team
