ML Engineer for Speech Synthesis Pretraining Team

We are building a universal multilingual pretrained model for TTS that will become the foundation for all our products: from voice assistants and audiobooks to voice cloning and preserving the original speaker's voice in video translation and dubbing.

One of the team's key tasks right now is developing a new neural audio codec. Everything depends on it: the speed and cost of training, the range of intonations and emotions, and the clarity and naturalness of the sound in the final products.

What tasks await you

Developing a semantic audio codec
You will design, train from scratch, and scale modern neural codec architectures. The goal is maximum compression with minimal loss of audio quality and semantic information.

Working with big data
You will train models on hundreds of thousands of hours of multilingual speech.

Training and scaling large models
You will train large models with billions of parameters and run experiments with them on our GPU clusters using distributed training.

Evaluation and metrics
You will devise and implement automatic and expert metrics that assess sound quality (clarity, artifacts), speech expressiveness, and how well the codec and the pretrained model preserve semantic content.

End-to-end development cycle
You will follow the full path from research (reading papers, building proofs of concept) to testing your solutions in real products.

Learn more about ML at Yandex in the Yandex for ML channel.

We expect you to have

Proficiency in Python and PyTorch
Experience with the full cycle of training large models from scratch, preferably in NLP, audio, or multimodal domains
Broad knowledge of NLP
Readiness to dive into speech synthesis and understand both the theory and the engineering details
The habit of keeping up with ML developments and the ability to turn ideas from papers into code

It will be a plus if you have

Experience in one of these areas: TTS/VC, neural audio codecs, or training LLMs from scratch

If you want to build the technological foundation for the future of speech synthesis and see the results of your work in products used daily by millions of people, join the team!
