ML Engineer for Speech Synthesis Pretraining Team
We are building a universal multilingual pretrained model for TTS that will become the foundation for all our products: from voice assistants and audiobooks to voice cloning and preserving a speaker's voice in video translation and dubbing.
One of the team's key tasks right now is the development of a new neural audio codec. Everything depends on it: the speed and cost of training, the range of intonations and emotions, and the clarity and naturalness of sound in the final products.
Developing a semantic audio codec: You will design, train from scratch, and scale modern neural codec architectures. The goal is to achieve maximum compression with minimal loss of quality and semantic information.
Working with big data: You will train models on hundreds of thousands of hours of multilingual speech.
Training and scaling large models: You will train large (multi-billion-parameter) models and run experiments with them on our GPU clusters using distributed training.
Evaluation and metrics: You will design and implement automatic and expert-rated metrics for sound quality (clarity, artifacts), speech expressiveness, and how well the codec and pretrained model outputs preserve semantic content.
End-to-end development cycle: You will cover the full path from research (reading papers, building proofs of concept) to testing your solutions in real products.
Learn more about ML at Yandex in the Yandex for ML channel.
If you want to build the technological foundation for the future of speech synthesis and see the results of your work in products used daily by millions of people, join the team!
Experience: 3 years
Employment: Full-time
Specialization: Data Science & ML
Industry: IT & Tech
Company Type: Corporation