Our team is working on high-quality speech synthesis for all Yandex products. This includes, for example, video translation and voice-over in the Browser, Bookmate audiobooks, Alice, and geo-products. We are looking for a colleague who wants to improve the intonation of synthesized speech together with us.
Training TTS models for Alice and Bookmate
The higher the quality of speech synthesis, the more comfortable the experience is for the user. If the synthesis is monotonous and emotionless, the user will not want to listen to an audiobook or talk to a voice assistant. That is why we are improving intonation and adding emotion to the synthesis. You will do a lot of research and train SOTA models.
Synthesis prompting
Many of the datasets now appearing contain not only audio and text but also a prompt describing the speaking style, for example, 'fast reading in a high-pitched female voice with expressive pauses.' Your task is to make the synthesis controllable through such prompts. To do this, you will need to implement many modern approaches and come up with new ideas.
Dataset generation
Many prompt datasets are generated synthetically. You will need to develop pipelines made up of many neural networks that help collect such datasets, and where existing networks fall short, train new ones from scratch or fine-tune them.
Experience: 3 years
Employment: Full-time
Specialization: Data Science & ML
Industry: IT & Tech
Company Type: Corporation