ML Engineer in the Speech Synthesis Data Team

We are looking for an experienced data and ML engineer to join the speech synthesis data team. The team works on video translation, produces audiobooks, and develops the voice of Alice. Speech synthesis is entering a new era, moving from low-resource settings (even for major languages) to big data and pre-training. New models can sing famous songs in your voice or say any phrase from just a few seconds of your speech. The quality of these models rests on hundreds of thousands of hours of high-quality audio paired with matching texts, and we need to collect that data.

What tasks await you

Working with data
You will develop a system for storing truly big data and giving ML developers access to it. You will have petabytes of audio at your disposal, and it needs to be stored efficiently and processed quickly.

Data mining
You will improve the throughput of existing data collection pipelines, scale them to support multiple languages, work with heterogeneous sources, and develop processes for mining audio data.

Data quality assessment
You will work on processes for assessing data properties, developing and applying ML models that detect noise, music, multiple speakers, synthetic speech, and text-audio mismatches, and that identify the spoken language. These assessments let us filter the data and make our synthesis the best in the world.

Learn more about ML at Yandex on the Yandex for ML channel.

We expect that you

Write code in Python
Have built data collection pipelines for ML
Understand how to work with large volumes of data
Have applied ML models in practice, or better yet, trained them
Are motivated and ready to dive deep into the field
