Description
Our team is responsible for the quality of text-to-speech models in GigaChat, the part of the neural network that handles audio.
We are currently hiring people to work on core quality, multimodal GigaChat, and other exciting projects.
Responsibilities
- the role is 60% data engineering, 25% software development, 15% ML
- writing audio/video data processing pipelines
- preparing datasets for training and fine-tuning LLMs
- integrating with ML/LLM pipelines and the backend
- introducing new solutions, frameworks, and tools for working with data
- improving the reliability and scalability of data processes
- running inference of open-source and internal models on GPU (denoisers, recognition models, audio quality assessment tools)
- maintaining the data storage
Requirements
- good command of Python, algorithms, and OOP
- experience with multithreading and multiprocessing
- experience developing production services and data pipelines
- understanding of the LLM lifecycle
- understanding of approaches to data quality control
- experience with S3 (important) and DVC
Conditions
- hybrid or remote work format (from Moscow)
- annual salary review and annual bonus
- corporate gym and relaxation areas
- more than 400 educational programs from SberUniversity for professional and career development
- extended voluntary health insurance (VHI), discounted family insurance, and a corporate pension program
- employee mortgages on terms up to 7% more favorable
- free SberPrime+ subscription and discounts on products from partner companies