Senior Data Engineer / ML Engineer (GigaChat)

Contacts

Reach out directly about this role

Description

Our team is responsible for the quality of the text-to-speech models in GigaChat, the part of the neural network that handles audio.

We are currently looking for engineers to work on core quality, multimodal GigaChat, and other exciting projects.

Responsibilities

60% data engineering, 25% software development, 15% ML
writing audio/video data processing pipelines
preparing datasets for training and fine-tuning LLMs
integrating with ML/LLM pipelines and the backend
introducing new solutions, frameworks, and tools for working with data
improving the reliability and scalability of data processes
running inference for open-source and internal models on GPU (denoisers, recognition, audio quality assessment tools)
maintaining the data storage

Requirements

strong Python, algorithms, and OOP skills
experience with multithreading and multiprocessing
experience developing production services and data pipelines
understanding of the LLM lifecycle
understanding of approaches to data quality control
experience with S3 (important) and DVC

Conditions

hybrid or remote work format (from Moscow)
annual salary review and annual bonus
corporate gym and relaxation areas
more than 400 educational programs from SberUniversity for professional and career development
extended voluntary health insurance (VHI), preferential family insurance, and corporate pension program
employee mortgage on terms up to 7% more favorable
free SberPrime+ subscription, discounts on partner company products