Description

We are a team of ML engineers specializing in data infrastructure and model validation. Our team creates critical tools for evaluating model quality, develops data processing pipelines, and focuses on generating high-quality technical and code data for model training.

Our main areas:

• Validation & Metrics — validation and measurement tools for models

• Data Engineering — data cleaning and synthesis pipelines

• Code & Technical Data — generation of code/technical data

• MLOps — hosting and using open source models

• Model Training — experiments with LoRA and SFT.

If you are interested in creating infrastructure for LLM development and working with cutting-edge data technologies — join us.

Responsibilities

development of tools for validating and measuring model quality, and of data quality control systems
creation of metrics for assessing LLM performance and accuracy
automation of testing, benchmarking, filtering, and preprocessing
building and optimizing data cleaning and synthesis pipelines
generation of high-quality code and technical datasets
creation of synthetic data for training models on technical tasks
supporting infrastructure for hosting open source models
integration and use of open source models in product solutions
training LoRA adapters for experimental tasks
conducting SFT training as part of data research
analysis of experiment results and interpretation of the approaches used.

Requirements

excellent knowledge of Python and experience with ML libraries (LangChain/LangGraph, PyTorch, llm-foundry, verl)
experience working with LLMs (both open source: Llama, Mistral, Qwen, and proprietary: GPT, Claude)
understanding of the principles of working with data for model training: collection, cleaning, validation
skills in building ML pipelines and process automation
understanding of the processes and approaches for validating and testing machine learning models
understanding of MLOps basics and working with containerization (Docker).

Nice to have:

experience with agent and RAG frameworks
knowledge of fine-tuning approaches (LoRA, QLoRA, SFT)
experience with data and experiment versioning systems (DVC, MLflow, W&B)
understanding of the principles of working with code data and technical texts
experience in deploying and monitoring ML models in production.

Conditions

largest DS&AI community: over 600 DS specialists at the bank
digest of the latest developments in DS&AI and reports from the world's largest conferences
opportunity to choose a convenient work format: hybrid or office
comfortable modern office: Kutuzovskaya metro station, Kutuzovsky Prospect, 32
annual salary review, annual bonus
corporate gym and recreation areas
more than 400 educational programs from SberUniversity for professional and career development
extended voluntary health insurance (VHI), discounted insurance for family members, and a corporate pension program
discounted mortgage for every employee (up to 7% more favorable)
free SberPrime+ subscription, discounts on products of partner companies
referral bonus for recommending friends to the Sber team.
