Description
We are a team of ML engineers specializing in data infrastructure and model validation. Our team creates critical tools for evaluating model quality, develops data processing pipelines, and focuses on generating high-quality technical and code data for model training.
Our main areas:
• Validation & Metrics — validation and measurement tools for models
• Data Engineering — data cleaning and synthesis pipelines
• Code & Technical Data — generation of code/technical data
• MLOps — hosting and using open source models
• Model Training — experiments with LoRA and SFT.
If you want to build infrastructure for LLM development and work with cutting-edge data technologies, join us.
Responsibilities
- developing tools for validating and measuring model quality, and building data quality control systems
- creating metrics for assessing LLM performance and accuracy
- automating testing, benchmarking, filtering, and preprocessing
- building and optimizing data cleaning and synthesis pipelines
- generating high-quality code and technical datasets
- creating synthetic data for training models on technical tasks
- supporting infrastructure for hosting open-source models
- integrating and using open-source models in product solutions
- training LoRA adapters for experimental tasks
- conducting SFT training as part of data research
- analyzing experiment results and interpreting approaches.
Requirements
- excellent knowledge of Python and experience with ML libraries (LangChain/LangGraph, PyTorch, llm-foundry, verl)
- experience working with LLMs (both open source: Llama, Mistral, Qwen, and proprietary: GPT, Claude)
- understanding of the principles of working with data for model training: collection, cleaning, validation
- skills in building ML pipelines and process automation
- understanding of the processes and approaches for validating and testing machine learning models
- understanding of MLOps basics and working with containerization (Docker).
Nice to have:
- experience with agent and RAG frameworks
- knowledge of fine-tuning approaches (LoRA, QLoRA, SFT)
- experience with data and experiment versioning systems (DVC, MLflow, W&B)
- understanding of how to work with code data and technical texts
- experience in deploying and monitoring ML models in production.
What we offer
- the largest DS&AI community: over 600 DS specialists at the bank
- a digest of the latest developments in DS&AI and reports from the world's largest conferences
- a choice of work format: hybrid or in-office
- comfortable modern office near Kutuzovskaya metro station (Kutuzovsky Prospect, 32)
- annual salary review, annual bonus
- corporate gym and recreation areas
- more than 400 educational programs from SberUniversity for professional and career development
- extended voluntary health insurance (VHI), discounted insurance for family members, and a corporate pension program
- preferential mortgage terms for every employee (up to 7% more favorable)
- free SberPrime+ subscription, discounts on products of partner companies
- referral bonus for recommending friends to the Sber team.