Description
The Center for Applied AI is a Sberbank division that takes on complex AI projects and solves non-trivial problems for the bank and the Sber ecosystem.
Our team builds and fine-tunes a line of LLMs for banking scenarios and deploys them to production in constrained environments: local devices, closed perimeters, and strict latency SLAs.
The work covers the full cycle: data → fine-tuning → quality evaluation → inference optimization → load testing → deployment.
The first stage of selection for this vacancy is a conversation with an AI recruiter. After applying, you will receive an email invitation to a preliminary interview with Gigarecruiter in Telegram. The conversation takes about 10 minutes. Its goal is to clarify missing details and speed up the review of your application.
Gigarecruiter is still at an early stage, so we ask for your understanding. Your experience and participation will help make it convenient and useful!
Responsibilities
- fine-tuning LLMs for specific applied banking tasks (instruction tuning, adapters/LoRA, SFT; alignment approaches where needed)
- developing and optimizing AI agents and complex RAG pipelines (LangChain / LlamaIndex / LangGraph and similar): routing, tools, memory, ranking, multi-source retrieval
- building and improving inference pipelines to meet specified latency / throughput / inference-cost requirements (batching, KV-cache, speculative decoding, profiling)
- optimizing model execution for specific hardware and runtime environments (architecture selection, speeding up bottlenecks, Triton/CUDA-level improvements where needed)
- conducting load testing and performance validation: test methodology, scenarios, metrics, reproducibility, conclusions and reports
- collecting and preparing data for fine-tuning: defining dataset requirements, planning collection and labeling, generating synthetic data where appropriate, data quality control
- developing quality metrics and an evaluation system (golden set, automated evaluation pipeline plus manual labeling, quality regression tracking, version comparison, reporting)
- packaging models as services / SDKs, integrating with internal APIs and knowledge bases, and working with platform engineers and product teams through to production launch.
Requirements
- at least 3 years of NLP/LLM experience
- solid hands-on experience with LLMs/transformers
- strong Python skills
- understanding of LLM architecture and principles of building inference pipelines, experience with vLLM / TensorRT-LLM / TGI / ONNX Runtime, quantization pipelines (AWQ/GPTQ/bitsandbytes, etc.)
- practical experience fine-tuning LLMs for various tasks (QA, classification, entity extraction, summarization, translation; code tasks are a plus)
- experience building RAG pipelines and AI agents / multi-agent systems
- ability to design reproducible experiments and validate changes with numbers
Nice to have:
- performance optimization experience (profiling, identifying CPU/GPU and memory bottlenecks), understanding of quality / speed / cost trade-offs, experience with low-latency / on-device / edge deployments
- production software development experience (tests, code style, code review, CI/CD, DevOps processes)
- experience with development and deployment infrastructure (Linux, Docker, Kubernetes, monitoring)
- experience building reusable components (core LLM / frameworks / platform solutions) used by multiple customers across different products
- a habit of reading scientific papers and implementing solutions from them (pragmatically, without dogmatism).
Conditions
- comfortable modern office in Moscow, near Kutuzovskaya metro station
- a choice of work format: office or hybrid (at least 2 days a week in the office)
- annual salary review and annual bonus
- corporate gym and relaxation areas
- more than 400 educational programs from SberUniversity for professional and career development
- voluntary health insurance (DMS), discounted insurance for family members, and a corporate pension program
- flexible mortgage discount equal to 1/3 of the Central Bank's key rate
- free SberPrime+ subscription, discounts on products from partner companies.