Description

We are developing a GenAI knowledge management system for Sber's B2C products and processes. It covers the full content lifecycle: creating content, distributing it across channels (Agents with RAG, employee workspaces), assessing its quality, and generating tasks to improve it. Our knowledge base serves over 15 million clients per month across various channels. We are looking for an NLP Data Scientist to work on and lead AI initiatives within the team, focusing on hands-on development and enhancement of LLM solutions, with an emphasis on working with the GigaChat LLM. As part of the product cluster, you will be responsible for the full cycle of creating AI/ML solutions, from idea generation to production implementation.

Responsibilities

Development and implementation of LLM applications for knowledge management and for transferring knowledge to Agents (Classification, Clustering, RAG/Agentic RAG/GraphRAG, Summarization, Text Ranking, Text Matching)
Development of approaches and processes for evaluating the quality of LLMs and of the LLM-based assistants for knowledge editors, including by setting up annotation projects
Creation and management of ML pipelines
Optimization of model performance for production environments on CPU/GPU
Collaboration with business stakeholders, system analysts, developers, data engineers (DE), and data analysts (DA)
Generation and validation of hypotheses for solving technical and business problems

Requirements

Experience developing in Python with numpy, sklearn, pandas, and text data processing libraries
Experience working with PyTorch for building DL text models
Experience with LangChain/LangGraph libraries
Hands-on experience working with LLMs via API
Experience with RAG systems and a deep understanding of their mechanics
Excellent theoretical knowledge of classical and neural network NLP, including LLMs
Experience in fine-tuning NLP models
Practical experience running experiments with NLP solutions and deploying them to production
Experience in prompt engineering
Knowledge of SQL
Working knowledge of Linux and Git

Nice to have

Experience with Hadoop tools (HDFS, Hive), Spark
Experience with vector databases (OpenSearch, PostgreSQL with pgvector)
Experience in setting up and conducting A/B tests
Experience with distributed training, deep knowledge of GPU architecture.

Conditions

Comfortable modern office near Kutuzovskaya metro station
Hybrid work format available after the probationary period
Annual salary review, annual bonus
Corporate gym and relaxation areas
Over 400 educational programs from SberUniversity for professional and career development
Extended voluntary health insurance (DMS), discounted insurance for family members, and a corporate pension program
Flexible mortgage discount equal to 1/3 of the Central Bank's key rate
Free SberPrime+ subscription, discounts on products from partner companies
Referral bonus for recommending friends to the Sber team.
