Description
We are developing a GenAI knowledge management system for Sber's B2C products and processes. It covers the full content lifecycle: creating content, distributing it across channels (Agents with RAG, employee workspaces), assessing its quality, and generating tasks to improve it. Our knowledge base serves over 15 million clients per month across various channels. We are looking for an NLP Data Scientist to work on and lead AI initiatives within the team, focusing on hands-on development and enhancement of LLM solutions, with an emphasis on working with the GigaChat LLM. As part of the product cluster, you will be responsible for the full cycle of creating AI/ML solutions, from idea generation to production implementation.
Responsibilities
- Development and implementation of LLM applications for knowledge management and for transferring knowledge to Agents (classification, clustering, RAG/Agentic RAG/GraphRAG, summarization, text ranking, text matching)
- Development of approaches and processes for evaluating the quality of LLMs and of the knowledge editor assistants built on them, including by setting up annotation projects
- Creation and management of ML pipelines
- Optimization of model performance for production environments on CPU/GPU
- Collaboration with business stakeholders, system analysts, developers, data engineers (DE), and data analysts (DA)
- Generating hypotheses for solving technical and business problems and organizing their validation.
Requirements
- Experience developing in Python with NumPy, scikit-learn, pandas, and text-processing libraries
- Experience working with PyTorch for building DL text models
- Experience with LangChain/LangGraph libraries
- Hands-on experience working with LLMs via API
- Experience with RAG systems and a deep understanding of their mechanics
- Excellent theoretical knowledge of classical and neural network NLP, including LLMs
- Experience in fine-tuning NLP models
- Practical experience running experiments and deploying NLP solutions in production
- Experience in prompt engineering
- Knowledge of SQL
- Working knowledge of Linux and Git.
Nice to have:
- Experience with Hadoop tools (HDFS, Hive), Spark
- Experience with vector databases (OpenSearch, PostgreSQL with pgvector)
- Experience in setting up and conducting A/B tests
- Experience with distributed training, deep knowledge of GPU architecture.
Conditions
- Comfortable modern office near Kutuzovskaya metro station
- Hybrid work format available after the probationary period
- Annual salary review, annual bonus
- Corporate gym and relaxation areas
- Over 400 educational programs from SberUniversity for professional and career development
- Extended voluntary health insurance (DMS), discounted insurance for family members, and a corporate pension program
- Flexible mortgage discount equal to 1/3 of the Central Bank's key rate
- Free SberPrime+ subscription, discounts on products from partner companies
- Referral bonus for recommending friends to the Sber team.