Description
We are building a search service that answers user queries in natural language. We are breaking down the barrier between the static knowledge of a language model and a constantly changing world. We give GigaChat access to up-to-date information so that users receive accurate answers to any question, including questions about recent news and events.
Responsibilities
- develop and configure automated data collection mechanisms, ensure that the collected data is correct and complete, and optimize processes so they run faster and without manual intervention
- develop data preprocessing pipelines that transform data into formats optimized for storage, processing, and use in search tasks
- design and implement storage solutions that support efficient search
- apply machine learning and artificial intelligence to improve results; keep the system running correctly through monitoring, diagnostics, and troubleshooting, fixing old bugs and creating new ones
Requirements
- advanced level of Python and SQL
- experience with distributed data processing engines (Spark, Trino) and orchestrators such as Airflow
- ability to design DWHs, Data Lakes, and Data Management Platforms
- experience in building and developing high-load systems
- experience developing and optimizing batch and streaming pipelines that process large volumes of data (100 TB to 1 PB+)
Nice to have
- experience with Apache Iceberg tables
- experience with Elasticsearch/OpenSearch indexes
- experience using GPUs for model inference
What we offer
- comfortable modern office in Moscow, near Kutuzovskaya metro station
- choice of a convenient schedule: hybrid or fully in-office
- annual salary review, annual bonus
- corporate gym and relaxation areas
- more than 400 educational programs from SberUniversity for professional and career development
- extended voluntary health insurance, discounted insurance for family members, and a corporate pension program
- free SberPrime+ subscription, discounts on products from partner companies