Description
We are the GigaSearch team, and we are building a search service that answers user queries in natural language. Our mission is to give GigaChat access to up-to-date information so that users receive accurate answers to any question, including questions about the latest news and events.
A Data Engineer on our team builds the "quality platform": collecting and consolidating data from various sources, creating data marts and evaluation datasets, and automating pipelines and monitoring, including in real time, so that degradations are caught before users notice them.
If you are interested in working at the intersection of data engineering, product analytics, and search, and you want not just to "move data around" but to build the infrastructure that underpins the quality culture of an AI product, come join us.
What we are working on:
- A unified data model for evaluating the quality of the retriever and the final LLM answer on production traffic.
- Medallion architecture in ClickHouse: bronze (raw) → silver (cleaned/normalized) → gold (marts for metrics and monitoring).
- Automated pipelines for quality evaluation and re-evaluation (including LLM-as-a-judge) and their reproducibility.
- Data marts and datasets for offline/online evaluation, A/B experiments, and degradation investigations.
- Data quality monitoring on streaming data: freshness, completeness, delays, anomalies, regressions.
Responsibilities
- Design and maintain ELT/ETL pipelines in Airflow (reliability, idempotency, retries, backfill, SLA).
- Set up data ingestion between databases with clear data contracts.
- Develop medallion layers in ClickHouse: raw tables, normalization/enrichment, gold marts for search quality metrics.
- Build near-real-time analytics in ClickHouse: incremental calculations, materialized views, pre-aggregations, optimization of end-to-end latency.
- Develop data marts and aggregates: partitioning, sort keys, TTL management, control of query cost and response time.
Requirements
- 3+ years of experience in analytics engineering (middle+ / senior level).
- Very strong SQL and practical experience in building data marts (ClickHouse and/or PostgreSQL; window functions, complex aggregations, optimization).
- Proficiency in Python for ETL/ELT tasks (parsing, validation, integrations, pipeline utilities).
- Experience with Airflow in production (DAG design, dependencies, operations, backfill).
- Good understanding of data architecture and dataset lifecycle (raw → normalization → marts), ability to create maintainable solutions.
Nice to have
- dbt or a similar approach to managing SQL models (tests, documentation, dependencies).
- Experience with search/click logs (search result impressions, clicks, sessions, dwell time) and event stitching.
- Experience building monitoring/dashboards (Grafana / Superset / DataLens).
- Experience with Kafka/queues, Kubernetes, Terraform/Ansible, CI/CD for a data project.
- Familiarity with OpenSearch/Elastic as a component of a search system.
Conditions
- Comfortable modern office near Kutuzovskaya metro station.
- Hybrid work format: office attendance is required during the probation period; afterwards, you can work from home 2-3 days a week.
- Annual salary review, annual bonus.
- Corporate gym and relaxation areas.
- Training programs for professional and career development.
- Extended voluntary health insurance (VHI) from the first day of work, as well as family insurance.
- Employee mortgage program.
- Free SberPrime+ subscription, discounts on partner company products.
- Referral bonus for recommending friends to the Sber team.