Description

We are the Kandinsky team, creating and developing core technologies for generative image and video models. Our models are capable of generating photorealistic scenes, stylized illustrations, and video clips from text descriptions, editing visual content, and working in multimodal scenarios.

We are currently strengthening our work on model quality assessment and benchmarking. We are looking for a Lead for the AI Quality & Benchmarking team who will take ownership of generative model quality and influence decisions on releases, development priorities, and comparisons with the best models worldwide.

Responsibilities

Full ownership of the generative model quality assessment track (Image / Video / Multimodal).
Designing, developing, and maintaining quality benchmarks: side-by-side (SBS), point-wise, pairwise, human-in-the-loop.
Building and evolving a system of quality metrics: prompt adherence, knowledge, visual quality, stability, safety.
Maintaining and developing internal model leaderboards (quality, latency, inference cost).
Analyzing quality regressions between model versions, identifying root causes of degradations.
Collecting, systematizing, and prioritizing quality feedback from users and ML / Product teams.
Formulating and defending go / no-go decisions for model releases based on data and metrics.
Regularly analyzing competitor models (open-source and proprietary) and preparing reports on them.

Requirements

University degree in mathematics, applied mathematics, computer science, data science, ML, or a related field.
4+ years of experience in Data Science / ML / Applied Research, with significant focus on model quality assessment.
Deep understanding of the principles of generative models (V-LLM / diffusion / multimodal) and their limitations.
Practical experience in designing and maintaining model quality assessment systems (offline and online evaluation).
Ability to independently formulate quality metrics and make architectural decisions regarding evaluation pipelines.
Strong analytical background: working with statistics, interpreting metrics, assessing significance, analyzing trade-offs (quality ↔ stability ↔ cost).
Proficiency in Python (Pandas, NumPy, SciPy) and experience with SQL.
Experience with human evaluation, subjective metrics, and data labeling processes.
Ability to clearly formulate conclusions and communicate decisions to ML, Product, and management.

Conditions

The largest DS&AI community: over 600 DS specialists across the bank.
Digests of the latest developments in DS&AI and reports from the world's largest conferences.
Opportunity to co-author research papers and articles for international conferences.
Choice of work format: hybrid or in-office.
Annual salary review, annual bonus.
Corporate gym and relaxation areas.
Over 400 educational programs from SberUniversity for professional and career development.
Extended voluntary health insurance (DMS), preferential insurance for family, and corporate pension program.
Mortgage rates up to 7% lower for every employee.
Free SberPrime+ subscription, discounts on products from partner companies.
Referral bonus for recommending friends to the Sber team.
