Description
We are the Kandinsky team, creating and developing core technologies for generative image and video models. Our models are capable of generating photorealistic scenes, stylized illustrations, and video clips from text descriptions, editing visual content, and working in multimodal scenarios.
We are currently strengthening our model quality assessment and benchmarking efforts. We are looking for a Lead for the AI Quality & Benchmarking team who will own the quality of our generative models and influence decisions on releases, development priorities, and comparisons against the best models worldwide.
Responsibilities
- Full ownership of the generative model quality assessment track (Image / Video / Multimodal).
- Designing, developing, and maintaining quality benchmarks: side-by-side (SBS), point-wise, pairwise, human-in-the-loop.
- Building and evolving a system of quality metrics: prompt adherence, knowledge, visual quality, stability, safety.
- Maintaining and developing internal model leaderboards (quality, latency, inference cost).
- Analyzing quality regressions between model versions and identifying the root causes of degradation.
- Collecting, systematizing, and prioritizing quality feedback from users and ML / Product teams.
- Formulating and defending go / no-go decisions for model releases based on data and metrics.
- Regularly analyzing competitor models (open-source and proprietary) and preparing reports on them.
Requirements
- University degree in mathematics, applied mathematics, computer science, data science, ML, or a related field.
- 4+ years of experience in Data Science / ML / Applied Research, with a significant focus on model quality assessment.
- Deep understanding of the principles of generative models (V-LLM / diffusion / multimodal) and their limitations.
- Practical experience in designing and maintaining model quality assessment systems (offline and online evaluation).
- Ability to independently formulate quality metrics and make architectural decisions regarding evaluation pipelines.
- Strong analytical background: working with statistics, interpreting metrics, assessing significance, analyzing trade-offs (quality ↔ stability ↔ cost).
- Proficiency in Python (pandas, NumPy, SciPy); experience with SQL.
- Experience with human evaluation, subjective metrics, and data labeling processes.
- Ability to formulate conclusions clearly and communicate decisions to ML, Product, and management teams.
Benefits
- The largest DS&AI community: over 600 DS specialists across the bank.
- Digests of the latest developments in DS&AI and reports from the world's largest conferences.
- Opportunity to co-author research papers and articles for international conferences.
- Choice of work format: hybrid or in-office.
- Annual salary review and annual bonus.
- Corporate gym and relaxation areas.
- Over 400 educational programs from SberUniversity for professional and career development.
- Extended voluntary health insurance (DMS), discounted insurance for family members, and a corporate pension program.
- Preferential mortgage rates, up to 7% lower, for every employee.
- Free SberPrime+ subscription and discounts on products from partner companies.
- Referral bonus for recommending friends to the Sber team.