Experience: 5 years
Employment: Full-time
Work Format: Onsite
Specialization: Data Science & ML
Industry: IT & Tech
Company Type: Corporation
ML Developer for the Neuro Source Search Team in Alice
Our source search team uses modern LLMs to find the most relevant documents across the web, images, videos, and other sources. From the documents we retrieve, Alice builds high-quality, informative responses for millions of users. This is a key part of the Neuro technology in Alice: the intelligence and usefulness of the answers depend directly on how we find and process information.
Users come to us with long and complex queries. Using LLMs, we break each task down into subtasks and find suitable documents on the internet for each one. This lets us gather all the information needed for the original query and respond fully and helpfully. We also tackle deep search problems, where the desired result cannot be obtained in a single iteration. The better we find and structure documents for language models, the more accurate and meaningful the answers will be for users.
We use LLMs to decide which sources to search (web, images, etc.) and which queries to send to them. We also apply LLMs to analyze the retrieved data: to understand what has already been obtained and what is missing, and to decide whether to continue the search.
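The iterative search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the team's implementation: `SearchState`, `deep_search`, `toy_plan`, and `toy_search` are hypothetical names, and the planner is a plain function standing in for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SearchState:
    """What is known so far: the original query and the documents found."""
    query: str
    documents: list[str] = field(default_factory=list)


# Hypothetical interfaces. In a real system the planner would be an LLM call.
# The planner returns (source, subquery) pairs; an empty list means "stop".
Planner = Callable[[SearchState], list[tuple[str, str]]]
Searcher = Callable[[str, str], list[str]]


def deep_search(query: str, plan: Planner, search: Searcher,
                max_iterations: int = 3) -> list[str]:
    """Plan, search, inspect the results, and repeat until nothing is missing."""
    state = SearchState(query)
    for _ in range(max_iterations):
        requests = plan(state)  # decide which sources to query and how
        if not requests:        # planner judges the results sufficient
            break
        for source, subquery in requests:
            state.documents.extend(search(source, subquery))
    return state.documents


def toy_plan(state: SearchState) -> list[tuple[str, str]]:
    # Stand-in for an LLM: search once, then declare the results sufficient.
    if not state.documents:
        return [("web", state.query + " overview"), ("images", state.query)]
    return []


def toy_search(source: str, subquery: str) -> list[str]:
    # Stand-in for a real search API call.
    return [f"{source}:{subquery}"]


docs = deep_search("red pandas", toy_plan, toy_search)
```

The `max_iterations` cap is a design choice in this sketch: it bounds cost and latency if the planner never decides the results are complete.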
On our team, you will be able to grow your skills in several areas at once: analytics, programming, and machine learning. We focus not on abstract research but on practical tasks and on shipping solutions directly into the product, so your work will quickly become part of one of Yandex's key technologies. Here you can implement your ideas, see them work at the scale of millions of users, and directly influence the development of modern digital assistants.
Designing and launching architecture
You will design new solutions: define formats for interacting with search APIs, prepare data for LLMs, and tune hyperparameters. You will also implement your solutions end to end, from a prototype to a production-grade, fault-tolerant service for millions of users.
LLM Prompt Engineering
You will develop and test prompts for YandexGPT and other LLMs to optimize query generation and the search for relevant information.
Developing metrics and quality control
You will create metrics for objectively assessing the quality of the retrieved information and the contribution of each source, gather suitable queries (both real ones from logs and synthetic samples), and select and calculate metrics using various approaches, from regular expressions to crowdsourcing via prompts.
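As an illustration of the simplest end of that range, a regular-expression metric might look like the sketch below. The function name `regex_hit_rate`, the data layout, and the sample queries are all assumptions made for the example, not the team's actual tooling.

```python
import re


def regex_hit_rate(documents_by_query: dict[str, list[str]],
                   patterns: dict[str, str]) -> float:
    """Share of queries with at least one retrieved document matching
    the expected-answer pattern for that query."""
    if not documents_by_query:
        return 0.0
    hits = 0
    for query, docs in documents_by_query.items():
        pattern = re.compile(patterns[query], re.IGNORECASE)
        if any(pattern.search(doc) for doc in docs):
            hits += 1
    return hits / len(documents_by_query)


# Hypothetical sample: retrieved documents and expected-answer patterns.
retrieved = {
    "capital of France": ["Paris is the capital of France."],
    "boiling point of water": ["Water freezes at 0 degrees Celsius."],
}
expected = {
    "capital of France": r"\bparis\b",
    "boiling point of water": r"\b100\s*degrees",
}

rate = regex_hit_rate(retrieved, expected)  # one of two queries matches
```

A metric like this is cheap and deterministic but only checks surface patterns, which is why it would typically sit alongside the heavier approaches the posting mentions.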