Reach out directly about this role
Alice is a complex, high-load service based on large language models. Every year, Alice becomes smarter and helps users solve increasingly complex tasks. Currently, we are using large language models (LLMs) for this more and more often.
Each model is good at a specific task, but training a single universal model that will perfectly solve all tasks is very difficult. Therefore, we teach models to work with each other, turning them into agents. Each agent specializes in something of its own: one searches for information on the internet, another controls a browser, a third works with files or applications.
Alice consists of many components and runs on a large family of platforms. We want to create a new unified homogeneous runtime that will meet modern requirements and be able to support both long agent tasks and requests requiring an instant response.
Designing Alice's architecture You will design Alice's architecture and develop a new runtime so that Alice can work uniformly with both long tasks lasting tens of minutes and fast queries requiring hundreds of milliseconds without unnecessary overhead. Alice must remain reliable, efficient, and scalable, handling tens of thousands of RPS. It works with diverse types of input data: files, images, voice, text. The runtime will allow improving the system as a whole and individual components (models, tools, agents), and trajectories will be written in unified terms. The architecture must be transparent and understandable.
More about backend at Yandex — in the channel Yandex for Backend
3-5 years
Experience
Full-time
Employment
Onsite
Work Format
Backend
Specialization
IT & Tech
Industry
Corporation
Company Type
By city
IT & Tech
Industry
Corporation
Company Type