Senior ML Developer for the YandexGPT Pretraining Quality Team
Our team is responsible for YandexGPT pretraining, the first and most resource-intensive stage of LLM training. We handle data selection, experiment design, the choice of training procedure, and the actual training of large models. Our goal is to keep these models the best for Russian users and, in the long term, to win the AI race.
YandexGPT models serve as the foundation for a number of Yandex services, such as Alice and Neuro in Search, and are used in many others, including Browser, Market, Advertising, and Translator. The quality of these products depends directly on the base models we provide. Our models are also available to users via API through Yandex Cloud.
Improving Capabilities
An LLM is a universal tool that can be applied to a wide variety of tasks. One way to improve our model is to pick a specific skill or focus area for the neural network and concentrate on it. Examples of such skills include working with code, solving math problems, extracting information from text, logical reasoning, rephrasing text, and factual knowledge.
You will need to find a way to improve the selected skill: this could involve working with data, such as collecting a suitable dataset or generating synthetic data, as well as changes to the model training process.
Researching the Training Process
The quality of the final model also depends heavily on the chosen training scheme and hyperparameters. Choosing them can be challenging, especially for large models where only one or two attempts might be possible. You will need to research methods that help make such choices correctly, based on scaling laws, experiments with smaller-scale models, and other available information.
Furthermore, we are currently investing in mechanistic interpretability: methods for analyzing why an LLM gives a particular answer and which training data influenced it.
Multimodality
At Yandex, we are developing not only text-based LLMs but also models that work with audio, images, and video. A major task ahead is to integrate these results into multimodal pretraining: we will need to assemble a dataset and select a training scheme so that the quality of the general pretrained model matches or exceeds that of specialized models.
Exploring New Directions
LLM training is a rapidly evolving field, with new research and competitor releases emerging constantly. It is important to filter this stream for the results most likely to help us achieve our goals. You will regularly need to find such research, share it with colleagues, and implement it here. Technological leadership also requires more than reproducing existing results: we expect you, as a senior researcher, to propose new ideas and approaches to data collection and LLM training.
More about ML at Yandex — on the Yandex for ML channel
Experience: 3-5 years
Employment: Full-time
Work Format: Hybrid, Remote, Onsite
Grade: Senior
Specialization: Data Science & ML
Industry: AI
Company Type: Corporation