Experience: 3-5 years
Employment: Full-time
Work Format: Hybrid, Remote, Onsite
Grade: Senior
Specialization: Data Science & ML
Industry: AI
Company Type: Corporation
DL Developer for the YandexGPT Architecture Research Team
Our team researches architectures and trains models for YandexGPT. We pre-train LLMs, run experiments with state-of-the-art approaches, and optimize distributed training on a large GPU cluster. Our foundation models are ultimately integrated into Yandex's key products. We are looking for an experienced DL engineer to join the core technology team.
Reproducing and improving global advancements in LLM training
You will research and implement LLM architectures and all components of the training pipeline. Your work may include implementing new optimizers, setting up experiments with Mixture of Experts, improving attention mechanisms, and much more. You will run experiments aimed at maximizing model quality while keeping inference efficient.
Enhancing the efficiency of distributed GPU training
You will be responsible for accelerating model training on the cluster: researching and implementing the best parallelism strategies, profiling CUDA and CPU code, and identifying bottlenecks.
Analyzing relevant publications
You will analyze current publications in the field in depth, identify the most promising and useful approaches, and then reproduce and improve on them.
More about ML at Yandex: see the Yandex for ML channel.