Multimodal VLM Developer (Visual Language Models)
Multimodal models are one of the major trends in deep learning. Our computer vision team builds visual language models (VLMs), which adapt large language models to work with images as well as text.
We are looking for developers who will work on next-generation neural networks and take their solutions all the way to a finished product.
Train large language models to work with visual information (images and videos)
You will work at the intersection of two fields: computer vision and natural language processing. Building VLMs calls for non-standard technical and architectural solutions.
Create large data pipelines that process internet-scale data
Training VLMs requires a huge amount of data. We are building full-scale pipelines for collecting, processing, and filtering multimodal data.
Optimize large-scale model training and accelerate inference
Training VLMs involves many subtleties. To keep it efficient, we spend a lot of time profiling and removing bottlenecks. After training, we also need to make inference for these models fast.
Adapt models to product requirements
Our goal is to integrate VLMs into every Yandex service. To do this, we have to account for the specifics of each task and, most importantly, adapt the model both architecturally and functionally to its requirements.
Learn more about ML at Yandex on the Yandex for ML channel.
Experience: 1-5 years
Employment: Full-time
Work Format: Hybrid, Onsite
Specialization: Data Science & ML
Industry: AI
Company Type: Corporation