Experience: 3-5 years
Employment: Full-time
Work Format: Hybrid, Onsite
Grade: Senior
Specialization: Data Science & ML
Industry: Ecommerce
Company Type: Corporation
Our team collects and prepares data for the e-commerce segment of Search. We want to improve search scenarios so that users can more easily find information about products and stores and make purchases online.
We are building the best product database in the Runet. By indexing websites with crawler robots and setting up B2B integrations, we populate a content system built on YTsaurus technologies, process hundreds of thousands of requests per second, and store tens of billions of records and terabytes of data.
While collecting data, enriching it, and keeping it up to date, we run into tasks of all kinds: research; infrastructure work, such as production code in C++20 and data processing and MVPs in Python; and ML tasks, such as preparing and deploying both YandexGPT and smaller models like BERT, CatBoost, and DSSM.
We are looking for an experienced ML developer who will help make product search better and more convenient for users.
Examples of tasks we solve:
Using "product-catalog" relationships to populate the product database
One way to efficiently index the constantly changing set of product pages on the internet is to work in natural batches: catalogs and listings. For this approach to work, high-quality extraction of product information from catalog pages is not enough. Pages must also be crawled in the right order, and the keys that link entities (many-to-many) must be maintained at the database level so that data consistency is not broken.
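To make the many-to-many linking concrete, here is a minimal in-memory sketch in Python. It is not the team's actual schema or storage layer: the class `ProductCatalogLinks`, its method names, and the use of URLs as keys are all assumptions made for illustration. It shows one way to keep both directions of the relationship in sync when a catalog page is re-crawled, and to notice products that no catalog links to any more.

```python
from collections import defaultdict


class ProductCatalogLinks:
    """Hypothetical many-to-many index between catalog pages and product pages."""

    def __init__(self):
        # Both directions are stored and always updated together,
        # so neither side can reference a link the other side lacks.
        self._products_by_catalog = defaultdict(set)
        self._catalogs_by_product = defaultdict(set)

    def replace_catalog(self, catalog_url, product_urls):
        """Record the products seen in the latest crawl of one catalog page.

        Returns the products that are no longer linked from any catalog.
        """
        new = set(product_urls)
        old = self._products_by_catalog.get(catalog_url, set())
        orphaned = set()

        # Drop links that disappeared from the catalog page.
        for product in old - new:
            catalogs = self._catalogs_by_product[product]
            catalogs.discard(catalog_url)
            if not catalogs:
                del self._catalogs_by_product[product]
                orphaned.add(product)

        # Add links that appeared on the catalog page.
        for product in new - old:
            self._catalogs_by_product[product].add(catalog_url)

        if new:
            self._products_by_catalog[catalog_url] = new
        else:
            self._products_by_catalog.pop(catalog_url, None)
        return orphaned

    def catalogs_of(self, product_url):
        return set(self._catalogs_by_product.get(product_url, ()))

    def products_of(self, catalog_url):
        return set(self._products_by_catalog.get(catalog_url, ()))
```

In this sketch, a product that drops out of every catalog is returned as "orphaned", which could serve as a signal to re-check that product page directly. In a real database the two updates would need to happen in one transaction to keep the same guarantee.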
Determining product availability for order
If we want product search results to be relevant, we must understand which products the user can actually order. To do this, we want to promptly detect unavailable pages and products that are out of stock, and to account for the user's region at runtime.
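As a rough illustration of the availability check, the Python sketch below filters offers at query time. Everything in it is an assumption made for the example: the `Offer` fields, the `orderable` function, the 24-hour freshness window, and the idea of representing regions as plain strings. It is not a description of the production logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Offer:
    """Hypothetical record for one product page."""
    url: str
    in_stock: bool               # stock status extracted from the page
    http_status: int             # status code of the last crawl
    checked_at: datetime         # time of the last crawl
    delivery_regions: frozenset  # regions the store delivers to


def orderable(offer, user_region, now, max_age=timedelta(hours=24)):
    """Return True if the user could plausibly order this offer right now."""
    if offer.http_status != 200:
        return False  # the page itself is unavailable
    if not offer.in_stock:
        return False  # the product is out of stock
    if now - offer.checked_at > max_age:
        return False  # the data is too old to trust (assumed policy)
    return user_region in offer.delivery_regions
```

A caller would apply `orderable` to each candidate offer with the region of the current user, which is the runtime part of the task. Detecting the unavailable pages promptly is the offline part and depends on how often pages are re-crawled.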