Description
The team collects high-quality data that is essential for training advanced AI models.
Responsibilities
- develop mechanisms for loading data from new sources
- optimize the storage of exported data and mechanisms for its verification and reloading
- implement new tools for efficient management of data exports
- develop infrastructure solutions to maximize the speed and efficiency of data exports
- implement intelligent export mechanisms to increase export speed and the quality of exported data
- work extensively with various cloud solutions
Requirements
- more than 6 years of Python development experience
- good knowledge of asynchronous and multithreaded development
- knowledge of network protocols, including the differences between TCP, UDP, ICMP, and others; understanding of the principles behind HTTP/HTTPS, DNS, FTP, SFTP, and S3
- understanding of API concepts (REST, gRPC, GraphQL), working with proxies and request routing in global networks
- understanding of how to work with web crawlers and link scrapers
- experience with Selenium or similar tools, and with relational and non-relational databases
- ability to work with the console utilities wget, curl, ping, and telnet
- troubleshooting skills and familiarity with tools such as tcpdump, strace, netstat, and others
Conditions
- comfortable modern office near Kutuzovskaya metro station
- hybrid work format
- annual salary review, annual bonus
- corporate gym and recreation areas
- training system for professional and career development
- extended voluntary health insurance from the first day of work, plus family insurance
- employee mortgage program
- free SberPrime+ subscription, discounts on products from partner companies
- bonus for referring friends to the Sber team