The team collects high-quality data, which is fundamental to training cutting-edge AI models.

Define the technical strategy and target system architecture
Design systems ready to scale from millions to billions of entities: horizontal scaling, sharding, geo-distribution
Make key technological decisions: technology stack selection, storage strategies, architectural patterns (event-driven, CQRS, etc.)
Design end-to-end pipelines: URL discovery → fetch → parse → deduplicate → store → deliver
Write code for the most complex and critical components: crawl scheduler, URL frontier, deduplication
Conduct performance engineering: profiling, load testing, elimination of bottlenecks
Optimize proxy infrastructure and global traffic routing
Research and implement new technologies (HTTP/3, QUIC, eBPF, io_uring) to maximize throughput
Develop infrastructure cost optimization strategies (FinOps)
Lead a team of 3–6 engineers: planning, decomposition, prioritization
Establish and maintain technical standards: code review, coding guidelines, architecture decision records (ADRs)
Mentor developers, conduct 1-1s, assist with professional development
Work with data consumers and related teams on roadmap planning

7+ years of commercial Python development experience, including at least 2 years in a Tech Lead/Team Lead role
Expertise in designing distributed high-load systems (millions of RPS, petabytes of data)
Deep understanding of the network stack: TCP/UDP, HTTP/1.1/2/3, TLS, DNS, QUIC
Practical experience in building large-scale crawling systems or similar data acquisition systems
Expertise in asynchronous and parallel programming
Experience in designing microservice and event-driven architecture
Experience with Kafka, Redis, PostgreSQL, ClickHouse (or equivalents)
Proficiency with Kubernetes, Docker, Terraform, CI/CD
Experience with cloud platforms
Deep troubleshooting skills: tcpdump, strace, perf, eBPF tools
Team management experience: planning, Code Review, mentoring, 1-1s, hiring

Nice to have:

Knowledge of Go, Rust, or C++ for writing high-performance components
Experience in companies with large-scale crawling (search engines, data aggregators)
Experience with io_uring, eBPF, zero-copy networking
Experience in building geo-distributed systems
Experience applying ML to optimize crawl strategies (prioritization, filtering, classification)
Public speaking experience and published articles

Comfortable modern office near Kutuzovskaya metro station
Hybrid work format
Annual salary review, annual bonus
Corporate gym and relaxation areas
Learning system for professional and career development
Extended voluntary health insurance from the first day of work and insurance for family
Employee mortgage program
Free SberPrime+ subscription, discounts on partner company products
Referral bonus for recommending friends to the Sber team.
