Description
We are looking for an experienced and motivated team lead to head a team of administrators (Infrastructure Engineers) responsible for the reliability, performance, and development of Sber's business-critical HR platform services. You will be both a technical leader and a mentor for the team, responsible for process quality and the strategic development of the infrastructure.
Responsibilities
- Building, developing, and motivating a team of administrators (Infrastructure Engineers).
- Setting goals (OKRs), planning and distributing tasks, conducting regular 1:1 meetings and performance reviews.
- Continuously improving support, monitoring, and automation processes.
- Troubleshooting complex problems in distributed high-load systems.
- Analyzing incidents and developing recommendations to improve the fault tolerance, scalability, and performance of the HR platform.
- Developing proactive and reactive monitoring and creating effective SLO-based alerts.
- Participating in the architecture design of new services, taking reliability and operational requirements into account.
- Participating in developing and implementing the reliability and performance strategy for key services.
- Collaborating closely with development, testing, and product teams throughout the service lifecycle.
- Working with the HR platform's support and maintenance teams (SRE, DBA, DevOps).
- Working with the bank's infrastructure support teams.
Requirements
- At least 3 years of experience managing a Dev/DevOps/SRE/Infrastructure team (task setting, motivation, development, hiring).
- Extensive hands-on experience (at least 5 years) as an Infrastructure/DevOps Engineer or SRE.
- Deep understanding and practical application of SRE (Site Reliability Engineering) philosophy and practices.
- Expert troubleshooting skills in complex distributed systems.
- Experience building, scaling, and supporting high-load, fault-tolerant systems.
- Proficiency with key automation tools: Ansible, Terraform.
- Deep knowledge of containerization and orchestration: Docker, Kubernetes (OpenShift).
- Solid knowledge of at least one automation language: Python, Go, Ruby, or Bash.
- Experience with monitoring and visualization systems: Prometheus, Grafana, Zabbix, Dynatrace.
Tech Stack
- Linux: RHEL
- Docker, Kubernetes, OpenShift (CRI, CNI, CSI)
- Nginx, Envoy, OpenResty
- Kafka
- PostgreSQL, Redis, ClickHouse
- Vault, Consul SD
- ELK, Fluentd, Fluent Bit
- Prometheus, Grafana, Zabbix, Dynatrace
- Jenkins, GitLab (Drone, Gitea, Bitbucket)
- Python, Ruby, Bash, Groovy, Go
- Ansible, Terraform
Conditions
- A well-equipped office (AgileHome) near Kutuzovskaya metro station with cafeterias, numerous cafes, kitchens with refrigerators and coffee machines, a free gym, free underground parking, and recreation areas (table tennis, several PlayStation consoles, foosball, billiards)
- A competitive salary (base pay + bonuses)
- An opportunity to work with a modern technology stack
- Benefits package (voluntary health insurance)
- A large catalog of educational programs, with training and certification at the company's expense
- A preferential lending program at Sber
- Discount programs from numerous partner companies