Description
The IT Infrastructure Management team is looking for a new team member to help develop availability monitoring for IT services, information systems, and company infrastructure.
You will operate and develop our monitoring system based on Zabbix, Prometheus, Grafana, and Thanos, and onboard new services into monitoring, improving the transparency and manageability of incidents.
Responsibilities
- Administration, maintenance, and development of monitoring systems based on Zabbix, Grafana, Prometheus;
- Creation and support of templates, triggers, and rules for automatic host and service discovery;
- Configuration of alerting and notification channels for timely incident response;
- Development and optimization of dashboards in Grafana for visualizing metrics and infrastructure status;
- Integration of monitoring with external systems (CMDB, ServiceDesk, Confluence) to enrich event context;
- Correlation analysis of metrics and events to help teams diagnose and resolve problems.
Requirements
- Experience with Grafana and Prometheus;
- Proficiency in Linux administration;
- Deep knowledge of Zabbix: templates, discovery, JSONPath, regular expressions, macros, scripts, notifications;
- Basic understanding of Prometheus and PromQL;
- Experience with Docker.
Nice to have
- Experience applying machine learning (ML) in monitoring tasks: load forecasting, anomaly detection, incident classification;
- Experience with tracing tools such as Grafana Tempo, OpenTelemetry, Jaeger, or similar;
- Experience with Thanos or other solutions for long-term storage and scaling of Prometheus;
- Understanding of business monitoring principles;
- Experience writing scripts and automating tasks in Python or Bash;
- Experience with the ELK stack or similar;
- Ability to write basic SQL queries.
What we offer
- Hybrid work format (modern office in Moscow near Prospekt Mira metro station);
- Preferential mortgage terms;
- Free subscription to SberPrime+;
- Discounts on products from partner companies: Okko, SberMarket, Mega Market, Samokat, Eapteka, and others;
- Voluntary health insurance (VHI) from the first day and insurance discounts for family members;
- Corporate pension program;
- Company-funded recreation programs and gifts for employees' children;
- Company-funded training: online courses, unlimited library access, Corporate University programs, workshops, meetups, and the opportunity to obtain new qualifications.