About the role

We are looking for a technical leader to manage a team of six L1 engineers and take ownership of a multi-server architecture with a large fleet of hardware and interconnected services. The role covers the full cycle, from deployment to monitoring and incident resolution.

You will personally handle complex, rare, and non-standard incidents. The broader strategic mission, however, is to build a system in which typical problems are resolved at the L1 level by following the runbook, without your intervention. You will achieve this through documentation, mentoring, and continuous process improvement.

You will maintain a complete picture of the architecture: understand the dependencies between services, participate in releases, and assess the risks of changes.

Tasks

Handling complex incidents that go beyond the runbook (Manual Cases)
Writing, updating, and reviewing runbooks for the L1 team
Mentoring: assisting on-duty engineers in incident resolution and skill development
Participation in Change & Release processes: risk assessment, deployment support
Maintaining and updating the Service List: describing services, dependencies, criticality
Preparing root cause analyses (RCA) for significant incidents
Interaction with Development and Product teams during escalations

Mandatory requirements

Linux — deep knowledge: network stack, performance diagnostics, system tuning
Docker / Docker Compose — confident configuration, debugging, optimization
NGINX, HAProxy — configuration, load balancing, SSL/TLS, upstream management
MySQL — replication, cluster configurations, backup/restore, query and schema optimization
Redis — architecture, diagnostics, failover and persistence configuration
RabbitMQ — understanding of queue models, diagnostics, recovery from failures
Memcached — configuration, diagnostics, load optimization
ClickHouse — basic operation, diagnostics, reading query profiles
PHP — understanding at the operational level: interpreter, configuration (php-fpm, php.ini), logs, basic debugging
Monitoring and alerting — configuration of Nagios (NRPE/NCPA), Loki, Sentry; writing checks and alerting rules
Git / GitLab / SVN — understanding of VCS, working with pipelines, participation in the release process
RAID — understanding of Software RAID and Hardware RAID, diagnostics of array degradation
LLM assistants (Claude, Cursor, etc.) — confident use for analyzing complex problems, writing runbooks, automating documentation
Experience in writing technical documentation and runbooks
English at a level sufficient for reading technical documentation and alerts
5+ years of experience as a system administrator or DevOps engineer

Similar vacancies

OnDuty Engineer

Site Reliability Engineer (SRE)

DevOps Engineer

Lead Developer of Internal Portal Products

DevOps Engineer (middle+/senior)

Middle/Senior SysOps/System Administrator

DevOps Engineer

Support Engineer / Devops

Site Reliability Engineer (SRE)

DevOps Engineer

System Engineer – Senior

L2 Engineer (On-Call)

Lead Sysadmin / NOC Tech Lead
