Description

We are looking for an experienced SRE engineer to support and develop a distributed cloud infrastructure built on an OpenStack-like ecosystem running on a custom RPM-based Linux distribution.

You will be responsible for the operational reliability of the platform, automation, observability, release processes, and incident investigation in the production environment.

Responsibilities

operation and development of the production infrastructure of the cloud platform (control plane + compute/network/storage)
designing and maintaining SLOs/SLIs, participating in incident response and postmortems (RCA)
automation of operational tasks (deployment, updates, migrations, configuration audits)
development and maintenance of infrastructure tools (scripts, services, operators, utilities)
diagnosing complex issues in Linux/networks/storage/virtualization, reducing MTTR
supporting observability: metrics, logs, traces, alerts, dashboards
working with CI/CD and release processes: testing, canary deployments, rollbacks, version control

Requirements

excellent knowledge of Linux at the operations and diagnostics level: systemd, journalctl, cgroups, namespaces, network stack (iptables/nftables, routing, MTU, TCP/UDP), file systems
containerization: Docker and/or Podman, working with registries, networking, volumes
virtualization: QEMU/KVM, understanding of how to work with it via libvirt (CLI/API), bridge and overlay networking
experience with CI/CD (Git, GitLab CI or similar), release automation
experience with configuration management (Ansible or similar)
basic experience with RPM build and package publishing systems (rpmbuild/mock/koji or similar)
experience using GigaChat, Kandinsky, and similar tools in products; skills in building and using AI agents

Nice to have

practical experience operating OpenStack (or its components/analogs)
experience with Ceph (or other distributed storage systems)
experience with Prometheus/Grafana/Alertmanager (or a similar stack)
experience building centralized logging (Loki/ELK/OpenSearch)
understanding of service architectures: REST/RPC, message bus approach (RabbitMQ/Kafka)
experience with hardening, basic security mindset (TLS, secrets, access policies)
experience supporting a custom Linux distribution and internal repositories

Conditions

working with a large modular cloud infrastructure and real production tasks
opportunity to influence the operational architecture, release process, and platform reliability
technically challenging tasks at the intersection of Linux, virtualization, networks, and distributed systems
annual bonus and yearly salary review
accredited IT company status with all the associated benefits
extended health insurance from day one and preferential family insurance
Sber Corporate University, internal educational platform, participation in IT conferences
preferential mortgage from Sber, SberPrime+ subscription, discounts from partners and from services of the group of companies
