System Administrator (Middle / Senior)
We are Timeweb Cloud, a cloud solutions provider with a geographically distributed infrastructure. We develop a full-fledged IaaS/PaaS platform that includes Managed Kubernetes services, DBaaS, S3 object storage, Load Balancing as a Service (LBaaS), and virtual private networks.
We are looking for a system administrator who will handle non-standard server configurations, investigate complex incidents, and automate everything that moves.
Role focus — cloud infrastructure: virtualization, networking (overlay/underlay), performance, platform-level incidents.
What you will do:
- Investigate complex problems
- Analyze logs, traces, and metrics to find root causes of incidents in any part of the cloud.
- Diagnose problems at the level of Python services and scripts (understanding the code is sufficient, active development is not required).
- Work with the Linux kernel and network stack to find bottlenecks.
- Design and install non-standard configurations
- Deploy distributed systems in production and under live load.
- Optimize the performance of virtualization hosts and network dataplane components (NUMA, IRQ/RPS, I/O, schedulers).
- Participate in architectural design of new solutions with the team.
- Write scripts in Bash and Python to automate software installation and server configuration.
- Work with configuration management systems (SaltStack / Ansible).
- Configure and develop monitoring (Zabbix and other systems).
- Participate in on-call rotations (approximately one week per month).
- Assist in resolving emergency situations when standard tools and junior teams are insufficient.
We expect from you:
- Deep understanding of Linux internals: cgroups, namespaces, network stack, systemd, processes, and the init system.
- Understanding of processor topology, including NUMA, and the ability to account for it when tuning performance.
- Ability to investigate problems at the kernel and system call level: from analyzing logs and strace output to finding bottlenecks.
- Understanding of disk I/O, file systems, volume managers, and schedulers.
- Experience with QEMU/KVM and libvirt.
- Understanding of the differences between virtual machines and containers.
- Understanding of VLAN, VXLAN, BGP.
- Experience with Open vSwitch, understanding of OpenFlow.
- Experience with SaltStack and/or Ansible.
- Experience configuring and maintaining monitoring (Zabbix or similar).
- Python at a code-reading level (understanding what is happening in services and scripts).
- Confident Bash skills.
Would be a huge plus:
- Basic understanding of Ceph: what it is, how S3 and RBD disks for virtual machines are built on it.
- Experience with SAN/NAS.
- SDN and virtualization
- Experience with OpenStack / OpenNebula / oVirt.
- Familiarity with SDN solutions (OVN, Tungsten Fabric).
Our stack:
- OS and virtualization: Linux, QEMU, KVM, libvirt
- Networking: VLAN, VXLAN, BGP, Open vSwitch, OpenFlow, FRR
- Automation: SaltStack, Ansible, Bash, Python
- Monitoring: Zabbix, Prometheus, VictoriaMetrics
- Storage: Ceph
- CI/CD: GitLab CI/CD
What we offer:
- Official employment in an accredited IT company with all the associated benefits;
- Voluntary Medical Insurance (DMS);
- Reimbursement of sports expenses;
- Reimbursement of psychological counseling expenses;
- 8 additional days off per year (2 per quarter);
- Your choice of work format: remote, in the office near the Moskovskiye Vorota metro station (St. Petersburg), or hybrid.
Additional benefits for those working in our St. Petersburg office:
- Doctor visits at the office;
- Meal compensation via "NaLunch";
- Office library and the opportunity to learn from colleagues in other departments and divisions;
- Office kitchen with unlimited coffee, tea, fruits, and snacks.