SRE Engineer
TechVill is an IT company and VkusVill's partner for developing digital solutions.
We are responsible for the development of mobile and web applications, business process automation, artificial intelligence, DevOps, and VkusVill's information security.
Our solutions are used by over 1,000,000 VkusVill customers and employees.
Responsibilities:
- Supporting services and development teams on the infrastructure side;
- Ensuring the reliability and scalability of systems;
- Identifying and eliminating performance bottlenecks;
- Configuring monitoring, logging, and tracing systems;
- Preventing potential failures;
- Optimizing CI/CD pipelines, implementing Infrastructure as Code (IaC), and automating routine tasks;
- Promoting DevOps best practices among development teams: SLA/SLO/SLI monitoring, incident analysis (postmortems), and change management;
- Participating in the creation and development of infrastructure platforms;
- Ensuring the security, reliability, fault tolerance, and rapid recovery of the platform after failures.
Requirements:
- Practical experience administering and supporting Linux-based systems (Debian, Ubuntu, Rocky Linux);
- Proficiency in Bash or Python for automating routine tasks;
- Practical experience using container orchestration systems (Kubernetes, Docker Compose);
- Practical experience working with containers (Docker), knowledge of Dockerfile fundamentals and best practices in this area;
- Proficiency in configuration management and infrastructure provisioning tools (Ansible, Terraform, Pulumi) and practical experience applying them in Infrastructure as Code (IaC) workflows;
- Experience using GitLab CI and Jenkins to set up build and delivery pipelines;
- Practical experience using and administering monitoring systems based on Prometheus, Zabbix, the Grafana stack, Alertmanager, and VictoriaMetrics;
- Practical experience working with message brokers and event streaming systems (Kafka, RabbitMQ);
- Knowledge of Agile methodologies and experience working with issue trackers (Atlassian Jira, Yandex Tracker, etc.) and documentation systems (Atlassian Confluence, Evawiki, and other wikis);
- Practical experience operating web servers and load balancers (Nginx, HAProxy, Traefik, APISIX);
- Practical experience administering relational database management systems (PostgreSQL, Greenplum, MySQL), including Patroni-based clustering, as well as the column-oriented DBMS ClickHouse;
- Practical experience using NoSQL and key-value systems (Elasticsearch, OpenSearch, etcd, Redis, Memcached);
- Practical experience using centralized log collection and storage systems based on Logstash, Fluent Bit, Vector, Graylog, or Loki;
- Practical experience using S3-compatible object storage systems (MinIO), as well as tools for accessing them;
- Experience working with cloud platforms (Amazon Web Services, Google Cloud Platform, or Yandex Cloud) and their management tools;
- Practical experience in orchestrating data processing pipelines based on Apache Airflow;
- Experience deploying and supporting JupyterHub;
- Proficiency in distributed data processing tools (Apache Spark, Spark Streaming);
- Experience working with Iceberg REST Catalog as a tabular data catalog;
- Practical experience using MLflow for experiment tracking, model registration, and lifecycle management;
- Practical experience setting up GPU computing in Kubernetes (including installing the NVIDIA GPU Operator, configuring drivers, monitoring, and planning GPU resources);
- Familiarity with vector databases (Qdrant) as part of an ML platform;
- Familiarity with the n8n process automation platform.
Conditions:
- Work in an accredited IT company.
- Remote work: we value results regardless of location.
- Official employment from the first day of work and mentor support during onboarding.
- Transparent development system: clear grades, internal and external training, individual development plans, and competency matrices.
- A healthy, respectful work culture and reasonable managers.
- Reimbursement of expenses for medical services, mental well-being, sports, team-building events, and AI assistants.
- 15% bonus on purchases at VkusVill.
- Social responsibility: we encourage donation and provide financial assistance upon the birth of a child.
- "Green Light" referral program: receive up to 50,000 rubles for referring specialists you know.