Job Description
Responsibilities:
- Designing and implementing core platform components with an emphasis on reliability, scalability, and operational safety.
- Building and maintaining cloud-native infrastructure, including networking, compute, and service orchestration.
- Owning deployment workflows, warm-up processes, rollout strategies, and rollback mechanisms for production environments.
- Establishing and maintaining platform standards for monitoring, alerting, logging, and incident response.
- Developing and enforcing guardrails to reduce operational risk from abuse, misconfiguration, or traffic anomalies.
- Defining and tracking SLIs, SLOs, and KPIs related to uptime, latency, and platform health.
- Using AI-assisted engineering tools (Cursor, Claude Code, etc.) to improve development velocity and operational insight.
- Documenting architecture, operational procedures, and known failure scenarios.
Requirements:
- 5+ years of experience in platform engineering, infrastructure engineering, or SRE-adjacent roles.
- 5+ years of experience with Python.
- Solid experience working with GNU/Linux and Bash/Shell scripting.
- Strong experience operating distributed systems in production.
- Hands-on expertise with at least one major cloud provider (AWS, GCP, or Azure).
- Practical experience with:
  - Containerization and orchestration (Docker, Kubernetes, Nomad, etc.);
  - Infrastructure as Code (Terraform, Pulumi, CloudFormation);
  - CI/CD systems and automated release pipelines.
- Proven ability to design systems for high availability, fault tolerance, and low latency.
- Experience building or operating platforms with strict uptime and reliability requirements.
Nice to Have:
- Experience with Go.
- Experience with proxies, load balancers, or edge platforms.
- Background in abuse prevention, rate limiting, or reputation-based control systems.
- Familiarity with observability stacks (Prometheus, Grafana, OpenTelemetry, etc.).
Benefits:
- A focus on professional development.
- Interesting and challenging projects.
- Fully remote work with flexible working hours, so you can schedule your own day and work from any location worldwide.
- 24 days of paid vacation per year, 10 days of national holidays, and unlimited sick leave.
- Compensation for private medical insurance.
- Co-working and gym/sports reimbursement.
- Budget for education.
- The opportunity to receive a reward for the most innovative idea that the company can patent.
Who You Are:
Strong fundamentals in distributed systems and cloud platform engineering are essential.
Tech Stack:
Python, GNU/Linux, Bash/Shell scripting, AWS/GCP/Azure, Docker, Kubernetes, Terraform, CI/CD systems.
Team Description:
You will join the Imunify360 Email Team, which is building a smart, automated email protection system designed to ensure clean outbound traffic and preserve IP reputation for hosting providers.