Senior Site Reliability Engineer (Remote Build) | RocketHunt
What you’ll do
Infrastructure as code at scale. Design, implement, and maintain infrastructure-as-code patterns using Terraform and Kubernetes that support both standard connectors and custom builds. Make it easy for engineers to deploy and operate with confidence.
Observability and incident response. Build and maintain comprehensive monitoring, logging, and alerting systems. Lead incident response efforts, conduct post-mortems, and drive continuous improvement in system reliability.
Security and compliance in motion. Work with our Security team to embed security into every layer of Build infrastructure. Ensure we meet compliance requirements across 100+ jurisdictions without creating friction for developers or customers.
Performance and cost optimization. Continuously optimize system performance, resource utilization, and cloud costs. Make recommendations that improve both reliability and unit economics.
Automation and operational leverage. Identify manual operational toil and systematically eliminate it. Build tools and processes that let teams operate efficiently without scaling headcount.
Platform reliability and developer experience. Partner with platform teams to ensure APIs, MCP, and CLI are resilient and observable. Give infrastructure feedback that shapes how the platform evolves.
What you’ll bring
Senior-level SRE experience: a proven track record in a Site Reliability Engineering, DevOps Engineering, or SysOps role. You have stood up and operated production systems at scale.
Kubernetes and AWS: deep, hands-on experience running Kubernetes in production. Solid AWS fundamentals across compute, networking, storage, and managed services.
Infrastructure-as-code: proficiency with Terraform or similar IaC tools. You write code to define infrastructure; you don't click buttons in the console.
CI/CD and deployment automation: real experience setting up and operating GitLab, GitHub Actions, Jenkins, or similar. You understand deployment strategies, rollback mechanisms, and safety nets.
Scripting and systems knowledge: strong Bash scripting skills. Comfortable debugging system-level issues, reading logs, and understanding Linux kernel basics.
Great communication: you explain complex infrastructure decisions clearly to both engineers and non-technical stakeholders. You write clear runbooks and documentation.
Nice to have:
Experience with at least one backend programming language (Elixir, Python, Go, Java, Node.js, etc.).
Experience in consultancy settings.
Container registry and artifact management (ECR, Docker Hub, etc.).
Observability stack depth (Datadog, Prometheus, ELK, Grafana, or similar).
Experience working with or scaling multi-tenant platforms.
Practicals
You’ll report to: Engineering Manager
Team: Engineering
Location: Anywhere in the world
Travel: Around 10% for customer engagements
Start date: As soon as possible
Benefits
Work from anywhere
Flexible paid time off
Flexible working hours (async)
16 weeks paid parental leave
Mental health support services
Stock options
Learning budget
Home office budget & IT equipment
Budget for local in-person social events or co-working spaces