Job Title: Site Reliability Engineer
Location: Remote (UK)
Type: Full-Time (1-Year Contract)
Working Hours: 11 AM - 7 PM
Are you passionate about building and managing reliable, large-scale cloud systems? We're looking for a Senior Site Reliability Engineer to join a high-performing Observability team. In this role, you'll play a critical part in ensuring our cloud services remain performant and scalable, supporting billions of daily requests.
Key Responsibilities
- Scale and optimize Prometheus architecture to manage millions of active metrics.
- Operate and maintain large Elasticsearch clusters (2,000 TB+).
- Build and manage high-throughput Kafka pipelines processing hundreds of thousands of events per second.
- Develop self-service APIs and robust alerting systems, and deploy infrastructure with Terraform.
- Support observability initiatives to monitor and improve critical cloud services.
What We're Looking For
- 5+ years of experience managing distributed systems on Linux (Debian/Ubuntu preferred).
- 2+ years of development experience with Ruby, Python, Go, or similar languages.
- Expertise in technologies such as Elasticsearch, Kafka, Prometheus, Terraform, and Ansible.
- A strong passion for solving complex challenges in large-scale distributed systems.
- A proactive, curious mindset with a focus on quality and customer experience.
This is an urgent vacancy, and the hiring manager is shortlisting for interviews immediately. Please apply with a copy of your CV or send it to raghav.Manrai@randstad.co.uk
Randstad Technologies is acting as an Employment Business in relation to this vacancy.