Job Title: Senior Site Reliability Engineer
Location: Remote (UK)
Type: Full-Time (1-Year Contract)
Working Hours: 11 AM - 7 PM

Are you passionate about building and managing reliable, large-scale cloud systems? We're looking for a Senior Site Reliability Engineer to join a high-performing Observability team. In this role, you'll play a critical part in ensuring our cloud services remain performant and scalable, supporting billions of daily requests.

Key Responsibilities

Scale and optimize Prometheus architecture to manage millions of active metrics.
Operate and maintain large Elasticsearch clusters (2,000+ TB).
Build and manage high-throughput Kafka pipelines processing hundreds of thousands of events per second.
Develop self-service APIs, robust alerting systems, and deploy infrastructure with Terraform.
Support observability initiatives to monitor and improve critical cloud services.

What We're Looking For

5+ years of experience managing distributed systems on Linux (Debian/Ubuntu preferred).
2+ years of development experience with Ruby, Python, Go, or similar languages.
Expertise in technologies such as Elasticsearch, Kafka, Prometheus, Terraform, and Ansible.
A strong passion for solving complex challenges in large-scale distributed systems.
A proactive, curious mindset with a focus on quality and customer experience.

This is an urgent vacancy, and the hiring manager is shortlisting for interviews immediately. Please apply with a copy of your CV or send it to raghav.manrai@randstad.co.uk

Randstad Technologies is acting as an Employment Business in relation to this vacancy.
