- Collaborate with cross-functional teams to ensure the reliability, availability, and performance of our client-facing services
- Maintain and configure observability platforms such as Datadog
- Proactively monitor production and other environments to ensure stability, availability, security, and integrity
- Design and implement automation and processes to improve the efficiency and effectiveness of the teams and other support functions
- Engage with business stakeholders to gather requirements, address concerns, and provide updates on projects and system status
- Contribute to the design, build and operational management of the services
- Lead incident response, troubleshooting, and root cause analysis to mitigate issues and prevent them from recurring
- Work closely with engineering, support, and operations teams to upskill colleagues and promote knowledge transfer by producing training materials and articles
- Participate in the on-call rotation to provide support and ensure system uptime
- Solid experience in Site Reliability Engineering or a similar role such as DevOps
- Experience of running 24x7 services in a public cloud, ideally Azure
- Deep understanding of cloud infrastructure and services, including best practices for monitoring, scaling, and security
- Experience with observability platforms such as Datadog or similar tools
- Strong interpersonal skills, with the ability to work effectively with many stakeholders
- Solid verbal and written communication skills, and the ability to present technical information clearly and concisely
- Previous experience working with external clients is required
- Experience conducting post-mortems or post-incident reviews
- Confidence in making decisions and taking ownership of projects
- Experience with Azure DevOps pipelines (or similar) and scripting languages, such as Python or PowerShell
- Customer-centric and passionate about delivering great services
- You’re collaborative, enjoy problem solving and mentoring others
- Azure certifications, such as Azure Administrator, Azure Developer, or Azure DevOps Engineer
- Familiarity with Infrastructure as Code (IaC) tools such as Pulumi, Terraform, ARM templates, or Azure Bicep
- Knowledge of containerization and orchestration technologies, such as Docker and Kubernetes
- Familiarity with programming languages such as C# would be welcome
- Previous experience working with Configuration as Code technologies such as Puppet or Ansible
- Familiarity with high-volume web APIs
- Familiarity with PagerDuty