Senior IT Engineer

MicroTECH Global Ltd
Posted 11 hours ago, valid for 8 days
Location

London, Greater London EC1R 0WX

Salary

£60,000 - £72,000 per annum

Contract type

Full Time

In order to submit this application, a Reed account will be created for you. As such, in addition to applying for this job, you will be signed up to all Reed’s services as part of the process. By submitting this application, you agree to Reed’s Terms and Conditions and acknowledge that your personal data will be transferred to Reed and processed by them in accordance with their Privacy Policy.

Sonic Summary

  • Our client, a global telecommunications company, is seeking a Senior IT Engineer for their AI Infrastructure Team, requiring 5+ years of experience.
  • The position involves managing large-scale AI development infrastructure, specifically GPU servers, Kubernetes clusters, and storage systems.
  • Responsibilities include configuring and maintaining Kubernetes clusters, managing GPU resources, and implementing CI/CD pipelines for automation.
  • Candidates should have proven experience with Kubernetes, GPU optimization, storage management, and familiarity with deep learning frameworks like TensorFlow and PyTorch.
  • The salary for this role is £60,000 to £72,000 per annum, and the position is 100% on-site with no sponsorship available.

100% On-Site Required // No Sponsorship Available

Our client, a global telecommunications company, is recruiting for its AI Infrastructure Team.

Brief:

We are looking for a highly skilled Senior IT Engineer to manage a large-scale AI development and training infrastructure.

The role involves overseeing GPU servers, Kubernetes clusters (Rancher), and storage systems to ensure seamless operations and optimized performance. You will collaborate with development teams, ensuring they have the resources and support needed to run their projects efficiently.

This is a critical technical position requiring expertise in Kubernetes, hardware management, and automation.

Responsibilities:

Kubernetes and Rancher Management: Configure, scale, and maintain Kubernetes clusters and Rancher for multi-cluster management, ensuring optimal performance and resource allocation.

GPU Resource Management: Manage GPU resources and servers, ensuring efficient resource scheduling, load balancing, and performance optimization for AI workloads.

Storage Management: Maintain and optimize large storage systems, ensuring high availability, performance, and data persistence.

DevOps and Automation: Implement CI/CD pipelines and automate infrastructure management using tools such as Terraform, Ansible, Jenkins, and GitLab CI.

Monitoring and Troubleshooting: Set up and manage monitoring and logging systems (e.g., Prometheus, Grafana, ELK) to ensure high availability and rapid issue resolution.

AI Framework Optimization: Collaborate with data scientists and AI developers to optimize AI frameworks (e.g., TensorFlow, PyTorch) for GPU and cluster environments.

Security and Access Management: Implement and manage role-based access control (RBAC) and ensure data security, encryption, and backup procedures are in place.

Key Requirements:

Proven experience in managing large-scale Kubernetes clusters and containerisation technologies (e.g., Docker).

Strong understanding of GPU resource management and optimization for AI workloads.

Expertise in managing large storage systems and implementing data persistence strategies.

Proficiency in scripting and automation (Python, Bash, Go), with experience in infrastructure as code (IaC) using Terraform, Ansible, or similar tools.

Familiarity with deep learning frameworks (e.g., TensorFlow, PyTorch) and experience optimizing them for large-scale environments.

Experience with monitoring and logging tools such as Prometheus, Grafana, and ELK.

Desirables:

Experience with Rancher or another Kubernetes management platform.

Experience managing hybrid cloud environments.

Red Hat Certified System Administrator (RHCSA) certification preferred.

Certified Kubernetes Administrator (CKA) certification preferred.

Mandarin speaker preferred.
