HPC Support Engineer

Hays Specialist Recruitment Limited
Posted 10 hours ago, valid for 14 days
Location

Bournemouth, Dorset BH8 9BJ, England

Contract type

Full Time

In order to submit this application, a Reed account will be created for you. As such, in addition to applying for this job, you will be signed up to all Reed’s services as part of the process. By submitting this application, you agree to Reed’s Terms and Conditions and acknowledge that your personal data will be transferred to Reed and processed by them in accordance with their Privacy Policy.

Sonic Summary

  • The company is seeking an HPC Support Engineer with over 5 years of experience in IT support, particularly in HPC or cloud environments.
  • This fully remote position offers an unlimited holiday policy and the opportunity to work on a cutting-edge GPU cloud platform.
  • Key responsibilities include resolving complex issues, managing incidents related to storage and networking, and monitoring multi-node clusters.
  • Candidates should have strong Linux system administration skills, networking knowledge, and experience with cluster management tools like SLURM.
  • Compensation details were not specified, but the role includes share options and a flexible workplace culture.

Your new company

I have partnered exclusively with a trailblazing company that's revolutionising cloud infrastructure. Their cutting-edge, high-performance, GPU-optimised platform is driving breakthroughs in AI and HPC while championing sustainability for a greener, more efficient world.

This role is fully remote, with no office visits required - ever! Plus, you'll enjoy the fantastic perk of unlimited holiday, giving you the freedom to recharge and thrive.

Your new role

As an HPC Support Engineer, you will support customers on a GPU cloud platform and dedicated GPU clusters. You'll work with various teams, vendors, and partners to meet SLA commitments and ensure smooth operations. Your main tasks will include resolving complex issues and managing incidents related to storage, networking, and GPU optimisation. You'll also monitor multi-node clusters to ensure they run efficiently, keep detailed records of incidents and resolutions, and collaborate with stakeholders.

What you'll need to succeed

  • IT Support Background: 5+ years of experience in an IT support role, preferably in HPC or cloud environments.
  • Linux Administration: Strong Linux system administration skills from the command line.
  • Networking Knowledge: Familiarity with network protocols (e.g. TCP/IP, BGP), InfiniBand, and RoCE.
  • Cluster Support: Experience working with cluster management tools like SLURM and GPU monitoring systems (e.g. NVIDIA DCGM).
  • Scripting and Automation: Proficiency in scripting languages (e.g. Bash, Python).
  • Tools and Platforms: Familiarity with ITSM tools (e.g. ServiceNow, Jira Service Management) and monitoring solutions (e.g. Grafana, Prometheus).
  • Knowledge of NVIDIA AI Enterprise Suite and software stacks relevant to GPU environments.

What you'll get in return

  • Share options.
  • Unlimited holiday policy.
  • 100% Remote working.
  • Fantastic opportunities to develop - they make a habit of promoting in-house.
  • A great team with a passion for working collaboratively.
  • Enhanced family-friendly policies.
  • A truly flexible workplace.

What you need to do now

If you're interested in this role, click 'apply now' to forward an up-to-date copy of your CV, or call us now.

If this job isn't quite right for you, but you are looking for a new position, please contact us for a confidential discussion about your career.

Hays Specialist Recruitment Limited acts as an employment agency for permanent recruitment and an employment business for the supply of temporary workers. By applying for this job you accept the T&Cs, Privacy Policy and Disclaimers, which can be found at hays.co.uk
