NexGen Cloud

HPC Cluster Architect

Reposted 7 Days Ago
Remote
Hiring Remotely in UK
Senior level
The HPC Cluster Architect will design and deploy large-scale GPU clusters, manage customer requirements, and lead technical integrations for high-performance computing environments.
The summary above was generated by AI
HPC Cluster Architect

Location: UK (Remote)

Department: Infrastructure

Reporting to: Head of Infrastructure

ABOUT NEXGEN CLOUD:

NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers, from AI researchers to enterprises running the world's most compute-intensive workloads. We deliver on-demand and private GPU infrastructure to teams who treat performance as a requirement, not a feature.

We're a tight-knit, fast-moving team working at the cutting edge of AI cloud infrastructure. We practice what we preach, equipping our people with AI at every level so we can solve harder problems, ship faster, and keep raising the bar for what enterprise GPU infrastructure looks like.

THE ROLE: HPC Cluster Architect

This role exists because NexGen Cloud is winning large-scale dedicated GPU cluster contracts and needs someone who can own the full architecture cycle — from first customer conversation to production deployment. You'll have direct ownership over cluster architecture across compute, networking, storage, and physical design — translating customer requirements into production-ready, commercially optimised GPU deployments.

This is a senior hands-on role for someone who has lived and breathed HPC cluster design and wants to be the technical authority, not one voice in a committee. You'll own designs end-to-end and see them go live.

WHAT YOU'LL BE DOING:

Rather than a long checklist, here's what success in this role looks like:

  • Own end-to-end cluster architecture for large-scale NVIDIA GPU deployments — from customer requirement through rack layouts, BOM, power and cooling design, to production handover
  • Design high-performance network fabrics across compute (InfiniBand, RDMA, NVLink/NVSwitch), storage, and WAN — defining topology, oversubscription models, and scaling strategies
  • Engage directly with OEMs and vendors — validating hardware configurations, reviewing quotes, and ensuring designs are both technically sound and commercially optimised
  • Provide technical oversight during deployment and bring-up — supporting hardware validation, performance testing, and acting as escalation point for complex integration issues
  • Act as a senior technical leader across Solutions Architecture, Cloud Engineering, and data centre partners — contributing to standardised reference designs and building out the HPC engineering function
ABOUT YOU:

We're more interested in how you think and work than in a perfect CV. You'll likely bring a combination of the following:

Essential
  • Proven experience designing and delivering GPU-based HPC or AI clusters at scale — covering the full lifecycle from design through procurement, deployment, and validation
  • Deep hands-on knowledge of NVIDIA GPU platforms (H100/H200/B-series) and NVIDIA reference architectures
  • Strong InfiniBand/RDMA design experience, including topology and performance tuning, plus experience with high-performance Ethernet fabrics
  • Solid grounding in Linux systems, PCIe topology, NUMA alignment, and server-level performance considerations
  • Background from an OEM, hyperscaler, neo-cloud, or enterprise/research HPC environment — with demonstrable exposure to the full design-to-deployment lifecycle
  • Confident engaging with customers, vendors, OEMs, and internal engineering teams as a technical authority — able to translate complex design trade-offs into clear decisions
Nice to Have
  • Experience with Spectrum-X or next-generation Ethernet fabrics
  • Prior involvement in large-scale cluster deployments (1,000+ GPUs) and performance benchmarking (NCCL, MLPerf)
  • Exposure to both air-cooled and liquid-cooled HPC environments, and/or automation/infrastructure-as-code
WHAT WE OFFER:
  • Competitive salary and annual discretionary bonus scheme
  • Employee wellbeing benefits
  • 25 days of holiday, plus public holidays
  • Flexible working arrangements (remote or hybrid, depending on role and location)
  • Real ownership and autonomy, with the trust to take initiative and experiment
  • The opportunity to make a visible, meaningful impact as we scale
  • Clear career progression and growth opportunities in a fast-growing company
  • A collaborative, international culture built on trust, transparency, and ownership
  • The chance to help shape NexGen Cloud's team, culture, and future alongside ambitious, mission-driven colleagues
MORE INFORMATION

Head over to the NexGen Cloud careers page to view current openings, and follow us on LinkedIn and X to learn more about our journey, see our newest releases, and hear the latest news from the neo-cloud space.


HQ

NexGen Cloud London, England Office

24 Greville St, London, United Kingdom, EC1N 8SS
