Era4 develops, owns and operates AI infrastructure across the UK, powered by renewable energy. By converting legacy industrial and energy sites into modern data-centre facilities, Era4 combines brownfield regeneration with cleaner, more efficient and scalable compute capacity for healthcare, research, finance, enterprise and public-sector organisations.
Role Summary:
We are seeking a Technical Product Manager – AI Cloud Infrastructure to join our fast-scaling team. In this role, you will embed with engineering to act as the "First Customer," owning the continuous validation, reliability strategy, and technical documentation for our bare-metal, VM, Kubernetes, and ML infrastructure. By treating testability as a core feature and shadowing real-world workflows, you will ensure our compute platform handles the demands of advanced AI training and engineering workloads. This is an opportunity to join a mission-led AI business that is redefining infrastructure, intelligence, and impact for enterprise customers.
Key Responsibilities:
- Execute integration testing in staging environments, work closely with the platform engineers to build repeatable test frameworks, and shadow internal and external AI infrastructure engineers to translate their real-world usage patterns into automated in-house test cases.
- Establish strict quality gates, performance SLOs, and scheduling benchmarks that our compute and orchestration services must pass before production deployment.
- Review, refine, and author technical guides, API documentation, and CLI references, using them as the blueprint to test the platform exactly as an external engineer would.
- Partner with software and platform engineers to design robust validation suites, anticipating complex edge cases and structural failure modes across bare-metal provisioning and Kubernetes cluster lifecycles.
Essential Experience:
- Technical familiarity with bare-metal infrastructure (e.g., PXE booting, IPMI/Redfish), virtualization layers (e.g., KVM), and container orchestration (Kubernetes or similar).
- Track record designing comprehensive test strategies, validation frameworks, and acceptance criteria for highly technical cloud-native, API, or infrastructure-as-a-service (IaaS) products.
- Ability to analyse infrastructure services, CLIs, and APIs from a developer’s perspective to identify friction points, usability gaps, and reliability risks.
- Working knowledge of modern CI/CD pipelines, automated testing, and automation tooling (e.g., GitLab CI, GitHub Actions, Terraform, Ansible) to help engineering shape automated quality gates.
- Proven experience in a highly technical role embedded directly within a core infrastructure or platform engineering team.
Desirable Experience (one or more would be an advantage):
- Direct exposure to high-performance computing (HPC) setups, large-scale cluster scheduling (e.g., Slurm), or infrastructure optimised for heavy AI/ML training workloads.
- Experience using cloud observability, telemetry, and monitoring tools (e.g., Prometheus, Grafana, Datadog) to track and improve system reliability metrics.
- Experience writing or structuring technical documentation, API reference guides, and developer tutorials from scratch.
Why Join Era4:
You’ll be joining a mission-driven start-up building critical national infrastructure, where operational excellence directly enables growth. This role offers high visibility with leadership, real autonomy, and the chance to shape how a next-generation company operates at scale.
Diversity & Inclusion:
Era4 is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

