Principal Site Reliability Engineer

Sorry, this job was removed at 04:18 p.m. (GMT) on Tuesday, Sep 09, 2025

In-Office or Remote

Hiring Remotely in London, England

Similar Jobs

Coupa

Sr. Digital Solution Advisor - 11000

5 Hours Ago

Remote

United Kingdom

Mid level

Artificial Intelligence • Fintech • Information Technology • Logistics • Payments • Business Intelligence • Generative AI

The role involves implementing demo strategies, creating video content for solutions, and training sales teams on using demo platforms.

Top Skills: Camtasia, Consensus, Graphic Design Software, SaaS, Video Editing Software

Atlassian

Senior Onboarding Success Manager

11 Hours Ago

Remote

United Kingdom

Senior level

Cloud • Information Technology • Productivity • Security • Software • App development • Automation

As a Senior Onboarding Success Manager, you will drive customer onboarding, ensure product adoption, build relationships, and guide customers through transformations while collaborating across teams.

Top Skills: BI Tools, Confluence, Gainsight, JIRA, Loom, Salesforce, Tableau

Motorola Solutions

Sales Operations Manager

13 Hours Ago

Remote or Hybrid

United Kingdom

Mid level

Artificial Intelligence • Hardware • Information Technology • Security • Software • Cybersecurity • Big Data Analytics

The International Sales Operations Manager supports the sales team by managing forecasts, analyzing sales data, improving processes, and providing training, while coordinating with various departments to drive growth and efficiency.

Top Skills: Advanced Excel, AI Skills, Google Docs, Google Sheets, Google Slides, Salesforce, Tableau, Zoho

Description

Orgvue is a leading organizational design and planning software platform that harnesses data visualization and modelling to build more adaptable, better-performing organizations. HR, finance and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world.

Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.

We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure.

Role

In this role you will work across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient, even at scale.

This role combines hands-on technical capability with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We're looking for someone who has technical expertise, is a great communicator and enjoys collaborating across multiple teams.

Responsibilities

Define and enforce SLOs, SLIs, and error budgets across critical services
Craft and implement a cloud infrastructure and tooling strategy
Work across the organization to level up SRE practices
Help implement robust observability (metrics, logs and traces) using our observability tooling
Guide the team in building automated, self-healing systems
Own and evolve our incident response processes, including on-call practices and post-mortem culture
Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation and GitOps practices
Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform

Requirements

Demonstrable experience leading SRE transformations
Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
Strong background in observability: metrics, visualization, logging, and tracing
Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews

Benefits

Hybrid working: at least one day a week in the London office

Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day

Subsidised Gym Membership
Private Medical Insurance (including Dental and Vision) and Life Assurance
25 days' holiday (increasing by 1 day per year, up to 30 days)
Summer Fridays (half-day Fridays for the months of July and August)
Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
Season Ticket Loan
Cycle to Work Scheme
Annual Discretionary Bonus

'Here at Orgvue we promote individualism and a diverse workforce to build on our future success'

What you need to know about the London Tech Scene

London isn't just a hub for established businesses; it's also a nursery for innovation. With one of Europe's most recognized fintech ecosystems, attracting billions in investment each year, London has become a go-to destination for startups looking to make their mark. Top U.K. companies like Hopin, Moneybox and Marshmallow have already made the city their base, yet fintech is just the beginning. From healthtech to renewable energy to cybersecurity and beyond, the city's startups are breaking new ground across a range of industries.
