xAI

Site Reliability Engineer (SRE)

Reposted 14 Days Ago
In-Office
London, Greater London, England
Expert/Leader
Responsible for backend services at xAI, focusing on scalability and reliability, requiring expertise in Kubernetes and monitoring technologies.
The summary above was generated by AI
About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the team

You will work on the team that is responsible for the backend services that power our products such as grok.com and the API. We focus on writing and maintaining highly scalable and reliable services that can efficiently process tens of thousands of queries per second. The services are hosted on a number of Kubernetes clusters (on-prem & cloud).

About the role

An ideal candidate meets at least the following requirements:

  1. Expert knowledge of Kubernetes.
  2. Expert knowledge of continuous deployment systems such as Buildkite and ArgoCD.
  3. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty.
  4. Expert knowledge of infrastructure as code technologies such as Pulumi or Terraform.
  5. Experience with traffic management and HTTP proxies such as nginx and Envoy.

Location

This position is in-person in London, UK. We usually work from the office 5 days a week but allow for work-from-home days when required. Candidates must be willing to attend late meetings at least once a week to coordinate with the rest of our team in Palo Alto.

Interview process

After you submit your application, the team reviews your statement of exceptional work and CV. If your application passes this stage, you will be invited to a 15-minute interview (“phone interview”) during which a member of our team will ask some basic technical questions. If you clear the initial phone interview, you will enter the main process, which consists of at least two technical interviews.

All interviews will be conducted via Google Meet.

Benefits

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to our Aviva pension plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer.

Top Skills

ArgoCD
Buildkite
Grafana
Kubernetes
PagerDuty
Prometheus
Pulumi
Terraform

xAI London, England Office

20 Air Street, London, London, United Kingdom
