xAI

Site Reliability Engineer (SRE)

Posted 17 Days Ago

3 Locations

Mid level

About xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.

Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity.

We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important.

All engineers and researchers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

About the Role

We’re looking for an experienced site reliability engineer (SRE) who can thrive in a dynamic start-up environment. The main responsibilities for this role are:

Improving our observability by adding and adjusting metrics
Building easily parsable dashboards
Building reliable alerts
Designing and overseeing our on-call rotations
Improving our deployment process to increase reliability

An ideal candidate meets at least the following requirements:

Expert in at least one programming language that compiles to machine code, such as Rust, C++, or Go (Rust or C++ experience is preferred)
Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty
Expert knowledge of deployment technologies such as Pulumi or Terraform
Expert knowledge of Kubernetes

Location

The role is based in our London office close to Piccadilly Circus underground station. We usually work from the office 5 days a week but allow for work-from-home days when required. Candidates must be willing to attend late meetings at least twice a week to coordinate with the rest of our team, which is based in California. This role includes semi-regular business trips to California.

Interview process

After submitting your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview ("phone interview") during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:

Coding interview in Rust, C++, or Go
Monitoring & deployment design interview
Distributed systems design interview
Meet the wider team and give a 20-minute presentation about the most difficult technical problems you have solved

Our goal is to finish the process within one week. All interviews will be conducted via Google Meet.

Benefits

Competitive cash-based compensation
xAI equity
Private health and dental insurance
Unlimited time off subject to prior approval

xAI is an equal opportunity employer and does not unlawfully discriminate based on race, color, religion, ethnicity, ancestry, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, age, disability, medical conditions, genetic information, marital status, military or veteran status, or any other applicable legally protected characteristics.

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all applicable federal, state, and local laws, including the San Francisco Fair Chance Ordinance, Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act.

For Los Angeles County (unincorporated) Candidates:

xAI reasonably believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of a conditional offer of employment:

Access to information technology systems and confidential information, including proprietary and trade secret information, and/or user data;
Interacting with internal and/or external clients and colleagues; and
Exercising sound judgment.

Top Skills

C++

Rust
