ESL FACEIT GROUP

Site Reliability Engineer - Remote

Posted 9 Days Ago

Remote

5 Locations

Mid level

As a Site Reliability Engineer at EFG, you will design, analyze, and troubleshoot large-scale distributed systems. Responsibilities include maintaining and improving monitoring tools, optimizing existing systems, automating tasks, and collaborating with software engineering teams to enhance system reliability and uptime.

The summary above was generated by AI

Description

At EFG (ESL FACEIT Group) we create worlds beyond gameplay, where players and fans become a community. We pride ourselves on our commitment to corporate social responsibility, summed up as “IT’S NOT GG (Good Game), UNTIL IT’S GG FOR ALL”. We are passionate about the culture we foster, which ultimately helps create and shape the world of esports: gaming tournaments, leagues, events and holistic ecosystems staged for our millions of players, fans and heroes.

The Team:

As a Site Reliability Engineer at EFG, you will be designing, analyzing, and troubleshooting large-scale distributed systems. You will demonstrate a systematic approach to problem-solving and the ability to debug and optimize code and to automate routine tasks. You will ensure that EFG’s services and systems are reliable, that their uptime matches users' needs, and that they improve quickly.

Apart from monitoring our systems' capacity and performance, you will also focus on optimizing existing systems, building infrastructure and eliminating manual work through automation. You will work collaboratively with the software engineering teams to deploy and operate our systems, and you will help automate and streamline our operations and processes. In this role you will be given real responsibility and the opportunity to drive change and make a big impact on our products and platform.

What you will do:

Maintaining and improving the monitoring and observability tools (Grafana/Prometheus/Thanos/Jaeger);
Working closely with your team and with other cross-functional teams to help design, maintain and operate systems at scale;
Developing and driving adoption of SRE best practices across the company;
Leading the incident management process and its adoption;
Using your troubleshooting skills to help identify and fix operational issues;
Working with Cloud Native technologies such as Kubernetes, Envoy, Istio, Prometheus and Helm;
Working with the “Hashi Stack” (Terraform, Packer, Vault);
Experimenting with and introducing cutting-edge technologies.

Requirements

Proven experience as a Site Reliability Engineer, DevXP Engineer or Software Engineer, focusing on building and maintaining scalable infrastructure;
Excellent working knowledge of at least one of the major cloud providers (GCP/AWS/Azure);
Experience with cluster management systems (Kubernetes);
Knowledge of incident management: the ability to investigate, troubleshoot, recover from and prevent the recurrence of incidents that interfere with the normal delivery of IT services;
Proficiency in Go and some proficiency in at least one other language, such as Java, Python or Rust;
Knowledge of GitOps practices;
Production-scale experience with at least one of the following: MongoDB, Redis, MySQL;
Experience contributing to open source technologies would be an added bonus.

Top Skills

Java

Python

Rust

Similar Jobs

Fivetran

Senior Site Reliability Engineer

2 Days Ago

Remote

1,200 Employees

Senior level

Big Data • Cloud • Software • Database

As a Senior Site Reliability Engineer at Fivetran, you will ensure the reliability and performance of Fivetran's infrastructure, automate deployment pipelines, and respond to incidents effectively. You will work with engineering, product managers, and support teams to enhance service reliability and monitor infrastructure health.

Fivetran

Senior Staff Site Reliability Engineer

2 Days Ago

Remote

1,200 Employees

Senior level

Big Data • Cloud • Software • Database

The Senior Staff Site Reliability Engineer will ensure the reliability and performance of Fivetran’s production infrastructure. Responsibilities include monitoring availability, automating deployment, collaborating with engineering teams, resolving incidents, and enhancing infrastructure security and stability.

GitLab

Intermediate Site Reliability Engineer, FinOps

2 Days Ago

Remote

2,350 Employees

Entry level

Cloud • Security • Software • Cybersecurity • Automation

As an Intermediate Site Reliability Engineer in FinOps at GitLab, you'll ensure systems are scalable, reliable, and financially optimized. Your role involves automating cost management, collaborating with finance and engineering teams, and promoting FinOps principles across operations for cost optimization and financial accountability.

What you need to know about the London Tech Scene

London isn't just a hub for established businesses; it's also a nursery for innovation. The city boasts one of the most recognized fintech ecosystems in Europe, attracting billions in investment each year, and that success has made it a go-to destination for startups looking to make their mark. Top U.K. companies like Hopin, Moneybox and Marshmallow have already made the city their base, yet fintech is just the beginning. From healthtech to renewable energy to cybersecurity and beyond, the city's startups are breaking new ground across a range of industries.