FactSet

Lead Site Reliability Engineer

Posted 18 Days Ago

Remote

Hiring Remotely in United Kingdom

Senior level

Lead the design and maintenance of reliable systems, automate processes, troubleshoot issues, and collaborate with various teams to enhance system performance and scalability.

The summary above was generated by AI

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our software systems and infrastructure. The ideal candidate possesses a strong background in coding, automation, and system administration, combined with a passion for continuously improving system reliability.

Responsibilities:

Collaborate with development, operations, and product teams to define, review, and implement reliability standards and best practices.
Design, implement, and maintain highly available and scalable architectures for our applications and infrastructure.
Develop and enhance automated tools and frameworks to optimize system monitoring, deployment, and recovery.
Troubleshoot and resolve complex issues throughout the entire software stack, including networking, databases, and distributed systems.
Conduct performance analysis and capacity planning to ensure system scalability and resource optimization.
Take a proactive approach to continuously improving reliability.
Participate in incident response, root cause analysis, and postmortem activities to identify and rectify system failures.
Collaborate with cross-functional teams to implement and improve CI/CD pipelines, ensuring reliable and efficient software releases.
Stay up to date with emerging technologies and industry trends, and actively contribute to ongoing system improvements.
Participate in on-call rotation.

Requirements:

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
Proven experience successfully deploying and managing large-scale distributed systems.
Understanding of SRE concepts (error budgets, SLIs/SLOs, blameless postmortems).
Proficiency in programming languages such as Python, C++, or Go.
Familiarity with monitoring and observability tools.
Excellent problem-solving skills and ability to troubleshoot complex issues efficiently.
Strong organizational and communication skills, with the ability to collaborate effectively in a cross-functional team environment.

Desirable Qualifications:

Familiarity with security best practices and experience implementing security measures in a production environment.
Experience with modern infrastructure technologies and tools, including cloud platforms (AWS, Azure, GCP), containers and orchestration (Docker, Kubernetes), and configuration management (Ansible, Chef, Puppet).
Solid understanding of networking protocols and technologies (TCP/IP, DNS, load balancing).
Demonstrated experience with infrastructure as code (IaC) and automation tools (e.g., Terraform, GitHub Actions).

Join our team and contribute to creating and maintaining a highly reliable and performant infrastructure that supports our growing platform. Help shape the future of our systems architecture while working in a collaborative and innovative environment.

Top Skills

Ansible

AWS

Azure

C++

Chef

Docker

GCP

GitHub Actions

Kubernetes

Puppet

Python

Terraform

One Snowden Street, London, United Kingdom, EC2A 2DQ
