
FactSet

Lead Site Reliability Engineer

Posted 4 Days Ago
Remote
Hiring Remotely in United Kingdom
Senior level
As a Lead Site Reliability Engineer, you will ensure the reliability and performance of our software systems by collaborating with other teams, designing scalable architectures, and developing automation tools. Responsibilities include troubleshooting complex issues, conducting performance analysis, participating in incident response, and staying current with industry trends.
The summary above was generated by AI

We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our software systems and infrastructure. The ideal candidate possesses a strong background in coding, automation, and system administration, combined with a passion for continuously improving system reliability.

Responsibilities:

  • Collaborate with development, operations, and product teams to define, review, and implement reliability standards and best practices.
  • Design, implement, and maintain highly available and scalable architectures for our applications and infrastructure.
  • Develop and enhance automated tools and frameworks to optimize system monitoring, deployment, and recovery.
  • Troubleshoot and resolve complex issues throughout the entire software stack, including networking, databases, and distributed systems.
  • Conduct performance analysis and capacity planning to ensure system scalability and resource optimization.
  • Take a proactive approach to continuously improving reliability.
  • Participate in incident response, root cause analysis, and postmortem activities to identify and rectify system failures.
  • Collaborate with cross-functional teams to implement and improve CI/CD pipelines, ensuring reliable and efficient software releases.
  • Stay up to date with emerging technologies and industry trends, actively contributing to ongoing system improvements.
  • Participate in the on-call rotation.

Requirements:

  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
  • A proven track record of successfully deploying and managing large-scale distributed systems.
  • Understanding of SRE concepts (error budgets, SLIs/SLOs, blameless postmortems).
  • Proficiency in programming languages such as Python, C++, or Go.
  • Familiarity with monitoring and observability tools.
  • Excellent problem-solving skills and ability to troubleshoot complex issues efficiently.
  • Strong organizational and communication skills, with the ability to collaborate effectively in a cross-functional team environment.

Desirable Qualifications:

  • Familiarity with security best practices and experience implementing security measures in a production environment.
  • Experience with modern infrastructure technologies and tools, including cloud platforms (AWS, Azure, GCP), containers and orchestration (Docker, Kubernetes), and configuration management (Ansible, Chef, Puppet).
  • Solid understanding of networking protocols and technologies (TCP/IP, DNS, load balancing).
  • Demonstrated experience with infrastructure as code (IaC) and automation tools (e.g., Terraform, GitHub Actions).

Join our team and contribute to creating and maintaining a highly reliable and performant infrastructure that supports our growing platform. Help shape the future of our systems architecture while working in a collaborative and innovative environment.

Top Skills

C++
Go
Python

FactSet London, England Office

One Snowden Street, London, United Kingdom, EC2A 2DQ

Similar Jobs

2 Days Ago
Easy Apply
Remote
28 Locations
Mid level
Cloud • Security • Software • Cybersecurity • Automation
As an Intermediate Site Reliability Engineer focused on Environment Automation, you'll automate operations across numerous GitLab environments. Your responsibilities include building deployment packages, managing infrastructure as code, deploying microservices, maintaining observability, and enhancing security measures while collaborating with engineering teams to resolve architectural issues.
Top Skills: Go, Ruby
2 Days Ago
Easy Apply
Remote
28 Locations
Mid level
Cloud • Security • Software • Cybersecurity • Automation
The Intermediate Site Reliability Engineer will enhance GitLab's delivery platform by automating release processes, improving monitoring, and optimizing deployment strategies. Key tasks include collaborating with Engineering teams, creating new tools, and ensuring timely and efficient software releases.
Top Skills: Kubernetes
22 Days Ago
Easy Apply
Remote
29 Locations
Entry level
Cloud • Security • Software • Cybersecurity • Automation
As an Intermediate Site Reliability Engineer in FinOps at GitLab, you'll ensure systems are scalable, reliable, and financially optimized. Your role involves automating cost management, collaborating with finance and engineering teams, and promoting FinOps principles across operations for cost optimization and financial accountability.
Top Skills: Ansible, AWS, GCP, Terraform

What you need to know about the London Tech Scene

London isn't just a hub for established businesses; it's also a nursery for innovation. The city boasts one of the most recognized fintech ecosystems in Europe, attracting billions in investment each year, and that success has made it a go-to destination for startups looking to make their mark. Top U.K. companies like Hopin, Moneybox and Marshmallow have already made the city their base, yet fintech is just the beginning. From healthtech to renewable energy to cybersecurity and beyond, the city's startups are breaking new ground across a range of industries.
