OnBuy

Senior Site Reliability Engineer

Posted 21 Days Ago

United Kingdom

Senior level

As a Senior Site Reliability Engineer, you will ensure system reliability by designing scalable systems, developing automated monitoring solutions, collaborating with software developers on reliability issues, and implementing infrastructure as code. You will respond to incidents, enhance operational processes, and maintain system performance documentation, contributing to a culture of shared responsibility for service availability.

The summary above was generated by AI

Description

Who are OnBuy?

OnBuy are an online marketplace on a mission to be the best choice for every customer, everywhere.

We have recently been named one of the UK's fastest-growing tech companies in Deloitte's Technology Fast 50 for the third year in a row (as well as 'Fastest-Growing Tech Business in the South West').

We are very proud of these achievements, but we don't let them go to our heads. We are all laser-focused on our mission and understand the huge joint effort needed to succeed.

Working at OnBuy:

We are a team of driven and motivated people who thrive when working at pace. To succeed at OnBuy you need to take charge and fully own your responsibilities, rolling your sleeves up when needed to 'get it done'. Working at OnBuy, you are surrounded by opportunity, but you must be able to stay focused and prioritise ruthlessly. Most importantly, you will thrive in an ever-changing environment, as we are constantly evolving.

At OnBuy, you're not just a number or another cog in a machine. We are creating something really special, and you have the opportunity to effect meaningful change and have your voice heard.

We are a close team with the opportunity to learn and grow as OnBuy evolves. We work in a flexible way, meaning we can prioritise our health and relationships, but when we are working, we graft.

Job overview:
As a Senior Site Reliability Engineer, you will play a critical role in ensuring our systems and environments are robust and reliable, with a high degree of observability, monitoring and alerting.

You will help OnBuy build and maintain scalable, reliable systems while ensuring that our services meet the high standards of availability and performance expected by our users. Your expertise will be invaluable in automating and enhancing our operational processes, monitoring application performance, and troubleshooting complex issues.

You will collaborate closely with software engineers to design reliable and efficient systems, participating in reliability reviews and driving best practices. Additionally, you will be responsible for creating and managing infrastructure as code, leveraging modern cloud technologies and tools. Your efforts will not only improve the uptime and efficiency of our services but also foster a culture of shared responsibility for system reliability.

Key Responsibilities:

Design and implement scalable systems to ensure high availability and performance.
Develop automated solutions for monitoring, scaling, and system health management.
Collaborate with software development teams to identify and resolve reliability issues.
Create and maintain documentation related to system architecture, processes, and configurations.
Perform incident response and postmortem analysis to improve site reliability and performance.
Monitor system performance and make necessary adjustments to ensure optimal functionality.
Implement and manage infrastructure as code using tools like Terraform or Ansible.
This role requires out-of-hours support (via a rota) to address urgent DevOps issues, ensuring the reliability and availability of critical systems. Payment for this support is made under the company's 'out of hours working' policy.

Requirements

Essential

Proven experience as a Senior Site Reliability Engineer or in a similar role.
Strong proficiency in programming languages such as Python, Go, or Java.
Experience with cloud service providers (AWS, Azure, Google Cloud) and container and orchestration tools (Docker, Kubernetes).
Solid understanding of networking, distributed systems, and microservices architecture.
Familiarity with monitoring and logging tools (New Relic, Prometheus, Grafana, ELK stack, GCP logging).
Excellent problem-solving skills and the ability to work effectively in a fast-paced, collaborative team environment.
A strong determination and work ethic to find the best solution to any problem.
Strong communication and interpersonal skills, with the ability to effectively collaborate with cross-functional teams.

Benefits

The salary range on offer for this role is £65,000 - £80,000 per annum, depending on experience.

In return for helping us to grow, we’ll offer you company equity, meaning you own a piece of this business we are all working so hard to build.

Top Skills

Python
