Arbor Education

Site Reliability Engineer

Posted Yesterday
Remote
Hiring Remotely in United Kingdom
Mid level
The Site Reliability Engineer will enhance platform performance, implement SLOs, improve observability, ensure availability, and support incident response.

Location: Remote

Salary: £60,000 - £70,000


About us

At Arbor, we’re on a mission to transform the way schools work for the better. 
We believe in a future of work in schools where being challenged doesn’t mean being burnt out and overworked. Where data guides progress without overwhelming staff. And where everyone working in a school is reminded why they got into education every day. 
Our MIS and school management tools are already making a difference in over 7,000 schools and trusts: giving time and power back to staff, turning data into clear, actionable insights, and supporting happier working days.
At the heart of our brand is a recognition that the challenges schools face today aren’t just about efficiency, outputs and productivity, but about creating happier working lives for the people who drive education every day: the staff. We want to make schools more joyful places to work, as well as learn.


About the role

We are looking for an enthusiastic and proactive Site Reliability Engineer to join our SRE team and help us provide world-class resilience and performance across the platform. The remit of the role is to advise on all aspects of site reliability, including availability, scalability, observability and capacity planning. It’s a broad and exciting role, so we’re looking for someone who is up for a challenge. If you’re an energetic and collaborative Site Reliability Engineer, this is the role for you.


Core responsibilities

  • Proactively monitor and analyse platform performance.
  • Collaborate with engineering teams to address performance bottlenecks and ensure scalability.
  • Assist engineering teams with implementing and reviewing SLOs.
  • Continually improve observability through monitoring, alerting and dashboards, using tools such as Datadog or Prometheus.
  • Work with other teams to ensure observability is effective and provides full coverage.
  • Ensure the service is highly available and resilient.
  • Champion best practices in design for high availability.
  • Devise runbooks and run game sessions to test our DR plan, HA and backups.
  • Conduct assessments of capacity and plan for scaling to meet current and future business needs.
  • Work closely with the Head of Platform Engineering and Head of SRE to plan and implement scalable solutions.
  • Work closely with the Platform team, feature teams, 2nd line support and other stakeholders to embed SRE practices and ensure a good level of service for our customers.
  • Act as a key player in incident response and troubleshooting, ensuring rapid resolution and minimising downtime.
  • Participate in blameless postmortems to identify root causes and corrective actions.
  • Develop and maintain playbooks and documentation.

About you
  • Experience in performance monitoring and analysis
  • Capacity planning experience 
  • Scripting and automation skills, with experience in relevant technologies.
  • Experience with Infrastructure as Code, in particular Terraform
  • Understanding of relational database technologies and their cloud versions (e.g. AWS Aurora)
  • Experience with messaging and distributed asynchronous workloads
  • Experience with nginx or similar technologies
  • Familiarity with SRE processes.
  • Awareness of DevOps principles such as the Three Ways and the Five Ideals.

Bonus Skills

  • Experience with other database technologies and cloud platforms.
  • Experience with enterprise solutions running at scale
  • Familiarity with Kanban and Agile development processes
  • Experience with containerisation, for example Docker
  • Familiarity with software best practices such as Refactoring, Clean Code, Domain-Driven Design and Test-Driven Development.

What we offer

The chance to work alongside a team of hard-working, passionate people in a role where you’ll see the impact of your work every day. We also offer:

  • A dedicated wellbeing team who champion initiatives such as mindfulness, lunch n learns, manager training, mental health first aid training and much more!
  • 32 days holiday (plus Bank Holidays). This is made up of 25 days annual leave plus 7 extra company-wide days given over Easter, Summer & Christmas
  • Life Assurance paid out at 3x annual salary
  • Comprehensive wellness benefit provided by AIG Smart Health, which provides a 24/7 virtual GP service, Mental health support, Counselling, and personalised Health Checks 
  • Private Dental Insurance with Bupa 
  • Salary sacrifice Pension provided by Scottish Widows
  • Enhanced maternity and adoption pay (20 weeks full pay) and paternity pay (6 weeks full pay)
  • 5 free return-to-work maternity coaching sessions, helping you adapt to this new exciting time of life!
  • Access to services such as Calm and Bippit (financial wellbeing coaching) 
  • All of our roles champion flexible working and we are happy to discuss what this means to you
  • Social committees that plan team, office and company-wide events to bring people together and celebrate success
  • Dedicated professional development training budget (CPD courses, upskilling resources, professional memberships etc)
  • Volunteer with a charity of your choice for a day each year
  • Dog friendly offices!

Interview process
  1. Phone screen
  2. 1st stage
  3. 2nd stage

We are committed to a fair and comfortable recruitment process, so if you require any reasonable adjustments during your application or interview process, please reach out to a member of the team at [email protected].

Our commitment is also backed by our partnership with the neurodiversity consultancy Lexxic, who provide us with training, support and advice.


Arbor Education is an equal opportunities organisation

Our goal is for Arbor to be a workplace which represents, celebrates and supports people from all backgrounds, and which gives them the tools they need to thrive, whatever their ambitions may be. We support and promote diversity and equality, and actively encourage applications from people of all backgrounds.


Refer a friend 

Know someone else who would be good for this role? You can refer a friend, family member or colleague. If they are offered a role with Arbor, we will say thank you with a voucher worth up to £200! Simply email: [email protected]

Please note: We are unable to provide visa sponsorship at this time.

HQ

Arbor Education London, England Office

195 Wood Lane, London, United Kingdom, W12 7FQ
