fanvue

Site Reliability Engineer

Posted 17 Days Ago
UK
Expert/Leader
The Site Reliability Engineer will enhance platform reliability, scalability, and performance, focusing on AWS infrastructure and Aurora PostgreSQL management.
Join us in redefining the creator economy with AI

Fanvue is the fastest-growing monetisation platform in the creator economy and the leading AI-powered, creator-first platform, designed to help creators worldwide monetise their audience directly. We’re on a mission to redefine the creator economy by enabling creators to connect, share, and earn more efficiently.

🎯 The Role
We are hiring a Site Reliability Engineer (SRE) to elevate the reliability, scalability, and performance of the core platform that powers Fanvue. You will be the technical specialist who ensures our infrastructure is predictable, resilient, and capable of supporting rapid product development across multiple teams.

This role sits at the heart of the platform: improving the health of our Aurora PostgreSQL estate, developing robust AWS infrastructure, enabling engineering teams with deep technical expertise, and driving the reliability culture required to support a fast-scaling product.

🚀 What You’ll Do

  • Own and optimise the Aurora PostgreSQL (Serverless v2) clusters that power Fanvue’s core systems, ensuring performance, availability, and scalability

  • Oversee the reliability of AWS-managed data infrastructure across Aurora, ElastiCache Redis, DynamoDB, and RDS

  • Develop and maintain Infrastructure as Code using AWS CDK (TypeScript), establishing automated, reusable patterns and best practices

  • Reduce operational toil through automation and build self-service tooling that empowers engineering teams

  • Implement and maintain robust monitoring, observability, and alerting using AWS CloudWatch

  • Ensure CI/CD pipelines are reliable, safe, and performant, enabling frequent and high-confidence deployments

  • Act as the escalation point for complex infrastructure and database issues, supporting teams when deep expertise is required

  • Lead incident response, run post-mortems, and deliver actionable improvements to avoid repeat failures

  • Partner closely with stream teams to understand their infrastructure needs and provide technical guidance without slowing their velocity

  • Mentor engineers across the Platform team, raising reliability standards and improving operational maturity

👀 Who You Are
A highly experienced reliability engineer with deep hands-on expertise in AWS-managed database systems, distributed systems, and infrastructure automation. You bring:

  • Extensive experience operating, scaling, and tuning Aurora PostgreSQL (preferably Serverless v2)

  • Strong proficiency across AWS database services: Aurora PostgreSQL, ElastiCache Redis, DynamoDB, and RDS

  • Expertise with Infrastructure as Code, especially AWS CDK (TypeScript)

  • Proven ability to identify, measure, and eliminate toil through automation

  • Experience applying SRE principles: SLIs, SLOs, error budgets, gradual rollouts, and reliability-focused system design

  • Strong architectural thinking, with the ability to design fault-tolerant, scalable infrastructure

  • Deep expertise with monitoring, observability, and performance tuning using AWS CloudWatch

  • Excellent communication skills and the ability to guide teams without creating bottlenecks

  • A high-ownership mindset aligned with Amazon Leadership Principles: Ownership, Dive Deep, Think Big, Deliver Results

Nice-to-haves

  • Experience supporting ECS Fargate workloads or containerised environments

  • Background in building internal platform tools or developer enablement systems

  • Familiarity with microservice vs centralised architecture trade-offs

You’ll Thrive Here If

  • You enjoy being the deep technical expert teams rely on

  • You love optimising systems for performance and reliability

  • You are motivated by solving hard technical problems and making infrastructure invisible, stable, and scalable

  • You take pride in raising engineering standards and creating leverage for others

⚠️ You’ll Struggle Here If

  • You prefer reactive operations over proactive engineering

  • You are uncomfortable owning large technical surfaces with autonomy

  • You avoid hands-on investigation, deep dives, or operational responsibility

🌍 Why Join Fanvue?

  • Own and strengthen the most mission-critical systems at one of the fastest-growing creator platforms

  • Competitive salary, equity, and benefits package

  • A culture that values innovation, ownership, transparency, and speed

  • Unlimited holiday

  • Remote working

  • Flexible hours to support how you perform best

  • Budget for growth and wellbeing

Fanvue is for Everyone
We know that diverse teams build better products. Even if you do not meet every single requirement, we encourage you to apply. Many great people grow into parts of a role, and we value potential just as much as experience.

Top Skills

Aurora PostgreSQL
AWS
AWS CDK
AWS CloudWatch
DynamoDB
ElastiCache Redis
RDS
TypeScript
HQ

fanvue London, England Office

London, United Kingdom
