
fanvue

Site Reliability Engineer

Reposted 24 Days Ago
Remote
Hiring Remotely in UK
Expert/Leader
The Site Reliability Engineer will enhance platform reliability, scalability, and performance, focusing on AWS infrastructure and Aurora PostgreSQL management.
Join us in redefining the creator economy with AI

Fanvue is one of the fastest-growing creator monetisation platforms globally. We’re an AI-powered, creator-first platform that helps creators connect with, engage, and earn directly from their audiences at scale. Following our recent Series A, Fanvue has surpassed $100M in annual recurring revenue with triple-digit year-on-year growth, supporting hundreds of thousands of creators and millions of fans worldwide.

Reliability at Fanvue is a growth enabler. This role exists to ensure our systems are predictable, scalable, and resilient, so product teams can ship fast without compromising uptime, performance, or creator trust.

🎯 The Role

We’re hiring a Site Reliability Engineer to strengthen Fanvue’s platform reliability and infrastructure foundations. You’ll work closely with Platform and Product Engineering teams to design, operate, and evolve the systems that keep Fanvue fast, available, and safe as we scale.

This is a hands-on role focused on infrastructure, observability, automation, and operational excellence, with real ownership of production systems.

🚀 What You’ll Do

  • Design, build, and operate reliable infrastructure across Fanvue’s cloud environment

  • Own and improve observability, monitoring, and alerting for critical services

  • Reduce operational toil through automation, tooling, and infrastructure as code

  • Partner with engineering teams to improve reliability, scalability, and deployment safety

  • Lead incident response for infrastructure issues and drive high-quality post-incident reviews

  • Define and track SLOs, SLIs, and error budgets to balance reliability with delivery speed

  • Improve CI/CD reliability and rollout practices to reduce risk

  • Contribute to disaster recovery, backup, and resilience planning

👀 Who You Are

  • Strong experience as an SRE, infrastructure engineer, or platform engineer

  • Comfortable operating production systems at scale

  • Experience with cloud platforms and distributed systems

  • Strong background in observability, monitoring, and incident management

  • Comfortable writing automation and infrastructure code

  • Calm, clear communicator during incidents and escalations

  • High ownership mindset with a bias toward long-term reliability improvements

You’ll Thrive Here If

  • You care deeply about system reliability and predictability

  • You enjoy preventing problems more than reacting to them

  • You like partnering with product engineers rather than acting as a gatekeeper

  • You’re comfortable owning on-call responsibilities

  • You value learning through incidents and continuous improvement

⚠️ You’ll Struggle Here If

  • You prefer reactive firefighting over proactive reliability work

  • You’re uncomfortable with operational responsibility

  • You avoid incident ownership or post-incident accountability

  • You need heavy process to operate effectively

🌍 Why Join Fanvue?

  • Own reliability for a $100M+ ARR platform

  • Enable teams to ship faster with confidence

  • Work on complex, real-world scaling challenges

  • Competitive salary and benefits package

  • Unlimited holiday

  • Remote working

  • Flexible hours, so you can work when you perform best

  • Budget for growth and wellbeing

Fanvue is for Everyone

We believe diverse teams build better products. Even if you do not meet every single requirement listed, we encourage you to apply. Many great people grow into parts of a role, and we value potential, mindset, and ambition just as much as experience.

Top Skills

Aurora PostgreSQL
AWS
AWS CDK
AWS CloudWatch
DynamoDB
ElastiCache Redis
RDS
TypeScript

HQ

fanvue London, England Office

London, United Kingdom
