Partly

Site Reliability Engineer, UK

Reposted 20 Days Ago

In-Office

London, Greater London, England

Senior level

This role involves ensuring system reliability and performance, optimizing costs, collaborating across teams, and maintaining large software systems while leveraging SRE tools and practices.

Note: Partly is headquartered in the UK, with a Product and Engineering base in Christchurch, NZ and an early presence in San Francisco, US. This position is in office, based in London.

🚀 Our story

Partly's mission is to connect the world's parts and we're doing that by building the first global platform for replacement parts, starting with auto parts. Our big vision is to accelerate the world toward a sustainable future where anyone can fix anything.

Founded by ex-Rocket Lab engineers, we utilise cutting-edge technology to solve challenging but exciting problems that make a huge impact in a $1.9 trillion industry. We've more than tripled our team over the last 12 months and expect to double in size again over the coming 12 months. We're a global team spanning both Europe and Australasia.

We provide a scalable digital infrastructure solution to some of the world's largest businesses and the most exciting startups. Partly's solutions are integrated across hundreds of companies globally, providing the backbone for cataloguing and managing parts online.

Our investors include Blackbird Ventures (Canva, Culture Amp, etc.), Square Peg, Octopus Ventures, Hillfarrance, Icehouse, Peter Beck (Rocket Lab), Akshay Kothari (Notion Co-Founder) and Dylan Field (Figma Co-Founder).

We're continuing to build a world-class team and ensuring Partly is a place where people can do the best work of their lives. We're proud of the culture we've built at Partly, and our values are lived throughout every experience.

🖍️ This role

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed systems, ensuring that both internally critical and externally visible services have the reliability, uptime, and performance appropriate to clients' needs while enabling a fast rate of improvement. SREs maintain constant awareness of system capacity and performance, ensuring our networks, platforms, and tools are scalable, secure, and reliable so engineers can focus on delivering impactful software. This senior role demands high autonomy, leadership, and strategic thinking, making it ideal for those excited by the challenge of designing and supporting the infrastructure that connects the world's parts.

💻 What will you do

Reliability Engineering: Ensure the stability, scalability, and security of our cloud infrastructure and of the Partly and third-party applications running in our Kubernetes-powered clusters. Leverage Infrastructure-as-Code and automation (Terraform for GCP, GitOps with ArgoCD, custom scripts in Python/Bash, etc.) to deploy and manage workloads and resources in a repeatable, automated way.
Cost Optimisation: Monitor and optimise costs across our cloud and on-prem infrastructure, ensuring we get maximum value from our investments. Make recommendations for resource allocation or architecture changes to improve cost-efficiency without sacrificing reliability or performance.
Cross-Functional Collaboration: Work closely with developers, data engineers, and leadership to plan infrastructure needs and improvements. Provide tooling, guidance and training to the engineering team on SRE practices, and collaborate during software delivery to ensure smooth integrations from code to production.
Software Engineering: Make sure our software meets high production readiness standards. When you see a problem or an opportunity to improve, you drive the solution.
Troubleshooting: Participate in incident resolution and give developers a helping hand in debugging applications, networks, databases, and compute systems.

Want to learn more about the problems we're solving and the culture we're building at Partly? Hear directly from our team here: https://shorturl.at/DPDdl

🥷 Your skills

Software Engineering: You excel at developing and maintaining large, established software systems beyond simple scripts and utilities. You definitely know what makes software maintainable and you are able to write robust code.
Firmly grounded computer science fundamentals: Including data structures, concurrency, architecture, APIs, testing, and design patterns.
System engineering fundamentals: You most likely know how to deploy and use a memory or stack-sampling profiler, how to locate excessive lock contention, how to identify network issues, etc.
SRE Expertise: Hands-on experience with modern SRE practices and tooling – for example, containerization (Docker/Kubernetes), infrastructure-as-code (Terraform), and GitOps workflows (ArgoCD or equivalent). You have designed, built, and maintained scalable infrastructure and CI/CD systems.
Cloud & Systems Knowledge: Deep familiarity with at least one major cloud platform and the Linux operating system. You can tune servers, manage databases/storage, and wrangle Kubernetes clusters.
Ownership & Leadership: High degree of ownership and bias for action, with a proactive approach to solving problems. You take initiative and don’t wait to be told what to do. You have demonstrated leadership through mentoring junior engineers or leading small teams/projects, even if not formally a manager. We’re seeking a track record of ownership over critical systems and successful delivery of complex projects.
Collaboration & Communication: Excellent communication skills (written and verbal) and a collaborative attitude. You can work across teams and departments – from explaining technical issues to non-technical colleagues, to coordinating with engineers on deployments. You value teamwork and knowledge sharing.
Adaptability: Willingness to wear multiple hats and adapt to evolving needs. In a fast-growing startup environment, requirements can change – you’re excited by the chance to learn new skills, take on new challenges, and grow with the role.
Bonus Points:
- Experience in a high-growth startup environment, which means you’re used to the pace and ambiguity.
- Any prior experience maintaining security compliance and certifications in a company is a plus.
- If you have used specific tools we use (GCP, ArgoCD, GitLab CI, Kafka, etc.), that’s great – if not, you can learn quickly.
- Significant experience running production workloads on Apache Cassandra and/or Postgres databases is a plus.
- Experience developing software in the Rust programming language, along with the ability to mentor other developers on Rust best practices, is a plus.

Please note: if you don't have all the skills/experience listed above but believe you could be outstanding in this role, please still consider applying. Many folks, especially those from underrepresented or marginalised groups, often count themselves out. Please allow us to learn more about you and why you're exceptional!

🪅 Benefits

High trust, low process and no bureaucracy. We hire exceptional people whose judgment we trust. This means we proactively remove any process or rules that slow us down (for example, our expense policy is simply the “red face test”).
Competitive base salary + equity. We offer competitive salaries and generous equity options for all full-time employees, ensuring everyone shares in the financial upside when we win.
Flexible working hours. Choose when to work based on what time you’re most effective (no mandatory or set hours). We combine flexibility with an office-first approach (in cities where we have critical mass, i.e. London, Christchurch, Auckland).
Focus Days. Two days per week, with zero meetings, dedicated solely to uninterrupted deep work.
Take time when you need it. We don’t ask questions or care if people have a negative leave balance. We work extremely hard and trust our team to take the time they need to recharge.
Learn from the best. Whether it’s during a ‘Lunch n Learn’ or hearing from a unicorn CEO at a Fireside chat, you’ll have the opportunity to constantly learn from the world’s best.
Quarterly season openers across the UK and EU. Connect regularly at the nearest centralised location for a week of collaboration, big-picture planning and team events.
Team connection. Monthly team lunches, celebrating our wins, happy hours and more!
Parental leave and flexible return to work. Do what works for you. Primary carers can return with 4-day weeks (on 100% pay for the first 12 weeks). Secondary carers get 10 days full pay.
Payroll Giving: We encourage generous giving and donate to the high-impact charities you support.
CycleSaver: UK employees can now save up to 47% on Lime, Forest, Beryl, or Santander cycle subscriptions through CycleSaver, enjoying the health benefits of cycling to work with flexible, hassle-free monthly plans instead of bike ownership.

Top Skills

Apache Cassandra

ArgoCD

Bash

Docker

GCP

GitOps

Kubernetes

Postgres

Python

Rust

Terraform

London, United Kingdom, SE1 7LY

