Partly

Site Reliability Engineer, UK

Posted 4 Days Ago
In-Office
London, Greater London, England
Senior level
This role involves ensuring system reliability and performance, optimizing costs, collaborating across teams, and maintaining large software systems while leveraging SRE tools and practices.
The summary above was generated by AI

Note: Partly is headquartered in the UK, with a Product and Engineering base in Christchurch, NZ and an early presence in San Francisco, US. This position is in office, based in London.

🚀 Our story

Partly's mission is to connect the world's parts and we're doing that by building the first global platform for replacement parts, starting with auto parts. Our big vision is to accelerate the world towards a sustainable future where waste is eliminated and all replacement parts are universally searchable, accessible and available to all.

Founded by ex-Rocket Lab engineers, we utilise cutting-edge technology to solve challenging but exciting problems that make a huge impact in a $1.9 trillion industry. We've more than tripled our team over the last 12 months and expect to double in size again over the coming 12 months. We're a global team spanning both Europe and Australasia.

We provide a scalable digital infrastructure solution to some of the world's largest businesses and the most exciting startups. Partly's solutions are integrated across hundreds of companies globally, providing the backbone for cataloguing and managing parts online.

Our investors include Blackbird Ventures (Canva, CultureAmp etc.), Square Peg, Octopus Ventures, Hillfarrance, Icehouse, Peter Beck (Rocket Lab), Akshay Kothari (Notion Co-Founder) and Dylan Field (Figma Co-Founder).

We're continuing to build a world-class team and ensuring Partly is a place where people can do the best work of their lives. We're proud of the culture we've built at Partly, and our values are lived throughout every experience.

🖍️ This role

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed systems, ensuring that both internally critical and externally visible services have the reliability, uptime, and performance appropriate to clients' needs while enabling a fast rate of improvement. SREs maintain constant awareness of system capacity and performance, ensuring our networks, platforms, and tools are scalable, secure, and reliable so engineers can focus on delivering impactful software. This senior role demands high autonomy, leadership, and strategic thinking, making it ideal for those excited by the challenge of designing and supporting the infrastructure that connects the world's parts.

💻 What will you do
  • Reliability Engineering: Ensure the stability, scalability, and security of our cloud infrastructure and of the Partly and third-party applications running in our Kubernetes-powered clusters. Leverage Infrastructure-as-Code and automation (Terraform for GCP, GitOps with ArgoCD, custom scripts in Python/Bash, etc.) to deploy and manage workloads and resources in a repeatable, automated way.

  • Cost Optimisation: Monitor and optimise costs across our cloud and on-prem infrastructure, ensuring we get maximum value from our investments. Make recommendations for resource allocation or architecture changes to improve cost-efficiency without sacrificing reliability or performance.

  • Cross-Functional Collaboration: Work closely with developers, data engineers, and leadership to plan infrastructure needs and improvements. Provide tooling, guidance and training to the engineering team on SRE practices, and collaborate during software delivery to ensure smooth integrations from code to production.

  • Software Engineering: Make sure our software meets high production readiness standards. When you see a problem or an opportunity to improve, you drive the solution.

  • Troubleshooting: Participate in incident resolution and give developers a helping hand in debugging applications, networks, databases, and compute systems.

Want to learn more about the problems we're solving and the culture we're building at Partly? Hear directly from our team here: https://shorturl.at/iAFUX

🥷 Your skills
  • Software Engineering: You excel at developing and maintaining large, established software systems beyond simple scripts and utilities. You definitely know what makes software maintainable and you are able to write robust code.

  • Firmly grounded computer science fundamentals: Including data structures, concurrency, architecture, APIs, testing, and design patterns.

  • System engineering fundamentals: You most likely know how to deploy and use a memory or stack-sampling profiler, how to locate excessive lock contention, how to identify network issues, etc.

  • SRE Expertise: Hands-on experience with modern SRE practices and tooling – for example, containerization (Docker/Kubernetes), infrastructure-as-code (Terraform), and GitOps workflows (ArgoCD or equivalent). You have designed, built, and maintained scalable infrastructure and CI/CD systems.

  • Cloud & Systems Knowledge: Deep familiarity with at least one major cloud platform and the Linux operating system. You can tune servers, manage databases/storage, and wrangle Kubernetes clusters.

  • Ownership & Leadership: High degree of ownership and bias for action, with a proactive approach to solving problems. You take initiative and don’t wait to be told what to do. You have demonstrated leadership through mentoring junior engineers or leading small teams/projects, even if not formally a manager. We’re seeking a track record of ownership over critical systems and successful delivery of complex projects.

  • Collaboration & Communication: Excellent communication skills (written and verbal) and a collaborative attitude. You can work across teams and departments – from explaining technical issues to non-technical colleagues, to coordinating with engineers on deployments. You value teamwork and knowledge sharing.

  • Adaptability: Willingness to wear multiple hats and adapt to evolving needs. In a fast-growing startup environment, requirements can change – you’re excited by the chance to learn new skills, take on new challenges, and grow with the role.

  • Bonus Points:

    • Experience in a high-growth startup environment, which means you’re used to the pace and ambiguity.

    • Any prior experience maintaining security compliance and certifications in a company is a plus.

    • If you have used specific tools we use (GCP, ArgoCD, GitLab CI, Kafka, etc.), that’s great – if not, you can learn quickly.

    • Significant experience running production workloads on Apache Cassandra and/or Postgres databases.

    • Experience developing software in the Rust programming language, and the ability to mentor other developers on Rust best practices.

Please note: if you don't have all the skills/experience listed above but believe you could be outstanding in this role, please still consider applying. Many folks, especially those from underrepresented or marginalised groups, often count themselves out. Please allow us to learn more about you and why you're exceptional!

🪅 Benefits
  • High trust, low process and no bureaucracy. We hire exceptional people whose judgment we trust. This means we proactively remove any process or rules that slow us down (for example, our expense policy is simply the “red face test”).

  • Competitive base salary + equity. We offer competitive salaries and generous equity options for all full-time employees, ensuring everyone shares in the financial upside when we win.

  • Flexible working hours. Choose when to work based on what time you’re most effective (no mandatory or set hours). We combine flexibility with an office-first approach (in cities where we have critical mass, i.e. London, Christchurch, Auckland).

  • Focus Days. Two days per week, with zero meetings, dedicated solely to uninterrupted deep work.

  • Take time when you need it. We don’t ask questions or care if people have a negative leave balance. We work extremely hard and trust our team to take the time they need to recharge.

  • Learn from the best. Whether it’s during a ‘Lunch n Learn’ or hearing from a unicorn CEO at a Fireside chat, you’ll have the opportunity to constantly learn from the world’s best.

  • Quarterly season openers across the UK and EU. Connect regularly at the nearest centralised location for a week of collaboration, big-picture planning and team events.

  • Team connection. Monthly team lunches, celebrating our wins, happy hours and more!

  • Parental leave and flexible return to work. Do what works for you. Primary carers can return with 4-day weeks (on 100% pay for the first 12 weeks). Secondary carers get 10 days full pay.

  • Payroll Giving: We encourage generous giving and donate to the high-impact charities you support.

  • CycleSaver: UK employees can now save up to 47% on Lime, Forest, Beryl, or Santander cycle subscriptions through CycleSaver, enjoying the health benefits of cycling to work with flexible, hassle-free monthly plans instead of bike ownership.

Top Skills

Apache Cassandra
ArgoCD
Bash
Docker
GCP
GitOps
Kubernetes
Postgres
Python
Rust
Terraform

Partly London, England Office

London, United Kingdom, SE1 7LY


