Thought Machine

Senior Site Reliability Engineer

Posted 15 Days Ago
Hybrid
London, Greater London, England
Senior level
Run and maintain production infrastructure for Thought Machine's cloud-native core banking and payments SaaS. Automate fleet management, disaster recovery, backups and capacity planning; participate in design reviews, on-call rotations, and reliability-focused feature design. Mentor team members and maintain operational documentation and runbooks.
The summary above was generated by AI

Thought Machine’s mission is bold – to properly and permanently rid the world’s banks of legacy technology. To achieve this, we have developed the foundations of modern banking through core and payments technology which run natively in the cloud. What we are attempting is hard and means we need great people working together to build great technology.

We have grown rapidly in the past few years – growing our team to more than 550 individuals across offices in London, New York, Singapore and Sydney. We have raised more than $500m in funding and are now valued at $2.7bn. Our investors include Molten Ventures, Eurazeo, Intesa Sanpaolo, Temasek, Nyca Partners, JPMorgan Chase Strategic Investments, Standard Chartered Ventures, and more.

We have created a culture that enables our team to produce the best work in the industry while ensuring we have fun along the way. We're regularly cited as having a fantastic workplace culture and have been recognised by Sifted magazine as having one of the highest Glassdoor ratings for a UK fintech company and the industry's most generous employee share package. Named one of the world’s most innovative fintechs by Global Finance Magazine, we were also recognised by the Financial Times as one of Europe’s fastest-growing companies for two consecutive years, and as a UK Best Employer for 2026.

Thought Machine’s Site Reliability Engineers are the guardians of mission-critical systems for the world's most influential financial institutions. As a member of our elite, globally distributed team, you'll be entrusted with running and maintaining the robust production infrastructure that powers our customers' cutting-edge Core Banking and Payments platforms. This is an opportunity to make a tangible impact on the global financial landscape while collaborating with brilliant minds to solve complex engineering challenges.

This role will be part of the Site Reliability Engineering team at Thought Machine HQ in London, tackling the challenges of automating complex fleet management operations, mentoring team members, promoting communities of best practice within engineering as well as designing operational processes that provide effective interfaces between Thought Machine and our SaaS customers.

The SRE team is deeply involved in tackling the technical challenges of executing Thought Machine’s growth ambitions - expect to be working with senior stakeholders in the organisation and with our customers, and working on programmes and initiatives that are critical to the success of the company.

Duties:

  • Supporting the product engineering teams in building highly fault-tolerant, scalable applications by participating in design discussions, engaging in RFCs and code reviews.

  • Executing various department strategies - contributing to the design and scoping work for team members around disaster recovery, backup, redundancy and capacity planning activities.

  • Being part of a global on-call rotation responsible for identifying and fixing bottlenecks in SaaS customer environments.

  • Regular maintenance of production systems that host Vault products.

  • Driving the evolution of our SaaS products by defining and designing features that foster exceptional reliability and an unparalleled user experience.

  • Implementing and regularly testing DR strategies to ensure the highest level of resilience and fault tolerance of the platform.

  • Maintaining and promoting high-quality written documentation of assets, processes and runbooks that the team uses in its day-to-day operations.

  • Working with your manager to grow team members' technical skills and their understanding of Vault products.

Requirements:

  • You have a track record of delivering high-impact projects with focus on long-term scalability, ensuring that human intervention scales sub-linearly with usage growth.

  • You possess an up-to-date understanding of design patterns relevant to hosting and networking architectures.

  • You proactively champion product development, driven by a desire to build truly exceptional products, not just solve immediate challenges.

  • You’re a high-agency individual who can independently drive projects to completion by effectively scaling your individual output with the appropriate delegation of work to team members.

  • You have a strong background working in either Python, Golang or Java, having used one of these programming languages to execute a significantly sized project or initiative.

  • You have experience working with Kubernetes or other container orchestration systems.

  • You have experience with automation/configuration management, e.g. Terraform, Puppet, Chef, Ansible.

  • You have expertise in one or more of the following areas: Database Administration, Networking, Observability Tools (such as Prometheus, Jaeger) or automation infrastructure.

  • You have extensive experience working with either GCP or AWS.

We actively hire candidates who demonstrate technical excellence in their field and welcome people of all ages and backgrounds, providing everyone with equal access to professional development. You are encouraged to apply even if your experience doesn't exactly match the job description. We also encourage applications from those with different abilities, including candidates with ADHD, autism, dyslexia or dyspraxia.

Top Skills

Python, Golang, Java, Kubernetes, Terraform, Puppet, Chef, Ansible, Prometheus, Jaeger, GCP, AWS, Vault

Thought Machine London, England Office

7 Herbrand Street, London, United Kingdom, WC1N 1EX
