TWG Global

Platform / Site Reliability Engineer (UK)

Posted 2 Days Ago
In-Office
London, Greater London, England, GBR
Mid level
Maintain and scale data and ML infrastructure, build CI/CD pipelines, implement observability, ensure high availability and disaster recovery, manage access and secrets, troubleshoot incidents, and provide 24/7 coverage across time zones.
The summary above was generated by AI

At TWG Group Holdings, LLC (“TWG Global”), we drive innovation and business transformation across a range of industries—including financial services, insurance, technology, media, and sports—by leveraging data and AI as core assets. Our AI-first, cloud-native approach delivers real-time intelligence and interactive business applications, empowering informed decision-making for both customers and employees.

We prioritize responsible data and AI practices to ensure ethical standards and regulatory compliance. Our decentralized structure enables each business unit to operate autonomously, supported by a central AI Solutions Group, while strategic partnerships with leading data and AI vendors fuel game-changing efforts in marketing, operations, and product development. 

You will collaborate with management to advance our data and analytics transformation, enhance productivity, and enable agile, data-driven decisions. By leveraging relationships with top tech startups and universities, you will help create competitive advantages and drive enterprise innovation.

At TWG Global, your contributions will support our goal of sustained growth and superior returns, as we deliver rare value and impact across our businesses.

We’re a fast-growing AI/ML team delivering high-impact use case solutions to financial institutions, insurers, and other regulated enterprises. Backed by proven leaders in finance and national security, our team is scaling rapidly to serve clients across North America with robust, secure, and production-grade AI solutions.

Role Overview

We are seeking a Platform / Site Reliability Engineer (SRE) to ensure the scalability, stability, and performance of our data platforms and ML infrastructure. You’ll work closely with data scientists, ML engineers, and platform vendors to deploy and monitor production systems, automate workflows, and reduce operational overhead. 

What you'll do:

  • Build and maintain infrastructure to support real-time and batch ML workloads
  • Implement observability tools (logging, monitoring, alerting) for model performance and system uptime
  • Design and manage CI/CD pipelines for applications
  • Ensure high availability, disaster recovery, and rollback capabilities for production environments
  • Manage access controls, secrets, and security policies in collaboration with compliance and IT
  • Troubleshoot incidents, lead postmortems, and drive root-cause resolution
  • Work with U.S. and international teams to provide 24/7 coverage across time zones

Requirements
  • 3–6 years of experience in DevOps, SRE, or backend engineering roles
  • Proficient with tools like Docker, Kubernetes, Terraform, GitLab/GitHub Actions, Airflow
  • Strong scripting in Python or Bash and familiarity with Linux environments
  • Knowledge of observability stacks (e.g., Prometheus, Grafana, ELK, Datadog)
  • Familiarity with cloud platforms (e.g., AWS, GCP, or Azure)
  • Strong documentation, problem-solving, and incident response skills

Preferred Qualifications:

  • Experience supporting ML/AI workflows using Palantir Foundry is a plus (but not required)
  • Exposure to compliance frameworks like SOC 2, ISO 27001, or financial regulations
  • Knowledge of MLOps frameworks (e.g., MLflow, Kubeflow, SageMaker Pipelines)
  • Ability to automate deployments, testing, and monitoring at scale

Benefits
  • Work on real-world AI applications with high-impact clients
  • Collaborate with world-class data scientists, engineers, and product leaders
  • Flat org structure, high trust, high autonomy
  • Competitive salary + performance-based incentives

Position Location 

This is a remote position, but candidates must be currently based in the UK.

Compensation

The target salary for this position is £94,500. A bonus will be included in the compensation package, in addition to the full range of medical, financial, and other benefits.
