HelloKindred

Platform - SRE Engineer

Posted 3 Days Ago
Hybrid
Sheffield, South Yorkshire, England
Mid level
The SRE Engineer will manage deployment and reliability for an AI helpdesk platform, focusing on CI/CD pipelines, observability, cloud infrastructure, and support for AI services.
The summary above was generated by AI
Company Description

Who is HelloKindred?

HelloKindred are specialists in staffing marketing, creative and technology roles, offering a range of talent solutions that can be delivered on-site, remotely or in a hybrid arrangement.

Our vision is to make work accessible and people’s lives better. We do this by disrupting traditional employment barriers – connecting ambitious talent to flexible opportunities with trusted brands.

Job Description

Anticipated Contract End Date/Length: November 30, 2026.
Work Setup: Hybrid (3 days per week in office)
Clearance required: BPSS

Our client in the Information Technology and Services industry is looking for a Platform / SRE Engineer to own deployment, observability, reliability, cost control, and production operations for an AI helpdesk platform. This role will support the design, deployment, and operational management of AI services and production environments while ensuring scalability, uptime, performance optimization, and operational resilience across cloud-based infrastructure.

The ideal candidate will bring strong expertise in DevOps and Site Reliability Engineering practices, along with experience managing cloud-native platforms, CI/CD pipelines, observability tooling, and AI/ML production workloads within complex enterprise environments.

What you will do:

  • Build and manage CI/CD pipelines, infrastructure, and runtime environments for AI services.
  • Deploy and operate model-serving, orchestration, and application workloads.
  • Implement monitoring, tracing, alerting, logging, and operational dashboards.
  • Manage scaling activities, release processes, rollback mechanisms, and production support operations.
  • Optimize inference cost, latency, uptime, and overall system reliability.
  • Create runbooks, operational standards, and incident response processes.
  • Support infrastructure automation and platform engineering initiatives.
  • Maintain observability and monitoring solutions across production environments.
  • Support release automation, secrets management, and production operational processes.
  • Collaborate with engineering teams to support AI platform reliability and operational readiness.
  • Troubleshoot production issues and support system diagnostics and remediation activities.
  • Ensure platform stability, scalability, and performance across cloud-native environments.

Qualifications

  • Strong experience in DevOps and Site Reliability Engineering environments.
  • Experience with Docker, Kubernetes, cloud platforms, and Infrastructure as Code practices.
  • Strong experience with monitoring, observability, and operational tooling.
  • Familiarity with CI/CD pipelines, release automation, secrets management, and production support processes.
  • Understanding of LLM deployment patterns and API-based model integrations.
  • Experience working with cloud platforms, particularly AWS.
  • Experience using Jira, Confluence, and ServiceNow.
  • Experience supporting AI/ML workloads in production environments is preferred.
  • Experience with GPU workloads, autoscaling, and cost optimization is preferred.
  • Strong troubleshooting, operational support, and incident response capabilities.
  • Strong communication and collaboration skills within cross-functional engineering teams.

Additional Information

All your information will be kept confidential according to EEO guidelines.

Candidates must be legally authorized to live and work in the country where the position is based, without requiring employer sponsorship.

HelloKindred is committed to fair, transparent, and inclusive hiring practices. We assess candidates based on skills, experience, and role-related requirements.

We appreciate your interest in this opportunity. While we review every application carefully, only candidates selected for an interview will be contacted.

HelloKindred is an equal opportunity employer. We welcome applicants of all backgrounds and do not discriminate on the basis of race, colour, religion, sex, gender identity or expression, sexual orientation, age, national origin, disability, veteran status, or any other protected characteristic under applicable law.
