Senior Site Reliability Engineer

NewDay

Reposted 2 Days Ago

In-Office

London, Greater London, England, GBR

Senior level

Lead reliability initiatives across the platform by automating infrastructure and operational processes, building observability (monitoring, logging, tracing), driving incident management and root cause analysis, and collaborating with engineering teams to embed SRE practices, resilience, and performance into delivery.

The summary above was generated by AI

Mission Statement & Summary

As a Senior Site Reliability Engineer, you'll sit at the intersection of software engineering and operations, driving reliability, performance, automation, and resilience across our technology estate.

This is an opportunity to shape the future of our platform rather than simply maintain it. You'll work alongside talented engineers, influence technical direction, and champion modern reliability practices that enable teams to move faster with confidence. If you're passionate about solving complex problems, eliminating toil through automation, and creating systems that are resilient by design, we'd love to hear from you.

How you'll contribute

Lead initiatives that improve platform reliability, scalability, and operational excellence.
Design and deliver automation solutions that reduce manual effort and accelerate engineering teams.
Develop observability capabilities, enabling proactive monitoring and faster incident resolution.
Facilitate incident management, driving root cause analysis and continuous improvement.
Collaborate with engineering teams to embed reliability, resilience, and performance into every stage of delivery.
Contribute to a large-scale migration to OpenTelemetry.

We're looking for these essential skills

Software engineering and design experience (preferably .NET/C#), with the ability to build and improve production systems, apply solid design principles, and contribute directly to codebases to deliver reliable, scalable, and maintainable services.
The ability to automate infrastructure, operational processes, and deployments using modern engineering practices.
Experience building effective observability solutions, including monitoring, logging, alerting, and tracing.
Strong problem-solving skills with the ability to diagnose and resolve complex production issues.
The ability to influence technical decisions and collaborate effectively across engineering and business teams.
Experience instrumenting services with OpenTelemetry.

It's a plus if you also have these skills

Experience operating Kubernetes-based platforms at scale.
Knowledge of Infrastructure as Code tools and cloud platform services.
Experience implementing Site Reliability Engineering principles, including SLOs, SLIs, and error budgets.
Familiarity with security, compliance, and resilience best practices within cloud environments.
Experience mentoring engineers and helping teams adopt modern operational and reliability practices.

At NewDay, we value all types of diversity. We’re an equal opportunity employer and believe that our differences create a vibrant, authentic working culture. We want all our colleagues to feel able to bring their whole selves to work. We don’t discriminate on the basis of protected characteristics or identities. We make sure that every job is crafted to be inclusive and that people with disabilities or caring responsibilities can take part in the application and interview process.

Tell us if you need accommodations: We’ll put reasonable adjustments in place to support you.

We work with Textio to make our job design and hiring inclusive.

Permanent

7 Handyside Street, London, United Kingdom, N1C 4DA
