Site Reliability Engineer

NICE

Reposted 11 Days Ago

Remote

Hiring Remotely in United Kingdom

Mid level

The Site Reliability Engineer manages incident response, enhances service reliability, automates operations, and collaborates across teams to improve system design.

The summary above was generated by AI

At NiCE, we don’t limit our challenges. We challenge our limits. Always. We’re ambitious. We’re game changers. And we play to win. We set the highest standards and execute beyond them. And if you’re like us, we can offer you the ultimate career opportunity that will light a fire within you.

So, what’s the role all about?

The SRE‑NOC role sits at the intersection of traditional Network Operations Center (NOC) responsibilities and engineering‑driven reliability practices. This role focuses on 24/7 service reliability, incident response, operational automation, and observability, while actively reducing operational toil through software and automation.

Unlike a traditional NOC analyst, an SRE‑NOC is expected to engineer problems away, not just respond to alerts.

How will you make an impact?

Incident Response & Operations

Act as a primary or escalation responder in a 24x7 on‑call rotation
Lead or support Major Incident (MI) response, including triage, mitigation, and resolution
Coordinate across Engineering, Infrastructure, Security, and Product teams
Execute and improve runbooks, playbooks, and escalation paths
Drive blameless post‑incident reviews (PIRs) and track corrective actions

Monitoring, Alerting & Observability

Own service health monitoring across infrastructure, applications, and dependencies
Design and maintain alerting strategies that align with SLIs/SLOs
Reduce alert fatigue through signal‑to‑noise improvements
Build dashboards using tools such as:

Grafana
Prometheus
Datadog / Splunk / CloudWatch

Reliability Engineering & Automation

Automate repetitive operational tasks to reduce manual toil
Improve mean time to detect (MTTD) and mean time to resolve (MTTR)
Develop scripts and tools (Python, Bash, Go, etc.) to support NOC/SRE workflows
Implement self‑healing and auto‑remediation where possible
Partner with engineering teams to improve system design for reliability

Platform & Infrastructure Support

Support and troubleshoot:

Linux‑based systems
Cloud platforms (AWS, Azure, GCP)
Kubernetes / containerized environments

Assist with capacity planning and availability reviews
Ensure operational readiness for production releases

Have you got what it takes?

Technical

Strong Linux systems administration
Experience with incident management and production support
Familiarity with:

Cloud infrastructure (AWS preferred)
Containers & orchestration (Docker, Kubernetes)
Monitoring/alerting platforms

Scripting or programming experience in Python, Bash, Go, or similar
Understanding of networking fundamentals (DNS, TCP/IP, load balancing)

Operational

Experience working in 24x7 NOC or production operations environments
Ability to handle high‑pressure incidents calmly and effectively
Strong written and verbal communication for incident coordination
Comfort working from runbooks—but improving them when they fall short

Preferred / Differentiators

Experience defining or operating to SLOs / SLIs
Prior experience migrating from a traditional NOC to an SRE model
Infrastructure as Code experience (Terraform, Ansible, etc.)
Exposure to security, compliance, or regulated environments

Requisition ID: 10579.

Reporting into: Manager, Network Operations.

Role Type: Individual Contributor.

About NiCE

NICE Ltd. (NASDAQ: NICE) software products are used by 25,000+ global businesses, including 85 of the Fortune 100 corporations, to deliver extraordinary customer experiences, fight financial crime and ensure public safety. Every day, NiCE software manages more than 120 million customer interactions and monitors 3+ billion financial transactions.

Known as an innovation powerhouse that excels in AI, cloud and digital, NiCE is consistently recognized as the market leader in its domains, with over 8,500 employees across 30+ countries.

NiCE is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, age, sex, marital status, ancestry, neurotype, physical or mental disability, veteran status, gender identity, sexual orientation or any other category protected by law.

Bracknell, United Kingdom

London, United Kingdom
