Manage incidents and problems, ensuring service continuity while applying SRE principles. Lead deep diagnostics, oversee major changes, and enhance application reliability through automation and observability tools. Mentor teams and ensure compliance with operational standards.
Key Responsibilities
Incident & Problem Management
- Lead major incident (MI) bridges and restore service with minimum business impact.
- Handle all L3 escalations and perform deep diagnostics across Java, JVM, middleware, OS, and infrastructure.
- Own technical RCAs and drive long‑term, systemic remediation.
- Identify recurring failure patterns and risks.
Reliability Engineering
- Apply SRE principles: SLIs/SLOs, error budgets, resilience patterns.
- Tune JVM parameters, analyze thread/heap dumps, and improve performance.
- Influence application architecture for fault tolerance, scalability, and recoverability.
- Validate DR readiness, failover behavior, and resilience testing outcomes.
Change, Release & Risk
- Provide technical approval and risk assessment for high-risk changes.
- Enforce operational readiness for new apps and major releases.
- Ensure changes meet audit, compliance, and regulatory expectations.
Automation, Monitoring & Observability
- Build advanced automation using Shell/Python/PowerShell.
- Develop frameworks for health validation, automated recovery, and compliance checks.
- Define observability standards; optimize alerts and improve MTTR.
Leadership & Mentorship
- Mentor L1/L2 teams; review and approve runbooks, SOPs, and KB articles.
- Act as a trusted technical advisor to stakeholders and leadership.
Skills & Qualifications
Technical (Mandatory)
- Strong knowledge of application architecture, distributed systems, and middleware.
- Java expertise: JVM internals, GC, memory management, thread/heap dump analysis, performance tuning.
- .NET expertise: CLR internals, garbage collection, memory management, thread/dump analysis, and application performance tuning.
- Strong Unix/Linux, networking basics, and advanced scripting (Shell/Python/PowerShell/VBS).
- Advanced SQL and understanding of databases; Autosys (or equivalent scheduler).
- Hands-on experience with observability tools: Splunk, AppDynamics/Dynatrace, ELK, Grafana, Prometheus.
Reliability & Operations
- Major incident leadership, deep RCA, change/release readiness, DR & resilience engineering.
- Experience in regulated production environments.
Soft Skills
- Strong technical leadership and decision‑making.
- Clear communication during high‑pressure incidents.
- Ownership mindset and business awareness.
Experience & Education
- 7–12+ years in Application Reliability, Production Support, SRE, or platform operations.
- Bachelor’s degree in Computer Science/Engineering or equivalent.
- ITIL, cloud, or industry certifications (preferred).
- Banking/financial domain experience (preferred).
Working Conditions
- On‑call and after‑hours support as required.
- Fast‑paced environment with multiple priorities.
- Hybrid working model.
Top Skills
.NET
AppDynamics
Dynatrace
ELK
Grafana
Java
Linux
PowerShell
Prometheus
Python
Shell
Splunk
SQL
Unix
Ensono Spelthorne, England Office
One London Road, Spelthorne, United Kingdom, TW18 4EX


