Cepheid

Reliability Engineer

Posted 8 Days Ago
In-Office
Cambridge, Cambridgeshire, England
Mid level
The Reliability Engineer will ensure production systems' stability, performance, and reliability, manage incidents, automate tasks, and improve system resilience.
The summary above was generated by AI

For over 25 years, Abcam has been providing tools the scientific community needs to enable faster breakthroughs in critical areas like cancer, neurological disorders, infectious diseases, and metabolic disorders.

We believe that to continue making progress, we need to work together, each bringing our own unique perspectives to make an impact on the world. This community needs people like you: dedicated, agile and above all audacious so we can truly drive science forward.

Role Summary

We are seeking a highly motivated Reliability Engineer to join our team. As a Reliability Engineer, you will play a crucial role in ensuring the stability, performance, and reliability of our production systems. Your responsibilities will include proactively identifying and resolving technical issues, leading major incident responses, and implementing best practices for system reliability. You will work closely with cross-functional teams to develop and maintain robust monitoring and automation solutions. This position reports directly to the Global Reliability Manager.

In this role, you will have the opportunity to:

•  Shape system reliability at scale by monitoring performance, spotting trends, and preventing issues before they impact users.

•  Take charge during critical moments, leading major incident responses and driving rapid service restoration.

•  Solve complex problems for the long term, collaborating across teams to implement robust, sustainable solutions.

•  Automate and innovate, building tools and processes that streamline operations and reduce manual work.

•  Drive continuous improvement, using data insights and post-incident learnings to make systems more resilient every day.

The essential requirements of the job include:

•  Automation & Scripting: Ability to code repeatable tasks using PowerShell, Bash, or Python, and familiarity with infrastructure-as-code tools such as Terraform and configuration management tools such as Puppet.

•  Cloud & Infrastructure: Strong knowledge of AWS Cloud services, networking, security, and storage solutions, both on-premises and in the cloud.

•  Reliability & Scalability: High-level understanding of High Availability, Disaster Recovery, scalability solutions, and web infrastructure troubleshooting using logs.

•  Monitoring & Incident Management: Proficiency with monitoring dashboards (Grafana, Humio, CloudWatch) and incident management tools like ServiceNow and PagerDuty.

•  Database & Pipelines: Good understanding of SQL Server, Oracle, PostgreSQL (including DML), and familiarity with CI/CD pipelines such as GitLab CI.

It would be a plus if you also possess previous experience in:

•  EKS troubleshooting

•  Application support

•  Linux OS troubleshooting

•  Oracle Cloud Infrastructure

You will also participate in an on-call rotation to provide 24/7 support for critical systems and respond to incidents as needed.

Join our winning team today. Together, we’ll accelerate the real-life impact of tomorrow’s science and technology. We partner with customers across the globe to help them solve their most complex challenges, architecting solutions that bring the power of science to life.

For more information, visit www.danaher.com.

Top Skills

AWS
Bash
CloudWatch
GitLab CI
Grafana
Humio
Oracle
PagerDuty
PostgreSQL
PowerShell
Puppet
Python
ServiceNow
SQL Server
Terraform
