Cority

Sr. Site Reliability Engineer

Posted 19 Hours Ago

Remote

Hiring Remotely in United Kingdom

Senior level

The Senior Site Reliability Engineer will ensure the availability and efficiency of systems hosted in high-security data centers and Azure, focusing on cloud-native strategies. Responsibilities include production system monitoring, incident response, automation, and collaboration across teams. A proactive approach to learning and improving systems is essential.

The summary above was generated by AI

Meddbase, a Cority company, is the UK’s leading online EHR SaaS solution, designed for healthcare professionals seeking a secure practice management and patient record system. It includes a full suite of healthcare management software features covering consultations, cross-organization scheduling, patient analysis, document management, electronic referrals, pathology and reporting.

The system also offers a user-friendly patient portal, automated email and SMS appointment reminders, and a built-in telemedicine platform with a mobile app.

Position Summary:

We host our systems in high-security data centres and Azure. You will be working alongside the Infrastructure and Development teams to keep these environments running and assist in expanding our cloud estate.

Our technical strategy is to become entirely cloud-native with a focus on Azure over the next 3 years, and a significant portion of this role will involve supporting and leading on elements of this modernization process.

We are seeking someone passionate about availability, latency, performance, and efficiency: someone who can champion building observable systems, communicate the need for infrastructure, architecture, or code changes, and own both capacity planning and emergency response from a technical perspective.

As we have numerous technologies throughout our system, we require someone who is undaunted by learning as they go and is proactive and happy to take ownership of projects. Meddbase has a very collaborative team, so there will always be support and insight from other developers.

SQL Server is our primary database solution, but we also use Postgres. Much of our system relies on Redis, and knowledge of this would stand a candidate in good stead.

We use Prometheus and Grafana to monitor our infrastructure and applications, and Graylog for logging. We also have custom Prometheus exporters written in Go. Knowledge of these tools is beneficial as we expand in this area.

We are making a big push to automate the boring stuff and keep manual fixes to a minimum. Ansible has been chosen to handle this, with the goal of making changes by pull request. This project is still in its early phase, so this is an ideal opportunity to gain practical experience in this area. Prior Infrastructure-as-Code experience would be helpful, but we are able to provide training. Knowledge of tools like Packer and Terraform is desirable.

Incident response and occasional firefighting are part of the role, although our goal is to minimise them. We take a blameless approach to post-mortems, seeking to find tangible ways to continuously improve. A cool head and prior experience of troubleshooting awkward issues for both internal and external stakeholders are important for this position.

We are looking for enthusiastic people who will contribute to improving and growing our systems and processes. Our teams are predominantly remote, so written and spoken English must be very good. We value communication skills highly and expect people to proactively document their areas of knowledge.

We work with very sensitive data, so you will be required to undergo security training and vetting before working on some projects.

We offer an attractive salary and benefits package in a relaxed working environment. We work on proper technological solutions impacting patients and clinicians worldwide.

Responsibilities:

Production system monitoring and alerting.
Automation of system provisioning and deployments.
Incident response and troubleshooting (willingness to be on a PagerDuty rota).
Managing post-mortems, documenting run books and proposing improvements.
Design and development of features and tools to support the reliability of the system.
Process refinement, documentation, and communication with stakeholders.

Skills & Requirements:

Cloud architecture knowledge, particularly with Azure.
Distributed application analysis, troubleshooting, and development experience.
Experience designing and building monitoring systems (Prometheus, Graylog, Grafana etc.).
Experience troubleshooting Windows (including IIS) and Linux systems.
Knowledge of IaC tools (Ansible, Terraform etc.).
Strong scripting skills (PowerShell, Bash, Python).
Practical incident response experience.
SQL Server / Postgres / Redis.
Strong documentation and communication skills.
Any experience with .NET / C# is advantageous.

About Us

Medical Management Systems is an ISO 27001:2013 certified organization that adheres to NHS Data Security and Protection Toolkit (DSP Toolkit) standards and GDPR compliance. As part of your role, you will be required to complete annual data security awareness training and follow company policies on secure information handling.

We are an equal opportunities employer. All applicants will be considered for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, or disability status.

Top Skills

Ansible

Azure

Bash

Grafana

Graylog

Packer

Postgres

PowerShell

Prometheus

Python

Redis

SQL Server

Terraform

143-147 Regent Street, 5th Floor, Crown House, London, United Kingdom, M4W 1E6
