Site Reliability Engineer I

Sorry, this job was removed at 12:11 a.m. (GMT) on Thursday, Jul 24, 2025

In-Office

London, England

POSITION SUMMARY

The ideal candidate will have 5+ years of experience in Linux systems and software management, along with expertise in Terraform, Ansible, and cloud platforms such as AWS, Azure, and GCP. Experience with large-scale distributed systems, monitoring and alerting systems (Prometheus, Grafana), CI/CD pipelines, containers and container orchestration (Docker, Kubernetes), and programming languages (Go, Java, Python) is essential. A background in implementing security controls, automating deployments, and troubleshooting complex systems is also required.

WHAT YOU'LL DO

Deploy and maintain a resilient, secure, and efficient SaaS application platform to meet established SLAs.
Automate monitoring, management, and incident response to build toward an auto-remediation system.
Monitor site stability and performance and troubleshoot site issues.
Participate in on-call rotation to ensure stability and uptime for our platforms.
Scale infrastructure to meet rapidly increasing demand.
Collaborate with cross-functional teams across Engineering, Product, Services, and other departments.
Collaborate with developers to bring new features and services into production.
Independently design and develop tools to aid in operations and automation as well as work jointly with other team members to deliver innovative solutions to complex business and technical challenges.
Provide deployment and operations support for multi-tiered distributed software applications.
Estimate engineering effort, plan implementation, and roll out system changes that meet requirements for functionality, performance, scalability, reliability, and adherence to development goals and principles.
Collaborate with multiple teams (software development, release management, build and release, etc.) in a fast-paced, dynamic, entrepreneurial organization.
Define how the required behavior of large-scale systems can be achieved.
Measure and achieve reliability through engineering and operations work.
Develop, document, and manage monitoring and alerting, with the goal of creating an auto-remediation system.
Adapt security controls to products that do not typically support them natively in GA releases.
Develop automation methods that extend standard deployment pipelines for bespoke implementations.
Patch, enforce policy on, and audit production systems.
Drive the Disaster Recovery process.

WHAT YOU'LL NEED

Expertise with Infrastructure-as-Code tools such as Terraform
5+ years of professional Linux systems and software management experience
Knowledge of programming languages including Go, Node.js, and Java
Experience managing infrastructure in Azure, GCP, and AWS
Expertise with monitoring and alerting systems, including Prometheus and Grafana
Strong scripting skills for systems and data-driven solutions
JIRA experience for project/task management
Extensive experience troubleshooting large-scale distributed systems
Comprehensive background in using monitoring and alerting systems, including Prometheus and Grafana, within auto-remediation systems
Proven examples of standardizing security controls across large-scale systems
Comfort working within project/task management platforms

Systems and Tools

Cloud platforms: AWS, Azure, and GCP
Infrastructure-as-code tools and languages: Terraform, CloudFormation, Ansible, Puppet, Python
CI/CD: experience working with and supporting build and deployment pipelines and tools such as Jenkins, ArgoCD, GitHub Actions, and Rundeck
Datastore management and query skills: Postgres, MySQL, MongoDB, MSSQL, Elasticsearch, Solr
Containers and container orchestration platforms: Docker, Kubernetes, EKS, AKS
Familiarity with programming languages including Go, Node.js, Java, and Python
Monitoring/alerting tools: Prometheus, Grafana, VividCortex, Runscope, CloudWatch, Monitor, VictorOps
OS and container hardening: STIG, CIS, SELinux, iptables, FIPS 140-2, FIPS 140-3
JSON data structures and database schemas
API styles and query languages: REST, GQL

Bonus Points If

You have a Bachelor’s degree in Computer Science or a related field
You have worked in regulated or public-sector environments, developing and assessing cloud-based solutions
You have worked with, developed, or supported continuous integration/continuous deployment systems
You have concrete examples of auto-remediation systems you have created, ready to present

Veritone is a leading provider of artificial intelligence (AI) technology and solutions. The company's proprietary operating system, aiWARE, orchestrates an expanding ecosystem of machine learning models to transform audio, video and other data sources into actionable intelligence. We love to continuously grow while staying ahead of trends and creating structure in an unstructured world.

If you’ve made it this far and align with our goals, we look forward to reviewing your qualifications!

DISCLOSURE

Our company provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics.

Candidates must possess the right to work in the UK and be able to provide the necessary documentation to verify this as required by UK immigration laws.
