Maintain and expand Ocient's hosted data warehouse services with a focus on high availability, performance, observability, automation, security, and incident management. Build monitoring, logging, alerting, and CI/CD, and automate Linux server deployments while supporting backup, disaster recovery (DR), and test infrastructure.
About Ocient:
Ocient is building OcientAIQ™ – a complete ecosystem for delivering trusted agentic AI solutions at petabyte scale, for the organizations that can't afford to get AI wrong. Our customers protect networks, secure nations, and power the global economy. The problems we solve are genuinely hard, and the work matters.
Founded in 2016 by the team that built Cleversafe (acquired by IBM in 2015), Ocient is headquartered in Chicago with a remote-first global team. We are a carbon-neutral company backed by leading investors including Greycroft, OCA Ventures, In-Q-Tel, and Buoyant Ventures.
Job Title: Site Reliability Engineer
Location: Remote (United Kingdom)
Hiring Manager: Service Delivery Engineering Manager
Estimated salary range: £74,000 to £90,000
• The salary offered for this position will be based on a candidate’s experience and skill demonstrated during interviews and other evaluations
Position Overview
Ocient is searching for an experienced Site Reliability Engineer with strong problem-solving skills and a passion for hard problems to help maintain and expand Ocient's "as a service" offering of its cutting-edge data warehouse.
Responsibilities
- Support the design and operations of Ocient's hosted database and related services — including message queues and storage systems — ensuring high availability, performance, and efficiency.
- Design and maintain monitoring, log centralization, and alerting for all services to facilitate observability and incident management.
- Automate deployment and configuration of Linux-based servers, including the OS and the numerous applications that compose our hosted offerings.
- Develop and maintain rigorous security practices to protect our applications and customer data.
- Assist with automation of testing pipelines for the Ocient DB and monitoring of test infrastructure.
Ideal Qualifications
- 3+ years of experience in system administration in production environments.
- Scripting experience with Bash, Python, or other languages.
- Experience with system and software monitoring and alerting tools, such as the ELK stack, Graylog, InfluxDB, Prometheus, Zabbix, Grafana, Dynatrace, or others.
- Experience with configuration management software such as Ansible, Puppet, or Chef.
- Experience with data archiving, backup, and disaster recovery.
- Continuous Integration / Continuous Deployment experience with Jenkins, GitLab CI, or others.
- Experience with source control tools like Git.
- Ability to work flexible hours and serve in on-call rotations.
An Exceptional Candidate Will Have:
- Knowledge of OWASP principles for application security.
- Experience with server / system virtualization and containerization technologies, e.g., Proxmox, KVM, VMware.
- Experience with SQL and database administration.
- Experience managing and operating cloud infrastructure (e.g., AWS, GCP, Azure).
- Experience with SSAE 18 SOC 2 compliance.
- Experience with network administration, including VPN, proxy, DNS, and firewall configuration.
Interview Requirements: All interviews are conducted via video and require candidates to have their camera on for the duration of the session. The use of video filters, face-altering effects, or virtual backgrounds is not permitted for security and verification purposes.
We are not open to using an agency or staffing company at this time. We do not accept unsolicited agency or staffing resumes and we are not responsible for any fees related to unsolicited resumes.
Ocient is an equal employment opportunity employer. All qualified applicants will receive consideration for employment without regard to race, creed, color, religion, sex (including pregnancy status), sexual orientation, gender identity, national origin or ancestry, ethnicity, citizenship status, age, physical or mental disability, veteran status, marital status, parental status, genetic information, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, please contact [email protected] for more information.
All official Ocient job postings and recruiting communications will come directly from our team via our Careers page, LinkedIn, or from an ocient.com email address. If you receive communication about a role from any other source, please treat it with caution and direct questions to [email protected].