Site Reliability Engineer

Ocient

Posted 2 Days Ago

Remote

Hiring Remotely in Greece

Mid level

Maintain and expand Ocient's hosted data warehouse services with a focus on high availability, performance, observability, automation, security, and incident management. Build monitoring, logging, alerting, and CI/CD, and automate Linux server deployments while supporting backup, DR, and test infrastructure.

The summary above was generated by AI

About Ocient:

Ocient is building OcientAIQ™ – a complete ecosystem for delivering trusted agentic AI solutions at petabyte scale, for the organizations that can't afford to get AI wrong. Our customers protect networks, secure nations, and power the global economy. The problems we solve are genuinely hard, and the work matters.

Founded in 2016 by the team that built Cleversafe (acquired by IBM in 2015), Ocient is headquartered in Chicago with a remote-first global team. We are a carbon-neutral company backed by leading investors including Greycroft, OCA Ventures, In-Q-Tel, and Buoyant Ventures.

Do not contact Ocient directly to apply for a role. For security purposes, any applications received via email will be deleted.

Job Title: Site Reliability Engineer

Location: Remote (United Kingdom)

Hiring Manager: Service Delivery Engineering Manager

Estimated salary range: £74,000 to £90,000

• The salary offered for this position will be based on a candidate’s experience and skill demonstrated during interviews and other evaluations.

Position Overview

Ocient is searching for an experienced Site Reliability Engineer with strong problem-solving skills and a passion for hard problems to help maintain and expand the "as a service" offering of Ocient's cutting-edge data warehouse.

Responsibilities

Support the design and operations of Ocient's hosted database and related services — including message queues and storage systems — ensuring high availability, performance, and efficiency.
Design and maintain monitoring, log centralization, and alerting for all services to facilitate observability and incident management.
Automate deployment and configuration of Linux-based servers, including the OS and the numerous applications that compose our hosted offerings.
Develop and maintain rigorous security practices to protect our applications and customer data.
Assist with automation of testing pipelines for the Ocient DB and monitoring of test infrastructure.

Ideal Qualifications

3+ years of experience in system administration in production environments.
Scripting experience with Bash, Python, or other languages.
Experience with system and software monitoring and alerting tools, such as the ELK stack, Graylog, InfluxDB, Prometheus, Zabbix, Grafana, Dynatrace, or others.
Experience with configuration management software such as Ansible, Puppet, or Chef.
Experience with data archiving, backup, and disaster recovery.
Continuous Integration / Continuous Deployment experience with Jenkins, GitLab CI, or others.
Experience with source control tools like Git.
Ability to work flexible hours and serve in on-call rotations.

An Exceptional Candidate Will Have:

Knowledge of OWASP principles for application security.
Experience with server / system virtualization and containerization technologies, e.g., Proxmox, KVM, VMware.
Experience with SQL and database administration.
Experience managing and operating cloud infrastructure (e.g., AWS, GCP, Azure).
Experience with SSAE 18 SOC 2 compliance.
Experience with networking administration, including VPN, proxy, DNS, and firewall configuration.

Interview Requirements: All interviews are conducted via video and require candidates to have their camera on for the duration of the session. The use of video filters, face-altering effects, or virtual backgrounds is not permitted for security and verification purposes.

We are not open to using an agency or staffing company at this time. We do not accept unsolicited agency or staffing resumes and we are not responsible for any fees related to unsolicited resumes.

Ocient is an equal employment opportunity employer. All qualified applicants will receive consideration for employment without regard to race, creed, color, religion, sex (including pregnancy status), sexual orientation, gender identity, national origin or ancestry, ethnicity, citizenship status, age, physical or mental disability, veteran status, marital status, parental status, genetic information, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, please contact [email protected] for more information.

All official Ocient job postings and recruiting communications will come directly from our team via our Careers page, LinkedIn, or from an ocient.com email address. If you receive communication about a role from any other source, please treat it with caution and direct questions to [email protected].
