Aurora Labs

Site Reliability Engineer

Posted 16 Days Ago
Remote
Senior level
As a Site Reliability Engineer at Aurora, you will ensure high availability of infrastructure while automating software component maintenance using tools like Ansible and Terraform. You will be responsible for incident management, system monitoring, and developing CLI tools and services for backend processes. Your role includes optimizing system performance within the NEAR and Aurora networks, alongside software engineering tasks to enhance reliability and efficiency.
The summary above was generated by AI

About Us 


Aurora is a network of Virtual Chains that combines NEAR’s scalability with powerful infrastructure for the easy deployment of preconfigured blockchains. By integrating a high-performance EVM, the trustless Rainbow Bridge, and advanced Cross Contract Call technology, Aurora goes beyond full Ethereum compatibility, opening the doors to a multichain world.


We invite you to be a part of our team of smart, professional, results-oriented and fun individuals. Join us to help ensure that our background processes run smoothly as we strive to become the best in the industry.


Our Values


- Execute extreme ownership.

- Strive for excellence.

- Embrace authenticity.

- Promote merit.

- Get shit done. 


About the team


Our infrastructure team is responsible for building and supporting the critical systems required for running and accessing the NEAR and Aurora networks. That includes everything on the path of RPC requests before they hit the blockchain, as well as block production and event delivery once transactions are executed.


Load balancing, caching, queueing, transaction simulation and block production are handled by services written and maintained by the infrastructure team. These services operate at large scale and process terabytes of data. The platform is based on open-source software, such as Kubernetes, NATS, JetStream, Blockscout, Grafana, Postgres and Near-core, alongside a few internally developed services.


All internally developed services are written in Go and implement core pieces of functionality such as mempool management, NEAR chunk distribution, and transaction pre-processing and simulation.


About the position 


This role is split between two responsibilities: site reliability (80%) and software engineering (20%).


Reliability Engineering includes:


- Ensuring high availability and failure tolerance of our infrastructure.

- Automating configuration and maintenance of software components such as K8s, NATS, InfluxDB, Postgres and Cloudflare, using tools such as Ansible, Terraform, Helm and Kubernetes operators.

- Design and implementation of cloud-agnostic solutions that do not rely exclusively on specific cloud vendors.

- Validator and RPC node management automation.

- Optimizing the latency and throughput of the pub-sub infrastructure.

- Incident management, monitoring, distributed tracing and recovery automation. 


Software Engineering projects include:


- Sidecars that implement cloud-agnostic infrastructure abstractions for developers.

- CLI tools for pubsub and streaming infrastructure operations.

- Time series processing engine for our transaction simulation engine.

- Indexers and blockchain event aggregation pipelines for monitoring purposes.


About you


You are a reliability engineer with experience creating and maintaining backend systems. You are familiar with the entire Linux stack and can easily find a bottleneck in a distributed system. You have developed CLI tools and backend services before and are comfortable applying your software development skills to automate your daily operations or to create a microservice on the end users' request path.


Key Qualifications


- Strong emphasis on SRE as an engineering subject area, with proficiency in Golang.

- Successful track record and proven experience as a backend internet services software developer.

- Knowledge of SDLC, including continuous integration and testing methodologies.

- Understanding of base internet infrastructure services, including DNS, HTTP, server virtualization and server monitoring in critical, large-scale distributed systems.

- Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements.

- Excellent verbal and written communication skills in English.


Desired skills


- Experience with development within the Kubernetes ecosystem, including the operator framework, controllers and CRDs.

- Experience with streaming and pubsub systems such as NATS, Apache Kafka, Apache Pulsar.

- Hardware bootstrap and associated security.

- Structured or unstructured storage and caching.

- Automating operations processes via services and tools.

- Configuration management and fleet orchestration via Puppet, Chef, Ansible, or others.

- Cloud Services (AWS S3/EC2/CloudFront or equivalent).

By applying for this job, I confirm and acknowledge that I have read and understood the Privacy Notice published at https://auroralabs.dev/privacy.

Top Skills

Go
