Writer Jobs

Site reliability engineer

Sorry, this job was removed at 04:11 p.m. (GMT) on Wednesday, May 21, 2025

In-Office or Remote

Hiring Remotely in London, England, GBR

Similar Jobs

Optum

Site Reliability Engineer

7 Days Ago

In-Office or Remote

Expert/Leader

Artificial Intelligence • Big Data • Healthtech • Information Technology • Machine Learning • Software • Analytics

Define and scale SRE standards across teams, implement SLOs/SLIs/error budgets, build observability and resiliency patterns, drive automation and AIOps, improve reliability for large-scale Azure cloud systems, and influence engineering and platform teams.

Top Skills: AI/ML, AIOps, Automation, Azure, Error Budgets, Incident Management, Observability (Metrics, Logs, Tracing), OpenTelemetry, SLIs, SLOs

MongoDB

Site Reliability Engineer

8 Days Ago

Easy Apply

Remote or Hybrid

Senior level

Big Data • Cloud • Software • Database

Maintain and improve multi-cloud Kubernetes infrastructure, CI/CD (Argo Workflows/ArgoCD), observability, and networking. Build reliable continuous deployment tooling and onboarding flows, provide internal support, collaborate across Platform Engineering, contribute upstream (open-source/operators), and participate in a 24/7 on-call rotation to resolve deployment infrastructure issues.

Top Skills: Alerting, Argo Workflows, ArgoCD, AWS, Azure, CI/CD, Containers, DNS, GCP, Go, Kubernetes, Linux, Load Balancer, Observability, Python, Service Mesh, TCP/IP, TLS

Domino Data Lab

Site Reliability Engineer

9 Days Ago

Easy Apply

Remote or Hybrid

Senior level

Artificial Intelligence • Machine Learning

Lead development of AI-assisted reliability tooling, own incident response end-to-end, improve observability and SLO/SLI frameworks, scale single-tenant SaaS operations, mentor engineers, and reduce recurring operational toil through engineering and automation.

Top Skills: Cloud Platforms, Go, Kubernetes, Linux, LLM/AI Tooling, Logs and Tracing, Observability Tooling, Python, SLO/SLI Frameworks

📐 About this role

We are looking for a foundational member of the cloud infrastructure team at Writer. In this role, you will help develop and implement our site reliability engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of Writer’s critical systems and take a proactive approach so that our high-ROI products reach our customers seamlessly.

🦸🏻‍♀️ Your responsibilities:

Lead the design, implementation, and maintenance of Writer, Inc.’s cloud infrastructure to ensure high availability and performance
Design and implement scalable cloud automation to support seamless deployment for our largest enterprise customers
Automate infrastructure provisioning and management using Terraform & Python
Collaborate with development teams to optimize cloud resources and enhance system reliability
Develop and maintain monitoring and alerting systems to proactively identify and resolve issues affecting the reliability of our writing solutions
Conduct post-mortem analyses of system failures to identify root causes and implement preventive measures
Optimize and scale our cloud infrastructure to support growing user demand and ensure cost efficiency
Ensure the security and compliance of our systems, adhering to industry standards and regulations
Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement
Stay current with emerging technologies and industry trends to continuously improve our site reliability practices

⭐ Is this you?

Proven expertise in Site Reliability Engineering with a minimum of 7 years of hands-on experience
Deep understanding of system architecture and infrastructure design to ensure high availability and performance
Bachelor’s degree in Computer Science, Engineering, or a related technical field
Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring
Experience with cloud platforms like AWS, Azure, or GCP, and their respective services for scalable and resilient systems
Expertise in containerization and orchestration technologies (e.g., Docker, Kubernetes)
Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance
Ability to lead and mentor junior engineers in best practices for reliability and system optimization
Excellent communication skills to collaborate effectively with cross-functional teams and stakeholders
Proactive approach to identifying and mitigating potential system failures and performance bottlenecks

Preferred skills & experience:
- Software engineering expertise
- Terraform
- Python
- Kubernetes
- Scala
- AWS/GCP

🍩 Benefits & perks (UK full-time employees):

Generous PTO, plus company holidays
Comprehensive medical and dental insurance
Paid parental leave for all parents (12 weeks)
Fertility and family planning support
Early-detection cancer testing through Galleri
Competitive pension scheme and company contribution
Annual work-life stipends for:
- Home office setup, cell phone, internet
- Wellness stipend for gym, massage/chiropractor, personal training, etc.
- Learning and development stipend
Company-wide off-sites and team off-sites
Competitive compensation and company stock options

🍩 Benefits & perks (US full-time employees):

Generous PTO, plus company holidays
Medical, dental, and vision coverage for you and your family
Paid parental leave for all parents (12 weeks)
Fertility and family planning support
Early-detection cancer testing through Galleri
Flexible spending account and dependent FSA options
Health savings account for eligible plans with company contribution
Annual work-life stipends for:
- Home office setup, cell phone, internet
- Wellness stipend for gym, massage/chiropractor, personal training, etc.
- Learning and development stipend
Company-wide off-sites and team off-sites
Competitive compensation, company stock options, and 401(k)

Writer is an equal-opportunity employer and is committed to diversity. We don't make hiring or employment decisions based on race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other basis protected by applicable local, state or federal law. Under the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

By submitting your application on the application page, you acknowledge and agree to Writer's Global Candidate Privacy Notice.

London, United Kingdom

What you need to know about the London Tech Scene

London isn't just a hub for established businesses; it's also a nursery for innovation. The city boasts one of the most recognized fintech ecosystems in Europe and attracts billions in investment each year, which has made it a go-to destination for startups looking to make their mark. Top U.K. companies like Hopin, Moneybox and Marshmallow have already made the city their base, yet fintech is just the beginning. From healthtech to renewable energy to cybersecurity and beyond, the city's startups are breaking new ground across a range of industries.
