Flex

Senior Infrastructure Engineer, SRE

Posted 21 Days Ago

Remote

12 Locations

Senior level

As a Senior Infrastructure Engineer at Flex, you will design, build, and maintain scalable cloud infrastructure, optimize system performance, and automate processes. You'll collaborate with service teams, implement SRE principles, and improve developer workflows, ensuring high reliability and effectiveness of systems while leveraging AWS and GCP technology.

The summary above was generated by AI

Flex is a growth-stage, NYC-headquartered FinTech company that is creating the best rent payment experience. It’s hard to believe that it’s 2025 and paying rent on time is still expensive, inflexible, and difficult. We’re here to change that! Flex enables our users to pay rent throughout the month on a schedule that better fits their finances and budget. Our mission is to empower as many renters as possible with flexibility over their most significant recurring expense. After deliberately keeping a stealth profile as we built up unprecedented investor support and an enthusiastic user base, we are looking for motivated individuals to help us carry that mission forward. Will you be a part of the team?

About the role

Flex is looking for a seasoned Senior Infrastructure Engineer with a passion for performance optimization to join our dynamic Infrastructure Team.

In this role, you will be part of the Infrastructure Engineering team, a small team responsible for creating a sustainable platform that ensures the effectiveness, reliability and scalability of our systems. You'll play a pivotal role in designing, building, and maintaining our robust and scalable infrastructure. You'll collaborate closely with our service engineering teams to automate processes, streamline operations, and ensure optimal system performance and reliability in our cloud infrastructure on AWS and GCP.

We are particularly interested in candidates with software engineering experience in languages like Java, Python, or TypeScript. This background will allow you to collaborate effectively with product teams, build tools and automation, and improve the developer experience across our engineering organization. You’ll have the opportunity to influence key infrastructure and architecture decisions while ensuring high reliability and smooth delivery pipelines.

This remote role requires a minimum of 5 years of cloud infrastructure experience.

What you’ll do

Collaborate with service engineering teams to design, implement, and maintain scalable, resilient infrastructure optimized for performance, reliability, and cost.
Ensure infrastructure aligns with business requirements and industry standards.
Use Terraform to automate infrastructure provisioning and configuration.
Implement SRE principles to improve system reliability and reduce downtime.
Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes to remove friction.
Develop and maintain robust monitoring and alerting systems to proactively identify and resolve issues.
Lead incident response, manage on-call rotations, and facilitate post-incident reviews to drive continuous improvement and resilience.
Automate everything: drive adoption of Infrastructure as Code (IaC) and build automated pipelines for testing, monitoring, and deployments.
Use your excellent written and verbal communication skills to explain upcoming changes and how they affect teams.

Key qualifications

Proven experience building, scaling, and monitoring cloud infrastructure on AWS, especially EKS, S3, RDS, API Gateway, load balancers, VPC, Lambda, DocumentDB, and DynamoDB.
Proven experience using Terraform to update and maintain cloud infrastructure.
Proven experience with containerized applications, Kubernetes, and microservice deployments.
Strong knowledge of GitHub Actions and CI/CD best practices.
Experience with developer productivity tools: designing CI/CD workflows, building internal tools, and creating self-service solutions to streamline software development.
Knowledge of monitoring and observability tools and frameworks; working knowledge of Datadog is a plus.
Familiarity with networking concepts (DNS, load balancing, firewalls, VPNs).
Strong collaboration skills with the ability to work effectively across teams and communicate technical ideas clearly.
Experience writing and reading code in at least one industry-standard language, such as Java, Python, or TypeScript.

Life at Flex:

We understand that it takes a diverse team of highly intelligent, curious, determined, empathetic, and self-aware people to grow a successful company. Our HQ is located in New York City, but we have employees located throughout the US, Australia, Canada, and South America. We are growing quickly, but deliberately, with a focus on building an inclusive culture. Our dynamic team has incredible perspectives to share, just as we know you do, and we take great pride in being an equal opportunity workplace.

We offer many employee benefits. For full-time, U.S.-based employees, we offer:

Competitive pay
100% company-paid medical, dental, and vision
401(k) + company equity
Unlimited paid time off with a PTO minimum + 13 company-paid holidays
Parental leave
Flex Cares Program: Non-profit company match + pet adoption coverage
Free Flex subscription

For full-time, non-U.S. employees, we offer:

Competitive pay
Company equity
Unlimited PTO

Top Skills

AWS

CI/CD

Datadog

GCP

GitHub Actions

Java

Kubernetes

Python

Terraform

TypeScript
