Flex

Senior Infrastructure Engineer, SRE

Posted 21 Days Ago
Remote
12 Locations
Senior level
As a Senior Infrastructure Engineer at Flex, you will design, build, and maintain scalable cloud infrastructure, optimize system performance, and automate processes. You'll collaborate with service teams, implement SRE principles, and improve developer workflows, ensuring high reliability and effectiveness of systems while leveraging AWS and GCP technology.
The summary above was generated by AI

Flex is a growth-stage, NYC-headquartered FinTech company that is creating the best rent payment experience. It’s hard to believe that it’s 2025 and paying rent on time is expensive, inflexible, and difficult. We’re here to change that! Flex enables our users to pay rent throughout the month on a schedule that better fits their finances and budget. Our mission is to empower as many renters as possible with flexibility over their most significant recurring expense. After deliberately keeping a stealth profile as we built up unprecedented investor support and an enthusiastic user base, we are looking for motivated individuals to help us advance our mission. Will you be a part of the team?

About the role

Flex is looking for a seasoned Senior Infrastructure Engineer with a passion for performance optimization to join our dynamic Infrastructure Team.

In this role, you will be part of the Infrastructure Engineering team, a small team responsible for creating a sustainable platform that ensures the effectiveness, reliability, and scalability of our systems. You'll play a pivotal role in designing, building, and maintaining our robust and scalable infrastructure. You'll collaborate closely with our service engineering teams to automate processes, streamline operations, and ensure optimal system performance and reliability in our cloud infrastructure on AWS and GCP.

We are particularly interested in candidates with software engineering experience in languages like Java, Python, or TypeScript. This background will allow you to collaborate effectively with product teams, build tools and automation, and improve the developer experience across our engineering organization. You’ll have the opportunity to influence key infrastructure and architecture decisions while ensuring high reliability and smooth delivery pipelines.

This remote role requires a minimum of 5 years of cloud infrastructure experience.

What you’ll do

  • Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions optimizing for performance, resilience, and cost.
  • Ensure infrastructure aligns with business requirements and industry standards.
  • Leverage Terraform to automate infrastructure provisioning and configurations.
  • Implement SRE principles to improve system reliability and reduce downtime.
  • Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes to remove friction.
  • Develop and maintain robust monitoring and alerting systems to proactively identify and resolve issues.
  • Lead incident responses, manage on-call rotations, and facilitate post-incident reviews to drive continuous improvement and resilience.
  • Automate everything—drive adoption of Infrastructure as Code (IaC) and build automated pipelines for testing, monitoring, and deployments.
  • Leverage your excellent written and verbal communication skills to keep teams informed about upcoming changes and how those changes affect them.

Key qualifications

  • Proven experience in building, scaling, and monitoring cloud infrastructure on AWS, especially EKS, S3, RDS, API Gateway, Load Balancers, VPC, Lambda, DocumentDB, and DynamoDB.
  • Proven experience using Terraform to update and maintain cloud infrastructure.
  • Proven experience with containerized applications, Kubernetes, and microservice deployments.
  • Strong knowledge of GitHub Actions and CI/CD best practices.
  • Experience with developer productivity tools: designing CI/CD workflows, building internal tools, and creating self-service solutions to streamline software development.
  • Knowledge of monitoring and observability tools and frameworks, with working knowledge of Datadog being a plus.
  • Familiarity with networking concepts (DNS, load balancing, firewalls, VPNs).
  • Strong collaboration skills with the ability to work effectively across teams and communicate technical ideas clearly.
  • Experience writing and reading code in an industry-standard language such as Java, Python, or TypeScript.

Life at Flex:

We understand that it takes a diverse team of highly intelligent, curious, determined, empathetic, and self-aware people to grow a successful company. Our HQ is located in New York City, but we have employees located throughout the US, Australia, Canada, and South America. We are growing quickly, but deliberately, with a focus on building an inclusive culture. Our dynamic team has incredible perspectives to share, just as we know you do, and we take great pride in being an equal opportunity workplace.

We offer many employee benefits. For full-time, U.S.-based employees, we offer:

  • Competitive pay
  • 100% company-paid medical, dental, and vision
  • 401(k) + company equity
  • Unlimited paid time off with a PTO minimum + 13 company-paid holidays
  • Parental leave
  • Flex Cares Program: Non-profit company match + pet adoption coverage
  • Free Flex subscription

For full-time, non-U.S. employees, we offer:

  • Competitive pay
  • Company equity
  • Unlimited PTO

Top Skills

AWS
CI/CD
Datadog
GCP
GitHub Actions
Java
Kubernetes
Python
Terraform
TypeScript
