Design and build the AI inference infrastructure for Cloudflare, optimizing systems for high availability and performance while mentoring junior engineers.
Available Locations: Austin, TX or London, UK (Hybrid)
About the role
You'll design and build the core infrastructure that powers AI inference across Cloudflare's global network - real-time voice, frontier open LLMs, and customer-deployed models running on a heterogeneous fleet of GPUs and next-generation accelerators in hundreds of cities worldwide. Working alongside AI/ML engineers, hardware partners, and Cloudflare product teams, you'll solve hard problems in distributed systems and high-performance computing: sub-second model cold starts, multi-accelerator workload scheduling, efficient KV cache management, and a model deployment platform serving both Cloudflare and customers bringing their own models. We're building an AI inference platform embedded in the fabric of the internet - something that doesn't exist yet - and this role puts you at the center of it. We're looking for high-agency systems engineers who are energized by foundational infrastructure problems and want to define how AI runs at the edge of the network.
Role Responsibilities
- Develop and maintain core components of the serverless inference platform to ensure high availability and scalability for Cloudflare users.
- Optimize the model scheduling system to significantly increase efficiency and resource utilization across our inference infrastructure.
- Implement improvements to the inference request routing logic to enhance overall performance and reduce latency for end-users.
- Drive significant, measurable improvements in the platform's reliability and resilience by identifying and mitigating systemic risks.
- Expand and refine the observability stack, including metrics, logging, and tracing, and fine-tune alerts to proactively identify and resolve production issues.
- Lead complex, cross-functional technical projects from initial concept and design through final deployment and operationalization.
- Act as a mentor to junior engineers and actively contribute to cultivating a strong, collaborative engineering culture within the team.
Must-Have Skills
- Experience in systems engineering, with a focus on distributed, high-performance systems.
- Expert proficiency in Rust programming, particularly in an asynchronous environment.
- Deep understanding and hands-on experience with relevant networking and application protocols (e.g., TCP, HTTP, WebSocket).
- Experience with scaling and performance optimization techniques, including load balancing and caching in a distributed environment.
Nice-to-Have Skills
- Demonstrable experience with container orchestration platforms, specifically Kubernetes and/or Nomad.
- Familiarity with the challenges and architectures involved in large-scale inference serving (e.g., LLM and diffusion models).
Top Skills
HTTP
Kubernetes
Nomad
Rust
TCP
WebSocket
Cloudflare London, England Office
Riverside Building, 6th Floor, County Hall, Belvedere Rd, London, United Kingdom, SE1 7PB

