Cantina Labs Jobs

Inference Engineer, Video AI

Sorry, this job was removed at 08:25 p.m. (GMT) on Monday, Nov 17, 2025

In-Office

London, England, GBR

A bit about Cantina:

Cantina, founded by Sean Parker, is a new social platform with the most advanced AI character creator. Build, share, and interact with AI bots and your friends directly in the Cantina or across the internet.

Cantina bots are lifelike, social creatures, capable of interacting wherever humans go on the internet. Recreate yourself using powerful AI, imagine someone new, or choose from thousands of existing characters. Bots are a new media type that offers creators a way to share infinitely scalable, personalized content experiences combined with seamless group chat across voice, video, and text.

If you're excited about the potential AI has to shape human creativity and social interactions, join us in building the future!

A bit about the role:

We're looking for an Inference Engineer who specializes in productionizing and hosting video AI models at scale. You'll be responsible for taking cutting-edge neural networks from research to production, building robust inference infrastructure, and optimizing model performance for real-time applications. This role focuses on the deployment and serving of large video models.

As an Inference Engineer, you will:

Deploy video AI models to production – Take research models and build production-ready inference endpoints with APIs, ensuring efficient operation across cloud infrastructure.
Maintain and optimize inference systems – Debug complex model serving issues, optimize latency performance, monitor system health, and ensure 99.9% uptime for AI-powered features.
Implement model optimizations – Work with neural network architectures including diffusion networks, VAEs, and transformers. Apply streaming optimizations and understand video model architectures to implement effective performance improvements.
Manage inference infrastructure – Leverage containerization with Docker, cloud storage solutions like S3, and cluster computing to build scalable model serving infrastructure.
Collaborate with research teams – Work closely with AI researchers to understand model requirements, architectural constraints, and optimization opportunities for new video generation models.

A bit about you:

2+ years of ML engineering experience with a focus on model inference and deployment
Strong understanding of neural network architectures, particularly diffusion networks, VAEs, and transformer models
Experience with video and image models – Understanding of how video/image generation models work, their architectures, and optimization strategies specific to video processing
Multi-GPU inference expertise – Experience running model components across multiple GPUs, implementing parallel processing strategies for large models
Production model hosting experience – Track record of deploying and maintaining ML models in production environments, including streaming and real-time inference
Experience with containerization (Docker), AWS, and cluster computing environments
Familiarity with machine learning frameworks (PyTorch, TensorFlow)
Experience with inference platforms and model serving solutions

Technical Stack You'll Work With:

Cloud: AWS (S3, DynamoDB), Kubernetes clusters
ML Infrastructure: Model serving platforms, Docker
Languages: Python
Frameworks: PyTorch, TensorFlow
Models: Video generation models, diffusion networks, VAEs, transformers
Optimization: Multi-GPU inference, real-time processing techniques

Pay Equity:

In compliance with Pay Transparency Laws, the base salary range for this role is $175,000 to $225,000 for those located in the San Francisco Bay Area, New York City, and Seattle, WA. Compensation is determined by a number of factors, including skills, experience, job scope, location, and competitive compensation market data.

Benefits:

Health Care – 99% of premiums for medical, vision, and dental are paid by Cantina, plus a One Medical membership.
Monthly Wellness Stipend – $500/month to use on whatever you’d like!
Rest and Recharge – 15 PTO days per year, 10 sick days, all Federal holidays, and 2 floating holidays.
401(K) – Eligible to participate on day one of employment.
Parental Leave & Fertility Support
Competitive Salary & Equity
Lunch and snacks provided for in-office employees.
WFH equipment provided for full-time hybrid/remote employees.
