ML Platform Engineer

Posted 21 Days Ago

London, Greater London, England

Mid level

The ML Platform Engineer will design and maintain scalable machine learning platforms, automate the deployment and scaling of ML infrastructure, manage model lifecycles, and optimize performance. The role involves close collaboration with data scientists and engineers to support effective model development, along with responsibility for security, compliance, and efficient tooling. It emphasizes adaptive problem-solving and a strong focus on product-driven solutions.

The summary above was generated by AI

Responsibilities

Platform Development: Design, build, and maintain scalable machine learning platforms to support model development, experimentation, and production workflows.
Infrastructure Automation: Automate the deployment and scaling of ML infrastructure, including data pipelines, model training, validation, and deployment.
Model Lifecycle Management: Manage the end-to-end lifecycle of machine learning models, including versioning, deployment, monitoring, and retraining.
LLM Operations (LLM Ops): Implement systems and practices for managing large language models (LLMs), ensuring efficient fine-tuning, deployment, and monitoring of these models in production.
Collaboration with Data Scientists and Engineers: Provide infrastructure and tools that enable seamless collaboration between data science and engineering teams on the development and deployment of machine learning models.
Performance Optimization: Optimize model training and inference performance across a range of hardware and deployment environments, including GPUs and cloud-based infrastructure.
Security and Compliance: Ensure the security of the ML platform and compliance with relevant regulations and standards, especially in environments dealing with sensitive data.
Tooling and Frameworks: Evaluate and integrate MLOps tools, frameworks, and libraries to continuously improve platform capabilities and efficiency.
Monitoring and Alerting: Implement robust monitoring and alerting systems for production models, ensuring reliability and timely detection of performance drift or anomalies.
User-Centric Development: Emphasize user needs and experiences in platform design and implementation.
Adaptive Problem-Solving: Quickly adapt to changing requirements and technological landscapes in ML and AI.
Product Focus: Maintain a strong product-oriented mindset, aligning technical solutions with business goals and user needs.

Skills and Experience required

Experience:
- 3+ years of experience in software engineering or infrastructure roles, with a focus on machine learning platforms or MLOps.
- Proven experience in building, deploying, and maintaining ML platforms or systems at scale.
- Strong experience with cloud platforms such as AWS, GCP, or Azure, particularly for machine learning and data processing tasks.
- Experience with containerization technologies (Docker) and orchestration tools (Kubernetes) for ML workloads.
- Proficiency in programming languages such as Python, and familiarity with ML libraries and frameworks (e.g., TensorFlow, PyTorch).
- Familiarity with CI/CD pipelines tailored for machine learning (e.g., model validation, deployment automation).
Technical Expertise:
- Experience with distributed systems, model serving, and scaling ML models in production.
- Hands-on experience with MLOps tools and frameworks such as MLflow, Kubeflow, or similar.
- Strong understanding of model monitoring, performance optimization, and retraining strategies.
- Exposure to LLM Ops, including fine-tuning, deploying, and maintaining large language models.
- Strong focus on automation and experience with infrastructure-as-code tools such as Terraform or CloudFormation.
- Strong problem-solving skills and experience troubleshooting infrastructure and platform issues.
Key Attributes:
- Ability to thrive in fast-paced environments and deliver with high velocity.
- Strong product focus and ability to empathize with end-users of ML platforms.
- Adaptability to rapidly changing ML landscapes and emerging technologies.
- Excellent communication skills to bridge gaps between technical and non-technical stakeholders.
Preferred Qualifications:
- Master’s degree in Computer Science, Data Engineering, Machine Learning, or a related field.
- Experience managing infrastructure for large language models (LLMs) and their specialized operational needs.
- Experience with big data processing and streaming frameworks such as Apache Spark, Apache Kafka, or similar.

As an ethical employer, Tag will never ask job applicants to provide private, sensitive information upfront or make offers of employment contingent on financial requests or responsibilities from any candidate.

Top Skills

Python

1-5 Poland Street, London, United Kingdom, W1F 8PR

Similar Jobs

Cloudflare

Solution Architect, AI / Cloudflare Developer Platform

3 Hours Ago

London, Greater London, England, GBR

Remote

Hybrid

3,900 Employees

Mid level

Cloud • Information Technology • Security • Software • Cybersecurity

The Solution Architect, AI / Cloudflare Developer Platform will lead sales efforts by advising on technical details and business value of the Cloudflare platform. Responsibilities include partnering with sales to drive revenue, developing proof of concepts, collaborating with cross-functional teams, and acting as a technical advisor to clients and teams on product capabilities.

Cloudflare

Senior Solutions Engineer - Russian Speaker

8 Hours Ago

London, Greater London, England, GBR

Hybrid

3,900 Employees

Mid level

Cloud • Information Technology • Security • Software • Cybersecurity

The Senior Solutions Engineer will advocate for customers' technical needs, collaborate with various teams, and deliver scalable solutions. The role emphasizes a solid understanding of internet technologies and achieving customer success through effective communication and project management.

Contentful

Staff Software Engineer - Analytics (f/m/d)

19 Hours Ago

London, Greater London, England, GBR

Hybrid

744 Employees

Senior level

Enterprise Web • Marketing Tech • Software

The Staff Software Engineer will lead the development of an analytics platform, designing scalable architectures, mentoring engineers, and managing high-throughput data pipelines. They will collaborate with leadership to align product goals with technical feasibility and engage with customers to refine product direction.

What you need to know about the London Tech Scene

London isn't just a hub for established businesses; it's also a nursery for innovation. The city boasts one of the most recognized fintech ecosystems in Europe, attracting billions in investment each year, and that success has made it a go-to destination for startups looking to make their mark. Top U.K. companies like Hopin, Moneybox and Marshmallow have already made the city their base, yet fintech is just the beginning. From healthtech to renewable energy to cybersecurity and beyond, the city's startups are breaking new ground across a range of industries.