Veepee

Lead SRE

Reposted 18 Days Ago

Hybrid

Paris, Île-de-France

Senior level

The Lead SRE will ensure the reliability and scalability of the data platform, manage infrastructure, mentor engineers, and define operational excellence standards. Responsibilities include incident management, cloud services management, and promoting SRE best practices.

The summary above was generated by AI

A pioneer of online flash sales since 2001 and a key player in European e-commerce, Veepee collaborates with over 7,000 brands to offer highly discounted products available for a limited time. Operating across various sectors, including fashion, home, wine, travel, and beauty, Veepee achieved a turnover of 3.3 billion euros (incl. VAT) in 2024 and employs 5,000 staff members across 10 countries.

📄 JOB DESCRIPTION

Today we're looking for a full-time site reliability engineer to join our data department, and more specifically the data platform team. The candidate should expect to work in a distributed environment, with team members in France (main office in Paris) and in Belgium.

Veepee’s data organization came into existence in 2018 and consists of a strong team of 50 data professionals, spread across different data domains (engineering, analytics, data science & ML and governance).

You will be part of a multidisciplinary, multinational team that fosters collaboration, transparency, and respect.

Within the data platform team, you’ll drive the reliability and scalability of Veepee’s next-generation data platform—powering data ingestion, analytics, and ML workloads across multiple European datacenters.

You’ll act as the SRE reference for the Data Platform, helping define operational excellence standards and mentoring engineers across teams.

🎯 TASKS

Infrastructure & Reliability
Maintain and monitor Kubernetes microservices.
Define observability standards (logging, metrics, alerting) using Grafana, Prometheus, etc.
Manage GCP services (BigQuery, Cloud Storage, Cloud SQL…) with Terraform and Atlantis.
Enhance GitOps deployments (Helm, ArgoCD).
Incident management & on-call rotation.
Performance and cost optimization.
Security and compliance alignment (especially with multi-region GCP/on-prem setup).

Collaboration & Enablement
Partner with data engineers/scientists to build resilient ingestion pipelines.
Support data scientists in deploying and monitoring ML workflows.
Promote SRE best practices (SLOs, DRP, postmortems, capacity planning).

👉 Required skills & experience

Leadership & Collaboration
Proven ability to lead technical discussions and influence reliability culture across multiple teams.
Strong sense of ownership and accountability, with a collaborative mindset.
Excellent communication skills; able to explain complex topics clearly to both technical and non-technical audiences.
Fluent in English (spoken and written).

Experience
5+ years of experience as an SRE, DevOps, or Platform Engineer in production environments.
Demonstrated experience deploying and operating applications on Kubernetes (Helm, GitOps, CI/CD).
Solid understanding of public cloud (preferably Google Cloud Platform) and private cloud ecosystems.
Hands-on experience implementing Infrastructure as Code with Terraform and GitLab pipelines.
Proven ability to build and maintain observability stacks (Grafana, Prometheus, Stackdriver, or equivalent).
Familiarity with GitOps workflows and modern deployment practices (e.g., ArgoCD).

Mindset
You thrive in helping others and enabling teams to be more autonomous.
You’re pragmatic, solutions-oriented, and willing to go the extra mile to keep systems reliable.
You enjoy working in an environment that values responsibility, trust, and continuous improvement.

👉 NICE TO HAVE skills

Hands-on experience with Trino, Airflow, or data-intensive workloads.
Knowledge of Iceberg, ClickHouse, or data lake architectures.
Experience automating infrastructure using Python/Go.
Experience with a GitOps approach using ArgoCD.
Some programming skills in Python (for automation/enablement tools) and Java (to understand context).
Experience with Machine Learning applications.
Experience with ELT solutions.

✅ BENEFITS

Variable bonus
Dynamic and creative environment within international teams
A variety of self-education courses on our e-learning platform
Participation in meetups and conferences locally and internationally
Flexible Office with up to 2 days at home
Flexible compensation package (including medical insurance)

⚙️ RECRUITMENT PROCESS

1️⃣ 30-minute HR Screen with a Veepeeᵀᵉᶜʰ Recruiter

2️⃣ Technical exchange

3️⃣ Team Interview

We are convinced that it is up to you to define the way you work, to develop yourself, and to progress.

At Veepee, we guarantee that you can just be yourself! In support of diversity and inclusion, Veepee is committed to reviewing all applications received on an equal basis.

🔗 COMPANY

For more information about our ecosystem: https://careers.veepee.com/en/home-page-en/

❓ WHO WE ARE

Veepeeᵀᵉᶜʰ is a tech community of 500 collaborators who play a key role in Veepee’s innovative strategy.

From Warsaw to Seville, through Brussels, Amsterdam, Paris, Lyon, Nantes, Nice, Barcelona, Madrid, and Lausanne, all our projects are developed in an agile environment with a wide variety of skills, where you’ll be sure to find your place, no matter the technology you work with.

If you love to try things, why don’t you jump on this new adventure?

Need more info? https://careers.veepee.com/en/

Vente-privee.com processes the collected data to handle the recruitment process, and to evaluate your ability to carry out the job offered and your professional skills. You can learn more about our use of your data and your rights by reading our recruiting privacy policy.

Top Skills

ArgoCD

GitOps

Google Cloud Platform

Grafana

Helm

Java

Kubernetes

Prometheus

Python

Terraform
