The Lead SRE will ensure the reliability and scalability of the data platform, manage infrastructure, mentor engineers, and define operational excellence standards. Responsibilities include incident management, cloud services management, and promoting SRE best practices.
A pioneer of online flash sales since 2001 and a key player in European e-commerce, Veepee collaborates with over 7,000 brands to offer highly discounted products for a limited time. Operating across sectors including fashion, home, wine, travel, and beauty, Veepee achieved a turnover of 3.3 billion euros incl. VAT in 2024 and employs 5,000 staff members across 10 countries.
📄 JOB DESCRIPTION
Today we're looking for a full-time site reliability engineer to join our data department, and more specifically the data platform team. The candidate should expect to work in a distributed environment, with team members in France (main office in Paris) and in Belgium.
Veepee’s data organization came into existence in 2018 and consists of a strong team of 50 data professionals, spread across different data domains (engineering, analytics, data science & ML and governance).
You will be part of a multidisciplinary, multinational team that fosters collaboration, transparency, and respect.
Within the data platform team, you’ll drive the reliability and scalability of Veepee’s next-generation data platform—powering data ingestion, analytics, and ML workloads across multiple European datacenters.
You’ll act as the SRE reference for the Data Platform, helping define operational excellence standards and mentoring engineers across teams.
🎯 TASKS
- Infrastructure & Reliability
  - Maintain and monitor Kubernetes microservices.
  - Define observability standards (logging, metrics, alerting) using Grafana, Prometheus, etc.
  - Manage GCP services (BigQuery, Cloud Storage, Cloud SQL…) with Terraform and Atlantis.
  - Enhance GitOps deployments (Helm, ArgoCD).
  - Incident management & on-call rotation.
  - Performance and cost optimization.
  - Security and compliance alignment (especially with multi-region GCP/on-prem setup).
- Collaboration & Enablement
  - Partner with data engineers/scientists to build resilient ingestion pipelines.
  - Support data scientists to deploy and monitor ML workflows.
  - Promote SRE best practices (SLOs, DRP, postmortems, capacity planning).
👉 Required skills & experience
- Leadership & Collaboration
  - Proven ability to lead technical discussions and influence reliability culture across multiple teams.
  - Strong sense of ownership and accountability, with a collaborative mindset.
  - Excellent communication skills; able to explain complex topics clearly to both technical and non-technical audiences.
  - Fluent in English (spoken and written).
- Experience
  - 5+ years of experience as an SRE, DevOps, or Platform Engineer in production environments.
  - Demonstrated experience deploying and operating applications on Kubernetes (Helm, GitOps, CI/CD).
  - Solid understanding of public cloud (preferably Google Cloud Platform) and private cloud ecosystems.
  - Hands-on experience implementing Infrastructure as Code with Terraform and GitLab pipelines.
  - Proven ability to build and maintain observability stacks (Grafana, Prometheus, Stackdriver, or equivalent).
  - Familiarity with GitOps workflows and modern deployment practices (e.g., ArgoCD).
- Mindset
  - You thrive in helping others and enabling teams to be more autonomous.
  - You’re pragmatic, solutions-oriented, and willing to go the extra mile to keep systems reliable.
  - You enjoy working in an environment that values responsibility, trust, and continuous improvement.
👉 NICE TO HAVE skills
- Hands-on experience with Trino, Airflow, or data-intensive workloads.
- Knowledge of Iceberg, ClickHouse, or data lake architectures.
- Experience automating infrastructure using Python/Go.
- Experience with GitOps approach using ArgoCD
- Some programming skills in Python (for automation/enablement tools) and Java (to understand context)
- Experience with Machine Learning applications
- Experience with ELT solutions
✅ BENEFITS
- Variable bonus
- Dynamic and creative environment within international teams
- A variety of self-education courses on our e-learning platform
- Participation in meetups and conferences locally and internationally
- Flexible Office with up to 2 days at home
- Flexible compensation package (including medical insurance)
⚙️ RECRUITMENT PROCESS
1️⃣ 30-minute HR Screen with a Veepeeᵀᵉᶜʰ Recruiter
2️⃣ Technical exchange
3️⃣ Team Interview
We are convinced that it is up to you to define the way you work, to develop yourself, and to progress.
At Veepee we guarantee that you can just be yourself! In support of diversity and inclusion, Veepee is committed to reviewing all applications received on an equal basis.
🔗 COMPANY
For more information about our ecosystem: https://careers.veepee.com/en/home-page-en/
❓ WHO WE ARE
Veepeeᵀᵉᶜʰ is a tech community of 500 collaborators who play a key role in Veepee’s innovative strategy.
From Warsaw to Sevilla, through Brussels, Amsterdam, Paris, Lyon, Nantes, Nice, Barcelona, Madrid and Lausanne, all our projects are developed in an agile environment with a wide variety of skills, where you’ll be sure to find your place, no matter the technology you work with.
If you love trying new things, why not jump into this new adventure?
Need more info? https://careers.veepee.com/en/
Vente-privee.com processes the collected data to handle the recruitment process, and to evaluate your ability to carry out the job offered and your professional skills. You can learn more about our use of your data and your rights by reading our recruiting privacy policy.

