Job Description
We are seeking a Network Engineer to design, implement, and manage high-performance networks for HPC and AI infrastructure.
Candidates will work on cutting-edge technologies, including InfiniBand, optical networking, and advanced Linux-based systems, contributing to scalable, secure, and high-availability network solutions. They should also have expertise in IP routing protocols (BGP, OSPF) and network automation (Ansible, Nornir & Netmiko).
YOUR RESPONSIBILITIES:
- Monitor the performance and health of InfiniBand fabrics, including switches, host adapters, and nodes, using existing tools and developing new monitoring solutions where necessary.
- Diagnose and resolve network connectivity issues, performance bottlenecks, and component failures.
- Collaborate with cross-functional teams to support HPC clusters and ensure smooth network operation.
- Assist with the deployment and configuration of network infrastructures, including large-scale fabric installations from initial setup to operational readiness.
- Maintain and update network documentation and workflows to align with organizational standards.
- Architect, deploy, and optimize advanced network solutions for high-throughput, low-latency environments.
- Lead large-scale deployments and guide cross-functional teams during implementation.
- Develop and implement advanced monitoring tools and strategies for network performance.
- Evaluate and integrate emerging networking technologies to improve scalability and security.
- Mentor junior and intermediate engineers, providing technical leadership.
- Design and implement scalable network architectures with a focus on BGP and OSPF routing protocols.
- Lead network optimization initiatives, ensuring maximum efficiency, security, and performance.
- Oversee the integration of new network technologies into existing infrastructure.
- Lead troubleshooting efforts for complex, high-impact network issues across multiple sites.
- Provide expert guidance on BGP, OSPF, and other advanced routing protocols for large-scale networks.
- Take the lead on critical network incidents, working with other teams to resolve issues quickly.
- Architect and implement advanced network automation solutions using Ansible, Nornir, and Netmiko.
- Develop custom network automation workflows that integrate with other systems in the organization.
- Drive automation initiatives to ensure network changes are repeatable, efficient, and error-free.
Linux System & Network Security:
- Take ownership of network-related Linux administration, ensuring high availability, security, and performance.
- Implement and enforce network security measures, including firewalls, VPNs, and access control policies.
- Mentor junior engineers on best practices for Linux networking and security.
HPC & InfiniBand Expertise:
- Lead the design and implementation of HPC network architectures, focusing on InfiniBand configurations for performance-critical environments.
- Ensure the integration and management of InfiniBand for high-throughput, low-latency computing systems.
- Provide technical leadership on HPC interconnect issues, optimizing performance across large clusters.
YOUR QUALIFICATIONS:
- Knowledge of InfiniBand configuration and management.
- Familiarity with optical networking hardware and Linux system administration.
- Proficiency in at least one scripting language (e.g., Python, Bash).
- Strong analytical and troubleshooting skills.
- Ability to collaborate effectively in team environments.
- Willingness to travel to data centers for deployments and support.
- 7+ years of experience in network engineering, with a focus on high-performance environments.
- Expertise in InfiniBand, RDMA, and advanced network architectures.
- Advanced certifications (e.g., CCIE, NVIDIA DPU Certification) preferred.
With us, you will work towards the future of HPC: From new, sustainable building methods for data centers to cooling concepts to software solutions for accelerated compute.
Your approaches count: In official exchange formats or spontaneously at the coffee machine. At Northern Data, it's the best idea that counts - not the hierarchy. We’re looking forward to your input!
You make the difference in the company: Unlike in established corporations, at Northern Data you will really help shape things, from establishing new departments to optimizing processes and culture.
Best-in-class partners: The best work with Northern Data. This means a knowledge and time advantage from which your career and our customers benefit equally.
Green by heart: Sustainability is at the core of Northern Data. With us, you actively work on the carbon neutrality of data centers worldwide. Beginning with our infrastructure and continuing with the solutions for our clients, we work towards a green future.
Home Office facts: Work flexibly from home with our international, virtual team. And of course, your hardware wishes will be fulfilled to make your ideas for next-level HPC come true.
Your wellness matters: At Northern Data we have regular wellbeing initiatives that are designed to promote wellness, diversity, inclusion, and much more, ensuring a supportive and enriching environment for our global team.