JetBrains

Senior Research Engineer (Code World Models)

Posted 3 Days Ago
In-Office
London, Greater London, England, GBR
Senior level
Lead pre-training and mid-training experiments for code-centric foundation models. Build large-scale data pipelines, handle code corpora and execution-based data, develop repository-level evaluations, and collaborate with researchers and engineers to improve model understanding of software systems.

JetBrains is a global software company that creates intelligent tools for software developers and teams. Since 2000, we have built products that help developers work more productively, write higher-quality code, and stay focused on solving real problems.

The JetBrains Research team is looking for a Senior Research Engineer to work on Code World Models: models that learn how software systems behave, change, execute, and interact with developer tools.

This role is focused on model pre-training and mid-training for code-centric foundation models. You will work on data, training pipelines, evaluation, and experiments that improve how models understand programs, repositories, execution, tests, and software engineering workflows.

In this role, you will:
  • Design and run pre-training, continued pre-training, and mid-training experiments for code models.
  • Build and improve data pipelines for large-scale model training, including filtering, deduplication, mixture design, and dataset quality checks.
  • Work with code corpora, repositories, tests, execution traces, and synthetic data.
  • Develop evaluations for complex repository-level code reasoning tasks.
  • Collaborate with researchers and engineers working on ML for code and AI developer tools.

We’ll be happy to have you on our team if you:
  • Have hands-on experience with model pre-training, continued training, or mid-training.
  • Have strong engineering skills in Python and experience with modern ML frameworks.
  • Understand large-scale ML training workflows, including data processing, distributed training, checkpointing, evaluation, experiment tracking, and debugging.
  • Have experience working with large datasets and care about data quality, contamination, sampling, and reproducibility.
  • Have a background in NLP, ML for software engineering, or a similar domain.
  • Enjoy working on research problems with high uncertainty and turning ideas into working experiments.

It would be a plus if you:
  • Have experience training or adapting models for code generation, code understanding, software agents, program repair, test generation, or repository-level reasoning.
  • Have worked with execution-based data, such as unit tests, traces, logs, compiler feedback, runtime states, or sandboxed code execution.
  • Have experience with large-scale distributed training of models with 70B+ parameters.
  • Understand evaluation challenges for code models, including benchmark contamination, flaky tests, execution-based scoring, and long-horizon task evaluation.
  • Have contributed to ML infrastructure, open-source projects, or research systems.
We are an equal opportunity employer
We know great ideas can come from anyone, anywhere. That’s why we do our best to create an open and inclusive workplace – one that welcomes everyone regardless of their background, identity, religion, age, accessibility needs, or orientation.

We process the data provided in your job application in accordance with the Recruitment Privacy Policy.