As a Research Intern at Atla, you'll conduct machine learning research, contribute to product development, and produce publications and datasets while collaborating with engineers on projects spanning iterative model improvement and evaluation systems.
About Atla
Atla is committed to engineering safe, beneficial AI systems that will have a massive positive impact on the future of humanity. We are a London-based start-up building the most capable AI evaluation models. Become part of our growing world-class team, backed by Y Combinator, Creandum, and the founders of Reddit, Cruise, Rappi, Instacart and more.
Role
As Atla’s research intern, you will collaborate with our researchers and gain deep experience at a growing AI startup. As part of your role, you will:
- Conduct cutting-edge machine learning research, contributing to research initiatives that have practical applications in our product development.
- Disseminate your research results through the production of publications, datasets, and code.
Our ongoing research projects encompass but are not limited to:
Iterative Self-Improvement
This project applies iterative self-improvement to enhance our general-purpose evaluator. This involves using the model’s outputs to refine its training data iteratively, rather than relying on fixed datasets. Prior work [1, 2, 3, 4] demonstrates the effectiveness of this approach, and we aim to extend it to evaluation systems.
We will leverage our internal training data, infrastructure, and benchmarks to iteratively refine the evaluator. You will collaborate with engineers to build infrastructure for iteratively generating better and more informative data. Techniques from our research on multi-stage synthetic data generation will be incorporated to improve data quality.
Key challenges include addressing bias amplification, semantic drift, and maintaining diversity of data to ensure model stability and alignment. This project aims to advance safe iterative training methodologies and deliver a more capable evaluator, with findings targeted for a top-tier conference. The scope can be tailored to your skills and interests.
[1] Wang, Y., et al. (2023). SELF-INSTRUCT: Aligning Language Models with Self-Generated Instructions.
[2] Yuan, W., et al. (2024). Self-Rewarding Language Models.
[3] Wang, T., et al. (2024). Self-Taught Evaluators.
[4] Li, X., et al. (2024). MONTESSORI-INSTRUCT: Generate Influential Training Data Tailored for Student Learning.
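The loop described above can be sketched in a few lines. This is a minimal illustration with toy stand-ins, not Atla's actual pipeline: `generate_candidates` and `score` are hypothetical placeholders for sampling from the model and for the evaluator's judgment.

```python
import random

def generate_candidates(model, prompts, n=4):
    # Toy stand-in for sampling n candidate responses per prompt
    # from the current model version.
    return {p: [f"{p}::cand{i}::v{model['version']}" for i in range(n)]
            for p in prompts}

def score(prompt, candidate):
    # Toy stand-in for the evaluator's judgment of a candidate response.
    random.seed(hash((prompt, candidate)) % (2**32))
    return random.random()

def self_improve(model, prompts, rounds=3):
    # Each round: sample candidates, score them with the current evaluator,
    # keep the top-scoring candidate per prompt as new training data,
    # then "retrain" (here: just bump a version counter and extend the data).
    for _ in range(rounds):
        candidates = generate_candidates(model, prompts)
        best = [max(cands, key=lambda c: score(p, c))
                for p, cands in candidates.items()]
        model = {"version": model["version"] + 1, "data": model["data"] + best}
    return model

model = self_improve({"version": 0, "data": []},
                     ["judge this patch", "rate this proof"])
```

The challenges noted above show up directly in this loop: if `score` is biased, the kept candidates amplify that bias round after round, which is why diversity and drift checks sit between scoring and retraining in practice.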
Inference Time Compute
This project explores inference-time compute scaling to enhance our general-purpose evaluator, particularly for complex tasks like coding, which benefit from longer reasoning chains. Recent research [1, 2] has shown the effectiveness of inference-time compute in improving performance on reasoning and mathematical tasks by leveraging more tokens during inference.
We will investigate methods to train models capable of utilising additional tokens effectively for reasoning. This involves experimenting with reinforcement learning (RL) approaches, such as group relative policy optimisation (GRPO), to encourage self-verification and reasoning strategies. You will work with engineers to develop the necessary training infrastructure.
Key challenges include addressing trade-offs between token efficiency and performance while mitigating common issues such as verbose or degenerate reasoning chains. The project aims to develop robust methods for inference-time compute scaling and contribute findings to a top-tier conference. The scope can be tailored to your skills and interests.
[1] Guo, D., et al. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.
[2] Snell, C., et al. (2024). Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters.
Agentic Evaluation
This project investigates how to evaluate agentic systems using an LLM-as-a-Judge framework. Agents introduce new challenges due to their ability to reason, plan, and interact with external tools [1,2]. Evaluating their capabilities and safety requires new approaches, with potential directions including:
- Agent-as-a-Judge: Using agentic systems to evaluate other agentic systems, reducing reliance on human judgment and enabling automated, scalable evaluation frameworks [3].
- Task-driven and multi-step evaluation: Moving beyond single-action accuracy to assess long-horizon reasoning, adaptability, and decision-making in dynamic environments [4].
AI agents are becoming the next major AI paradigm, with 2025 set to be a pivotal year for their development. As models evolve from passive assistants to autonomous agents, rigorous evaluation is essential to ensure their reliability and safety [5,6].
This project aims to develop a framework for evaluating agents, create benchmarks, and contribute findings to a top-tier conference. The scope can be tailored to your skills and interests.
[1] Yao, S., et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models.
[2] Deng, X., et al. (2023). MIND2WEB: Towards a Generalist Agent for the Web.
[3] Zhuge, M., et al. (2024). Agent-as-a-Judge: Evaluate Agents with Agents.
[4] Nathani, D., et al. (2025). MLGym: A New Framework and Benchmark for Advancing AI Research Agents.
[5] Altman, S. (2024). The Intelligence Age.
[6] Heikkilä, M., & Heaven, W. D. (2025). Anthropic’s Chief Scientist on 4 Ways Agents Will Be Even Better in 2025. MIT Technology Review.
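The step-wise direction above can be made concrete: instead of judging only the final answer, a judge walks an agent's trajectory and aggregates per-step verdicts. This is an illustrative sketch, not the Agent-as-a-Judge implementation; `judge_step` stands in for prompting an LLM judge with the step and the overall task.

```python
from dataclasses import dataclass

@dataclass
class Step:
    action: str
    observation: str

def judge_step(step):
    # Toy per-step verdict; in practice an LLM judge would be prompted
    # with the step's action, its observation, and the overall task.
    return "error" not in step.observation

def judge_trajectory(steps, task):
    # Task-driven, multi-step evaluation: score every step of a trajectory,
    # then aggregate into a task-level verdict rather than checking only
    # the final action's accuracy.
    verdicts = [judge_step(s) for s in steps]
    return {"task": task,
            "step_pass_rate": sum(verdicts) / len(verdicts),
            "passed": all(verdicts)}

report = judge_trajectory(
    [Step("search('flights')", "3 results"),
     Step("book(flight_2)", "error: sold out"),
     Step("book(flight_1)", "confirmed")],
    task="book a flight")
```

Even when the final step succeeds, the trajectory-level report surfaces the failed intermediate action, which is exactly the signal single-action accuracy misses.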
Qualifications
Evidence of exceptional research engineering ability:
- Currently pursuing a PhD in Machine Learning, NLP, Artificial Intelligence, or a related discipline; we will also consider exceptional non-PhD candidates.
- Proven track record in empirical research, including designing and executing experiments, and effectively writing up and communicating findings.
- Publications in top AI conferences.
- Aptitude for distilling and applying ideas from complex research papers.
Nice to have
- Previous internship experience at leading AI research labs (OpenAI, DeepMind, Meta, Anthropic, etc.).
- Experience with large-scale distributed training strategies, data annotation and evaluation pipelines, or implementing state-of-the-art ML models.
- Interested in and thoughtful about the impacts of AI technology.
About you
You'll work by and thrive through our core principles:
Own the Outcome
- Create real value: Every action should deliver tangible, meaningful value for the people who use what we build.
- Drive to completion: Do the second 90%.
- Do fewer things, better: Prioritize focus over breadth.
Back the Team
- Collaborate for excellence: The whole is greater than the sum of its parts.
- Seek truth: Let the best ideas win, no matter where they come from, and let go of ego.
- Argue passionately, then commit fully: Debate fiercely, but once a decision is made, own it like it’s yours.
Drive the Mission
- Advance AI safety: Every action should contribute towards the safe development of AI.
- Go big or go home: “The people who are crazy enough to think they can change the world are the ones who do.”
Compensation
Highly competitive.
Top Skills
Artificial Intelligence
Machine Learning
NLP
Atla Office: London, England, United Kingdom