DeepMind

Research Scientist, Web Data

Posted 10 Days Ago
In-Office
London, Greater London, England
Mid level
Lead improvements in web data pipelines by investigating performance issues, developing metrics, and collaborating with teams to enhance data quality.
The summary above was generated by AI

At Google DeepMind, we value diversity of experience, knowledge, backgrounds and perspectives and harness these qualities to create extraordinary impact. We are committed to equal employment opportunity regardless of sex, race, religion or belief, ethnic or national origin, disability, age, citizenship, marital, domestic or civil partnership status, sexual orientation, gender identity, pregnancy, or related condition (including breastfeeding) or any other basis as protected by applicable law. If you have a disability or additional need that requires accommodation, please do not hesitate to let us know.


Snapshot

Artificial Intelligence could be one of humanity’s most useful inventions. At Google DeepMind, we’re a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in artificial intelligence. We use our technologies for widespread public benefit and scientific discovery, and collaborate with others on critical challenges, ensuring safety and ethics are the highest priority.


The Role

Your job will be to own and lead improvements to meaningful parts of the web data pipeline. Examples include scraping (i.e., transforming raw HTML into clean text data for use during training, potentially including relevant image data), data filtering (removing or down-weighting low-quality content), and adding new data sources (such as historical web crawl data).

Key responsibilities:
  • Investigating current results to identify areas for improvement (e.g., based on user feedback or weak eval performance).
  • Developing measurements of weakness, either as model eval or data pipeline statistics, to help drive progress.
  • Setting out a medium-term agenda to improve the data pipeline, with feedback from peers and key stakeholders, and convincing others to join your efforts.
  • Working with partner teams in GDM (and the wider Google) to leverage existing solutions effectively and to communicate the infrastructure improvements you need.
  • Day-to-day execution by coding, running experiments and reviewing contributions.

About You

In order to set you up for success as a Research Scientist at Google DeepMind, we look for the following skills and experience:

  • 3 years of experience working as a self-directed engineer or researcher, e.g., as a senior software developer or graduate student.
  • Developing large-scale data-processing pipelines (≥100M examples) in Python and/or C++.
  • Evaluating and investigating (pretrained) LLM performance.

In addition, the following would be an advantage: 

  • Filtering data based on heuristic and/or learned signals.
  • Working with web data for LLM training, such as cleaning data, removing duplicates and identifying the most valuable examples.
  • Developing advanced LLM metrics (e.g., execution-based metrics or metrics using auto-raters).
 

Top Skills

Python, C++
