Op-Ed in TIME "When it Comes to AI, What We Don’t Know Can Hurt Us" written by Charlotte Stix (Apollo Research) and Yoshua Bengio (Law Zero). https://lnkd.in/eTSkRfW2
Apollo Research
Technology, Information and Internet
Technical AI safety organization specializing in auditing high-risk failure modes, particularly deceptive alignment.
About us
Apollo Research is an AI safety organization. We specialize in auditing high-risk failure modes, particularly deceptive alignment, in large AI models. Our primary objective is to minimize catastrophic risks associated with advanced AI systems that may exhibit deceptive behavior, where misaligned models appear aligned in order to pursue their own objectives. Our approach involves conducting fundamental research on interpretability and behavioral model evaluations, which we then use to audit real-world models. Ultimately, our goal is to leverage interpretability tools for model evaluations, as we believe that examining model internals in combination with behavioral evaluations offers stronger safety assurances compared to behavioral evaluations alone.
- Website
- https://www.apolloresearch.ai/
- Industry
- Technology, Information and Internet
- Company size
- 2-10 employees
- Headquarters
- London
- Type
- Privately Held
- Founded
- 2023
- Specialties
- Artificial Intelligence, Machine Learning, AI Safety, Interpretability, Model Evaluations, Audits, Research, and Policy Advising
Locations
-
Primary
1 Fore St Ave
London, EC2Y 9DT, GB
Employees at Apollo Research
-
Christopher Akin
COO | Strategy | Sales & Marketing | Operations | Advisor | New Market Entry
-
Joping Chai
People & Operations | AI Safety
-
Alex Lloyd
AI Safety Research at Apollo | Previously: CTO, Google SWE, Cambridge Maths
-
Jérémy Scheurer
Research Scientist - AI Alignment at Apollo Research
Updates
-
The journal Nature recently covered our work on AI scheming, spanning our Dec. 2024 paper on In-Context Scheming and our recent research (Sept. 2025) on Anti-Scheming mitigations. https://lnkd.in/grG-unYN
-
Recent coverage of Apollo Research's work in this Sunday's New York Times weekend feature: https://lnkd.in/eJPCG9DP
-
Our anti-scheming paper with OpenAI, unpacked by Marius Hobbhahn (Apollo CEO) and Bronson Schoen (lead author). Long-form interview https://lnkd.in/eN2en7H8
Can We Stop AI from Scheming? Lead Researcher Interview
-
Training AI not to scheme is hard: it may get better at hiding its scheming. Here is a sneak peek of tomorrow’s video with Marius Hobbhahn (Apollo CEO) and Bronson Schoen (lead author):
-
How much can today’s AI models scheme? Here is a teaser of a video we’re releasing tomorrow with Marius Hobbhahn (Apollo CEO) and Bronson Schoen (lead author) on our recent paper:
-
Marius will be a speaker at the AE Global Summit on Open Problems for AI, hosted by Thinking About Thinking this October in London. Marius' talk takes place on Day 2 (Oct. 29th) under the AI Safety, Enterprise, & Society track.
The Summit focuses on the pressing question of how to use AI to maximally boost value in enterprise, startups, and society. More than 1,000 researchers, policymakers, engineers, entrepreneurs, and artists will come together to explore breakthroughs in AI research, expected capabilities, effective rollout strategies, and brand new entrepreneurial ideas. More info here: https://lnkd.in/ezBwbdUN
The AE Global Summit on Open Problems for AI runs 28–30 October 2025 (Kings Cross / Euston).
-
New Research by Apollo Research & OpenAI: We partnered with OpenAI to assess frontier language models for early signs of scheming (covertly pursuing misaligned goals) in controlled stress tests using non-typical environments, and studied a training method that can significantly reduce, but not eliminate, these behaviors.
We tested "deliberative alignment": training models to cite a safety spec (such as "Do not lie") and take honest action. In this new research we see a ~30x reduction in covert actions (a proxy for scheming). This work significantly reduces, but does not eliminate, these behaviors, and there is much more work to do on the science of scheming.
- OpenAI o3: covert actions went from 13% -> 0.4%
- OpenAI o4-mini: covert actions went from 8.7% -> 0.3%
Rare but serious misbehavior remains, but this is a significant improvement. Our results are complicated by models’ increasing ability to recognize our evaluation environments as tests of their alignment ("evals awareness"). Of note, our joint research found that models from all major frontier AI providers, including OpenAI, Google, xAI, and Anthropic, engage in covert behavior.
You can read the full paper, transcripts, and blog posts at antischeming.ai
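As a quick sanity check on the ~30x figure quoted above, here is a minimal sketch (plain Python, a hypothetical script, using only the covert-action rates reported in this post) that computes the implied reduction factors:

```python
# Covert-action rates quoted above (before vs. after deliberative-alignment training).
# These numbers are taken directly from the post; the script only computes the
# implied reduction factor for each model.
rates = {
    "OpenAI o3": (13.0, 0.4),       # percent before, percent after
    "OpenAI o4-mini": (8.7, 0.3),
}

for model, (before, after) in rates.items():
    reduction = before / after
    print(f"{model}: {before}% -> {after}% (~{reduction:.0f}x reduction)")

# Approximate output:
# OpenAI o3: 13.0% -> 0.4% (~32x reduction)
# OpenAI o4-mini: 8.7% -> 0.3% (~29x reduction)
```

Both factors land around 30x, which is where the headline "~30x reduction in covert actions" comes from.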
-
Apollo's CEO and founder, Marius, is featured in this year's TIME100 AI list.
Honored and humbled to be included in the TIME100 AI 2025 list! https://lnkd.in/eDAnPWaT
-
We're hiring for an Evals Demonstrator Engineer. With the evals and governance teams, you'd build and perfect demonstrations for AI decision-makers and the general public. If you're a decent engineer and a great communicator, we'd love to work with you. https://lnkd.in/g5n27ncN