Research at ERA

Over the years, ERA Fellows have made significant contributions to the AI Safety & Governance research landscape. We highlight some of our Fellows’ research below.

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards Into Open-Weight LLMs
Kyle O’Brien, ERA Fellow 2025

Mapping IAEA Verification Tools to International AI Governance: A Mechanism-by-Mechanism Analysis
Christina Krawec, ERA Fellow 2025

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
Itamar Pres, ERA Fellow 2024

The Case for Model Access Governance
Edward Kembery, ERA Fellow 2024

AI Safety Frameworks Should Include Procedures for Model Access Decisions
Edward Kembery & Tom Reed, ERA Fellows 2024

Verification Methods for International AI Agreements
Tom Reed & Jack William Miller, ERA Fellows 2024

Towards Safe Multilingual Frontier AI
Arturs Kanepajs & Vladimir Ivanov, ERA Fellows 2024

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Maria Kirch & Severin Field, ERA Fellows 2024

Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang, ERA Fellow 2024

Towards a UN Role in Governing Foundation Artificial Intelligence Models
Claire Dennis, ERA Fellow 2023

Welfare Diplomacy: Benchmarking Language Model Cooperation
Gabriel Mukobi, ERA Fellow 2023
