Research at ERA

Over the years, ERA Fellows have made significant contributions to the AI Safety & Governance research landscape. We highlight some of our Fellows’ research below.

Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards Into Open-Weight LLMs
Kyle O’Brien, ERA Fellow 2025

Mapping IAEA Verification Tools to International AI Governance: A Mechanism-by-Mechanism Analysis
Christina Krawec, ERA Fellow 2025

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
Itamar Pres, ERA Fellow 2024

The Case for Model Access Governance
Edward Kembery, ERA Fellow 2024

AI Safety Frameworks Should Include Procedures for Model Access Decisions
Edward Kembery & Tom Reed, ERA Fellows 2024

Verification Methods for International AI Agreements
Tom Reed & Jack William Miller, ERA Fellows 2024

Towards Safe Multilingual Frontier AI
Arturs Kanepajs & Vladimir Ivanov, ERA Fellows 2024

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Maria Kirch & Severin Field, ERA Fellows 2024

Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang, ERA Fellow 2024

Towards a UN Role in Governing Foundation Artificial Intelligence Models
Claire Dennis, ERA Fellow 2023

Welfare Diplomacy: Benchmarking Language Model Cooperation
Gabriel Mukobi, ERA Fellow 2023
