Our Fellows

Technical AI Safety

  • Karim Abdel Sadek

    Mitigating Goal Misgeneralization via Regret-based Auto-curricula generation

    Karim is currently pursuing an MSc in AI at the University of Amsterdam and interning at the Krueger AI Safety Lab (KASL), focusing on reinforcement learning (RL), unsupervised environment design (UED), and AI safety. His prior work includes research in theoretical computer science, particularly in algorithms with predictions.

  • Einar Urdshals

    Singular Learning Theory analysis of algorithmic transformers

    Einar recently completed his PhD in theoretical physics at Chalmers University of Technology. His AI safety research has focused on interpretability and agent foundations, stemming from his work at AI Safety Camp.

  • Itamar Pres

    Distribution Mixing for Multi-Behavior Activation Steering in LLMs

    Itamar is an undergraduate studying Mathematics and Computer Science at the University of Michigan, conducting research with the LIT group. His prior work includes leveraging mechanistic interpretability to analyze alignment algorithms, such as DPO, and he is currently an intern at the Krueger AI Safety Lab.

  • Vladimir Ivanov

    Studying scheming tendencies in LLMs that are given hints that they are being evaluated

    Vladimir is a master's student at ENS Paris, researching scheming tendencies in large language models (LLMs). His prior experience includes working with the SatisfIA team on non-optimizing reinforcement learning algorithms.

  • Michal Bravansky

    Decomposing Interpretable Human Preferences and Values from RLHF Preference Data

    Michal is studying Computer Science at University College London, with research interests in AI and human behavior analysis. He co-runs Verifee, a non-profit monitoring disinformation in Eastern Europe, which has secured over $0.6M in funding.

  • Jack Miller

    Understanding the mechanisms for ethical decision-making in LLMs

    Jack is pursuing a BSc in Mathematics and Computer Science at the Australian National University (ANU). His research spans climate ML, quantum chemistry, LLM generation, and the science of deep learning, with a focus on ethical decision-making mechanisms in LLMs.

  • Kaivu Hariharan

    Semantic contamination: did you train on a cheatsheet?

    Kaivu has completed his B.S. in Mathematics and Computer Science at MIT and will soon join the MEng program. His previous research includes work on adversarial examples, mechanistic interpretability, and deep learning science, alongside his role as strategy director at MAIA.

  • Joschka Braun

    Limitations of contrastive activation steering in LLMs

    Joschka is pursuing an MSc in Machine Learning at the University of Tübingen. He is currently a research intern at Krueger AI Safety Lab, focusing on representation engineering in LLMs. Previously, Joschka pursued research on controlled text generation at the Health NLP Lab Tübingen.

Technical Governance

  • Arturs Kanepajs

    Toward Linguistically Inclusive AI Safety: Gaps and Policies

    Arturs holds an MSc from the Stockholm School of Economics and a CPGS from the University of Cambridge. With over a decade of experience in finance, he began his involvement in AI governance in mid-2023, conducting independent research and participating in public discussions.

  • Severin Field

    AI Safety Perceptions Among Experts

    Severin holds a Bachelor’s degree in Physics and has experience as an AI Frameworks Engineer at Intel and an ML Intern at Lawrence Livermore National Laboratory. His most recent work centers on deceptive alignment and interpretability.

  • Rosco Hunter

    Monitoring Human Dependence on AI Through ‘Reliance Drills’

    Rosco is a PhD candidate at the University of Warwick, researching human dependence on AI in collaboration with Samsung. He is transitioning into AI governance research, focusing on systemic AI risks, during his ERA fellowship.

  • Tom Reed

    Enhancing LLM Faithfulness Through Code Execution

    Tom is a researcher in technical governance, with academic backgrounds in Psychology from UCL and History from Cambridge. His current work aims to enhance LLM faithfulness through code execution.

  • Nathalie Maria Kirch

    Do Different LLM Attack Methods Exploit Similar Mechanisms?

    Nathalie is an MSc student in Artificial Intelligence at Utrecht University, and she will soon begin a PhD at King’s College London. With a background in cognitive psychology and philosophy, her previous research at the Institute for Artificial Intelligence in Vienna focused on machine ethics and LLM benchmarking in medical contexts.

  • Misha Gerovitch

    Analysis of State-Proof Security Measures in AI Data Centers

    Misha is pursuing a B.S. in Computer Science and Engineering at MIT and is set to join the MEng program. His research includes mechanistic interpretability automation and LLM-on-LLM deception, and he co-leads MIT AI Alignment (MAIA) and their AI policy programs.

  • Carson Ezell

    Risk Assessment for Agentic AI Deployments

    Carson is an undergraduate studying Philosophy at Harvard College, where he also serves as the Policy Research Lead for the AI Safety Student Team (AISST). His research focuses on transparency, AI auditing, and institutional design for AI regulation.

  • Pauline Kuss

    Governing AI Agents – An Affordance Perspective

    Pauline is a third-year PhD student at Freie Universität Berlin, specializing in agentic AI and sociotechnical approaches to AI governance.

  • Allison Huang

    How well can LLMs defend against persuasive text in decision-making scenarios?

    Allison is an undergraduate at the University of Southern California, where she is pursuing an integrated degree in Computer Science, Design, and Business. Her research explores how LLMs defend against persuasive text in decision-making contexts.

  • Isaac Robinson

    The Rationality of AI Agents

    Isaac is a PhD student at Oxford University, studying computer science with a focus on algorithmic game theory and AI fairness. His current research examines the rationality of AI agents and governance issues in advanced models.

  • Tina Wünn

    Governance of AI-bio tools in the Global South

    Tina holds a BSc in Biology and an MSc in Medical Informatics. She has contributed to biosecurity policy research and is now focused on AI-biosecurity governance, particularly in the Global South.

AI Governance

  • Alejandro Ortega

    Evaluating GPU Self-Destruct Mechanisms

    Alejandro is about to begin a 6-month placement at Apollo, following freelance AI governance research projects on nuclear power regulation and voluntary safety frameworks. He holds an MSci in Physics and Philosophy from Bristol and previously led EA Oxford.

  • Madeline Proctor

    Mapping Political Considerations for a Strict Liability Regime with Expanded Punitive Damages for Advanced AI in the United States

    Madeline is studying Social Studies at Harvard, specializing in AI and the Law. She is writing her thesis on digital trade goods within international law and has experience assessing LLM-powered legal technologies and U.S. constitutional law.

  • Heramb Podar

    How to Harness a Llama: A Guide for Hosting Platforms to Secure the Open-Source Ecosystem

    Heramb is a final-year student at IIT Roorkee with previous experience at the Center for AI and Digital Policy and the Millennium Project. His research focuses on how hosting platforms can prevent the proliferation of open-source AI models.

  • James Lester

    Assessing Risk Compensation Effects in the Deployment of Risky Technology - Evidence and Implications for AI Strategy and Policy

    James recently graduated in Economics from the University of Cambridge, where he helped with outreach and community building for AI governance and effective altruism. His undergraduate thesis modelled how expert advice on existential risk policy might break down under imperfect information and heterogeneous priors over risk levels.

  • Edward Kembery

    Towards Responsible Model Access Governance

    Edward holds an MPhil in AI and Ethics from the Centre for the Future of Intelligence at Cambridge. He has contributed to AI advisory groups, including CDX for the Japanese government, and helped establish the Cambridge AI Safety Hub’s policy wing.

  • Jai Patel

    Developing an Agentic AI Crisis Response Framework (UK)

    Jai completed an MPhil in Ethics of AI, Data, and Algorithms at Cambridge and has experience in AI policy, including responsible scaling research. He currently works with the UK AISI Safeguards Team.

  • Damin Curtis

    Coordination avenues between frontier AI developers and US defense/intelligence community

    Damin holds a master’s degree in International Affairs, specializing in technology policy. His research covers AI regulation, advanced chip access, and U.S. security posture in East Asia.

  • Duncan McClements

    Optimal punitive damages under mandatory insurance

    Duncan is an economics student at the University of Cambridge and a Research Associate at the Adam Smith Institute. He is focusing on optimal punitive damages in AI regulation.

  • Ben Chancey

    Towards a Standard of Strict Liability for Harms Caused by Sufficiently Advanced AI Systems

    Ben is studying Philosophy and Computer Science at McGill University and is pursuing a career in AI governance and policy, with research interests in liability standards for advanced AI systems.