Our Fellows

Technical AI Safety

  • Karim Abdel Sadek

    Mitigating Goal Misgeneralization via Regret-based Auto-curricula generation

    Karim is currently pursuing an MSc in AI at the University of Amsterdam and interning at the Krueger AI Safety Lab (KASL), focusing on reinforcement learning (RL), unsupervised environment design (UED), and AI safety. His prior work includes research in theoretical computer science, particularly in algorithms with predictions.

  • Einar Urdshals

    Singular Learning Theory analysis of algorithmic transformers

    Einar recently completed his PhD in theoretical physics at Chalmers University of Technology. His AI safety research has focused on interpretability and agent foundations, stemming from his work at AI Safety Camp.

  • Itamar Pres

    Distribution Mixing for Multi-Behavior Activation Steering in LLMs

    Itamar is an undergraduate studying Mathematics and Computer Science at the University of Michigan, conducting research with the LIT group. His prior work includes leveraging mechanistic interpretability to analyze alignment algorithms, such as DPO, and he is currently an intern at the Krueger AI Safety Lab.

  • Vladimir Ivanov

    Studying scheming tendencies in LLMs that are given hints that they are being evaluated

    Vladimir is a master's student at ENS Paris, researching scheming tendencies in large language models (LLMs). His prior experience includes working with the SatisfIA team on non-optimizing reinforcement learning algorithms.

  • Michal Bravansky

    Decomposing Interpretable Human Preferences and Values from RLHF Preference Data

    Michal is studying Computer Science at University College London, with research interests in AI and human behavior analysis. He co-runs Verifee, a non-profit monitoring disinformation in Eastern Europe, which has secured over $0.6M in funding.

  • Jack Miller

    Understanding the mechanisms for ethical decision-making in LLMs

    Jack is pursuing a BSc in Mathematics and Computer Science at the Australian National University (ANU). His research spans climate ML, quantum chemistry, LLM generation, and the science of deep learning, with a focus on ethical decision-making mechanisms in LLMs.

  • Kaivu Hariharan

    Semantic contamination: did you train on a cheatsheet?

    Kaivu has completed his B.S. in Mathematics and Computer Science at MIT and will soon join the MEng program. His previous research includes work on adversarial examples, mechanistic interpretability, and deep learning science, alongside his role as strategy director at MAIA.

  • Joschka Braun

    Limitations of contrastive activation steering in LLMs

    Joschka is pursuing an MSc in Machine Learning at the University of Tübingen. He is currently a research intern at Krueger AI Safety Lab, focusing on representation engineering in LLMs. Previously, Joschka pursued research on controlled text generation at the Health NLP Lab Tübingen.

Technical Governance

  • Arturs Kanepajs

    Toward Linguistically Inclusive AI Safety: Gaps and Policies

    Arturs holds an MSc from the Stockholm School of Economics and a CPGS from the University of Cambridge. With over a decade of experience in finance, he began his involvement in AI governance in mid-2023, conducting independent research and participating in public discussions.

  • Severin Field

    AI Safety Perceptions Among Experts

    Severin holds a Bachelor’s degree in Physics and has experience as an AI Frameworks Engineer at Intel and an ML Intern at Lawrence Livermore National Laboratory. His most recent work centers on deceptive alignment and interpretability.

  • Rosco Hunter

    Monitoring Human Dependence on AI Through ‘Reliance Drills’

    Rosco is a PhD candidate at the University of Warwick, researching human dependence on AI in collaboration with Samsung. He is transitioning into AI governance research, focusing on systemic AI risks, during his ERA fellowship.

  • Tom Reed

    Enhancing LLM Faithfulness Through Code Execution

    Tom is a researcher in technical governance, with academic backgrounds in Psychology from UCL and History from Cambridge. His current work aims to enhance LLM faithfulness through code execution.

  • Nathalie Maria Kirch

    Do Different LLM Attack Methods Exploit Similar Mechanisms?

    Nathalie is an MSc student in Artificial Intelligence at Utrecht University, and she will soon begin a PhD at King’s College London. With a background in cognitive psychology and philosophy, her previous research at the Institute for Artificial Intelligence in Vienna focused on machine ethics and LLM benchmarking in medical contexts.

  • Misha Gerovitch

    Analysis of State-Proof Security Measures in AI Data Centers

    Misha is pursuing a B.S. in Computer Science and Engineering at MIT and is set to join the MEng program. His research includes mechanistic interpretability automation and LLM-on-LLM deception, and he co-leads MIT AI Alignment (MAIA) and their AI policy programs.

  • Carson Ezell

    Risk Assessment for Agentic AI Deployments

    Carson is an undergraduate studying Philosophy at Harvard College, where he also serves as the Policy Research Lead for the AI Safety Student Team (AISST). His research focuses on transparency, AI auditing, and institutional design for AI regulation.

  • Pauline Kuss

    Governing AI Agents – An Affordance Perspective

    Pauline is a third-year PhD student at Freie Universität Berlin, specializing in agentic AI and sociotechnical approaches to AI governance.

  • Allison Huang

    How well can LLMs defend against persuasive text in decision-making scenarios?

    Allison is an undergraduate at the University of Southern California, where she is pursuing an integrated degree in Computer Science, Design, and Business. Her research explores how LLMs defend against persuasive text in decision-making contexts.

  • Isaac Robinson

    The Rationality of AI Agents

    Isaac is a PhD student at Oxford University, studying computer science with a focus on algorithmic game theory and AI fairness. His current research examines the rationality of AI agents and governance issues in advanced models.

  • Tina Wünn

    Governance of AI-bio tools in the Global South

    Tina holds a BSc in Biology and an MSc in Medical Informatics. She has contributed to biosecurity policy research and is now focused on AI-biosecurity governance, particularly in the Global South.

AI Governance

  • Alejandro Ortega

    Evaluating GPU Self-Destruct Mechanisms

    Alejandro is about to begin a 6-month placement at Apollo, following freelance AI governance research projects on nuclear power regulation and voluntary safety frameworks. He holds an MSci in Physics and Philosophy from Bristol and previously led EA Oxford.

  • Madeline Proctor

    Mapping Political Considerations for a Strict Liability Regime with Expanded Punitive Damages for Advanced AI in the United States

    Madeline is studying Social Studies at Harvard, specializing in AI and the Law. She is writing her thesis on digital trade goods within international law and has experience assessing LLM-powered legal technologies and U.S. constitutional law.

  • Heramb Podar

    How to Harness a Llama: A Guide for Hosting Platforms to Secure the Open-Source Ecosystem

    Heramb is a final-year student at IIT Roorkee with previous experience at the Center for AI and Digital Policy and the Millennium Project. His research focuses on how hosting platforms can prevent the proliferation of open-source AI models.

  • James Lester

    Assessing Risk Compensation Effects in the Deployment of Risky Technology - Evidence and Implications for AI Strategy and Policy

    James recently graduated in Economics from the University of Cambridge, where he helped with outreach and community building for AI governance and effective altruism. His undergraduate thesis modelled how expert advice on existential risk policy might break down under imperfect information and heterogeneous priors over risk levels.

  • Edward Kembery

    Towards Responsible Model Access Governance

    Edward holds an MPhil in AI and Ethics from the Centre for the Future of Intelligence at Cambridge. He has contributed to AI advisory groups, including CDX for the Japanese government, and helped establish the Cambridge AI Safety Hub’s policy wing.

  • Jai Patel

    Developing an Agentic AI Crisis Response Framework (UK)

    Jai completed an MPhil in Ethics of AI, Data, and Algorithms at Cambridge and has experience in AI policy, including responsible scaling research. He currently works with the UK AISI Safeguards Team.

  • Damin Curtis

    Coordination avenues between frontier AI developers and US defense/intelligence community

    Damin holds a master’s degree in International Affairs, specializing in technology policy. His research covers AI regulation, advanced chip access, and U.S. security posture in East Asia.

  • Duncan McClements

    Optimal punitive damages under mandatory insurance

    Duncan is an economics student at the University of Cambridge and a Research Associate at the Adam Smith Institute. He is focusing on optimal punitive damages in AI regulation.

  • Ben Chancey

    Towards a Standard of Strict Liability for Harms Caused by Sufficiently Advanced AI Systems

    Ben is studying Philosophy and Computer Science at McGill University and is pursuing a career in AI governance and policy, with research interests in liability standards for advanced AI systems.