Our Fellows
Technical AI Safety
-
Karim Abdel Sadek
Mitigating Goal Misgeneralization via Regret-Based Auto-Curricula Generation
Karim is currently pursuing an MSc in AI at the University of Amsterdam and interning at KASL, focusing on reinforcement learning (RL), unsupervised environment design (UED), and AI safety. His prior work includes research in theoretical computer science, particularly in algorithms with predictions.
-
Einar Urdshals
Singular Learning Theory analysis of algorithmic transformers
Einar recently completed his PhD in theoretical physics at Chalmers University of Technology. His AI safety research has focused on interpretability and agent foundations, stemming from his work at AI Safety Camp.
-
Itamar Pres
Distribution Mixing for Multi-Behavior Activation Steering in LLMs
Itamar is an undergraduate studying Mathematics and Computer Science at the University of Michigan, conducting research with the LIT group. His prior work includes leveraging mechanistic interpretability to analyze alignment algorithms, such as DPO, and he is currently an intern at the Krueger AI Safety Lab.
-
Vladimir Ivanov
Studying scheming tendencies in LLMs that are given hints that they are being evaluated
Vladimir is a master's student at ENS Paris, researching scheming tendencies in large language models (LLMs). His prior experience includes working with the SatisfIA team on non-optimizing reinforcement learning algorithms.
-
Michal Bravansky
Decomposing Interpretable Human Preferences and Values from RLHF Preference Data
Michal is studying Computer Science at University College London, with research interests in AI and human behavior analysis. He co-runs Verifee, a non-profit monitoring disinformation in Eastern Europe, which has secured over $0.6M in funding.
-
Jack Miller
Understanding the mechanisms for ethical decision-making in LLMs
Jack is pursuing a BSc in Mathematics and Computer Science at the Australian National University (ANU). His research spans climate ML, quantum chemistry, LLM generation, and the science of deep learning, with a focus on ethical decision-making mechanisms in LLMs.
-
Kaivu Hariharan
Semantic contamination: did you train on a cheatsheet?
Kaivu has completed his B.S. in Mathematics and Computer Science at MIT and will soon join the MEng program. His previous research includes work on adversarial examples, mechanistic interpretability, and deep learning science, alongside his role as strategy director at MAIA.
-
Joschka Braun
Limitations of contrastive activation steering in LLMs
Joschka is pursuing an MSc in Machine Learning at the University of Tübingen. He is currently a research intern at Krueger AI Safety Lab, focusing on representation engineering in LLMs. Previously, Joschka pursued research on controlled text generation at the Health NLP Lab Tübingen.
Technical Governance
-
Arturs Kanepajs
Toward Linguistically Inclusive AI Safety: Gaps and Policies
Arturs holds an MSc from the Stockholm School of Economics and a CPGS from the University of Cambridge. With over a decade of experience in finance, he began his involvement in AI governance in mid-2023, conducting independent research and participating in public discussions.
-
Severin Field
AI Safety Perceptions Among Experts
Severin holds a Bachelor’s degree in Physics and has experience as an AI Frameworks Engineer at Intel and an ML Intern at Lawrence Livermore National Laboratory. His most recent work centers on deceptive alignment and interpretability.
-
Rosco Hunter
Monitoring Human Dependence on AI Through ‘Reliance Drills’
Rosco is a PhD candidate at the University of Warwick, researching human dependence on AI in collaboration with Samsung. During his ERA fellowship, he is transitioning into AI governance research with a focus on systemic AI risks.
-
Tom Reed
Enhancing LLM Faithfulness Through Code Execution
Tom is a researcher in technical governance, with academic backgrounds in Psychology from UCL and History from Cambridge. His current work aims to enhance LLM faithfulness through code execution.
-
Nathalie Maria Kirch
Do Different LLM Attack Methods Exploit Similar Mechanisms?
Nathalie is an MSc student in Artificial Intelligence at Utrecht University, and she will soon begin a PhD at King’s College London. With a background in cognitive psychology and philosophy, her previous research at the Institute for Artificial Intelligence in Vienna focused on machine ethics and LLM benchmarking in medical contexts.
-
Misha Gerovitch
Analysis of State-Proof Security Measures in AI Data Centers
Misha is pursuing a B.S. in Computer Science and Engineering at MIT and is set to join the MEng program. His research includes mechanistic interpretability automation and LLM-on-LLM deception, and he co-leads MIT AI Alignment (MAIA) and their AI policy programs.
-
Carson Ezell
Risk Assessment for Agentic AI Deployments
Carson is an undergraduate studying Philosophy at Harvard College, where he also serves as the Policy Research Lead for the AI Safety Student Team (AISST). His research focuses on transparency, AI auditing, and institutional design for AI regulation.
-
Pauline Kuss
Governing AI Agents – An Affordance Perspective
Pauline is a third-year PhD student at Freie Universität Berlin, specializing in agentic AI and sociotechnical approaches to AI governance.
-
Allison Huang
How well can LLMs defend against persuasive text in decision-making scenarios?
Allison is an undergraduate at the University of Southern California, where she is pursuing an integrated degree in Computer Science, Design, and Business. Her research explores how LLMs defend against persuasive text in decision-making contexts.
-
Isaac Robinson
The Rationality of AI Agents
Isaac is a PhD student at Oxford University, studying computer science with a focus on algorithmic game theory and AI fairness. His current research examines the rationality of AI agents and governance issues in advanced models.
-
Tina Wünn
Governance of AI-bio tools in the Global South
Tina holds a BSc in Biology and an MSc in Medical Informatics. She has contributed to biosecurity policy research and is now focused on AI-biosecurity governance, particularly in the Global South.
AI Governance
-
Alejandro Ortega
Evaluating GPU Self-Destruct Mechanisms
Alejandro is about to begin a 6-month placement at Apollo, following freelance AI governance research projects on nuclear power regulation and voluntary safety frameworks. He holds an MSci in Physics and Philosophy from Bristol and previously led EA Oxford.
-
Madeline Proctor
Mapping Political Considerations for a Strict Liability Regime with Expanded Punitive Damages for Advanced AI in the United States
Madeline is studying Social Studies at Harvard, specializing in AI and the Law. She is writing her thesis on digital trade goods within international law and has experience assessing LLM-powered legal technologies and U.S. constitutional law.
-
Heramb Podar
How to Harness a Llama: A Guide for Hosting Platforms to Secure the Open-Source Ecosystem
Heramb is a final-year student at IIT Roorkee with previous experience at the Center for AI and Digital Policy and the Millennium Project. His research examines how hosting platforms can limit the proliferation of open-source AI models.
-
James Lester
Assessing Risk Compensation Effects in the Deployment of Risky Technology - Evidence and Implications for AI Strategy and Policy
James recently graduated in Economics from the University of Cambridge, where he helped with outreach and community building for AI governance and effective altruism. His undergraduate thesis modelled how expert advice on existential risk policy might break down under imperfect information and heterogeneous priors over risk levels.
-
Edward Kembery
Towards Responsible Model Access Governance
Edward holds an MPhil in AI and Ethics from the Center for the Future of Intelligence at Cambridge. He has contributed to AI advisory groups, including CDX for the Japanese government, and helped establish the Cambridge AI Safety Hub’s policy wing.
-
Jai Patel
Developing an Agentic AI Crisis Response Framework (UK)
Jai completed an MPhil in Ethics of AI, Data, and Algorithms at Cambridge and has experience in AI policy, including responsible scaling research. He currently works with the UK AI Safety Institute's Safeguards Team.
-
Damin Curtis
Coordination avenues between frontier AI developers and the US defense/intelligence community
Damin holds a master’s degree in International Affairs, specializing in technology policy. His research covers AI regulation, advanced chip access, and U.S. security posture in East Asia.
-
Duncan McClements
Optimal punitive damages under mandatory insurance
Duncan is an economics student at the University of Cambridge and a Research Associate at the Adam Smith Institute. He is focusing on optimal punitive damages in AI regulation.
-
Ben Chancey
Towards a Standard of Strict Liability for Harms Caused by Sufficiently Advanced AI Systems
Ben is studying Philosophy and Computer Science at McGill University, with a focus on AI governance and policy. He is pursuing a career in AI governance, with research interests in liability standards for advanced AI systems.