I’m doing research and other work focused on AI safety/security, governance, and risk reduction. My current top projects are (last updated Feb 26, 2025):
Technical researcher at UC Berkeley’s AI Security Initiative, part of the Center for Long-Term Cybersecurity (CLTC)
Serving on the board of directors for AI Governance & Safety Canada
My general areas of interest include AI safety strategy, comparative AI alignment research, prioritization of technical alignment work, analysis of the published alignment plans of major AI labs, interpretability, deconfusion research, and other AI safety-related topics.
Research that I’ve authored or co-authored:
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios
(Scroll down to read other posts and comments I’ve written)
Before getting into AI safety, I was a software engineer for 11 years at Google and various startups. You can find details about my previous work on my LinkedIn.
While I’m not always great at responding, I’m happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!
Actually, the OGI-1 model (and to a lesser extent, the OGI-N model) does do something important to address loss-of-control risks from AGI (or ASI): it reduces competitive race dynamics.
There are plausible scenarios where it is technically possible for a lab to safely develop AGI, but where doing so would require it to slow down development. When a lab is competitively racing against other AGI projects, the incentives to proceed with risky development are (potentially much) stronger. But when it doesn’t have to worry about competitors, it at least has an opportunity to pursue costly safety measures without sacrificing its lead.