I’m an AI alignment researcher currently focused on myopia and language models. I’m also interested in interpretability and other AI safety-related topics. My research is independent and currently supported by a grant from the Future Fund regranting program*.
Research that I’ve authored or co-authored:
Steering Behaviour: Testing for (Non-)Myopia in Language Models
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios
(Scroll down to read other posts and comments I’ve written)
Other recent work:
On the interim Board of Directors for AI Governance & Safety Canada
Invited participant in the first CLTC UC Berkeley Virtual Workshop on “Risk Management-Standards Profile for Increasingly Multi-Purpose or General-Purpose AI” (Jan 2023)
Run a regular coworking meetup in Vancouver, BC for people interested in AI safety and effective altruism
Facilitator for the AI Safety Fellowship (2022) at Columbia University Effective Altruism
Gave a talk on myopia and deceptive alignment at an AI safety event hosted by the University of Victoria (Jan 29, 2023)
Reviewed early pre-publication drafts of work by other researchers:
Circumventing interpretability: How to defeat mind-readers by Lee Sharkey
Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks by Tony Barrett, Dan Hendrycks, Jessica Newman and Brandie Nonnecke
AI Safety Seems Hard to Measure by Holden Karnofsky
Racing through a minefield: the AI deployment problem by Holden Karnofsky
Alignment with argument-networks and assessment-predictions by Tor Økland Barstad
DeepMind’s generalist AI, Gato: A non-technical explainer by Frances Lorenz, Nora Belrose and Jon Menaster
Potential Alignment mental tool: Keeping track of the types by Donald Hobson
Submissions for the AI Safety Arguments Competition
Ideal Governance by Holden Karnofsky
Before getting into AI alignment, I was a software engineer for 11 years at Google and various startups. You can find details about my previous work on my LinkedIn.
I’m always happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!
--
*In light of the FTX crisis, I’ve set aside the grant funds I received from the Future Fund and am evaluating whether and how this money can be returned to customers of FTX who lost their savings in the debacle. In the meantime, I continue to work on AI alignment research using my personal savings. If you’re interested in funding my research or hiring me for related work, please reach out.
Instead of “accident”, we could say “gross negligence” or “recklessness” when describing catastrophic risk from AI misalignment.