
Evan R. Murphy

Karma: 1,195


I’m doing research and other work focused on AI safety/security, governance and risk reduction. Currently my top projects are (last updated Feb 26, 2025):

General areas of interest for me are AI safety strategy, comparative AI alignment research, prioritizing technical alignment work, analyzing the published alignment plans of major AI labs, interpretability, deconfusion research and other AI safety-related topics.

Research that I’ve authored or co-authored:

Before getting into AI safety, I was a software engineer for 11 years at Google and various startups. You can find details about my previous work on my LinkedIn.

While I’m not always great at responding, I’m happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!

AI Risk: Can We Thread the Needle? [Recorded Talk from EA Summit Vancouver ’25]

Evan R. Murphy · 2 Oct 2025 19:08 UTC
6 points
0 comments · 2 min read · LW link

[Question] Does the Universal Geometry of Embeddings paper have big implications for interpretability?

Evan R. Murphy · 26 May 2025 18:20 UTC
43 points
6 comments · 1 min read · LW link

Evan R. Murphy’s Shortform

Evan R. Murphy · 28 Feb 2025 0:56 UTC
6 points
6 comments · 1 min read · LW link

Steven Pinker on ChatGPT and AGI (Feb 2023)

Evan R. Murphy · 5 Mar 2023 21:34 UTC
11 points
8 comments · 1 min read · LW link
(news.harvard.edu)

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments · 10 min read · LW link

Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy · 2 Oct 2022 1:29 UTC
53 points
15 comments · 1 min read · LW link
(openreview.net)

Google AI integrates PaLM with robotics: SayCan update [Linkpost]

Evan R. Murphy · 24 Aug 2022 20:54 UTC
25 points
0 comments · 1 min read · LW link
(sites.research.google)

Surprised by ELK report’s counterexample to Debate, IDA

Evan R. Murphy · 4 Aug 2022 2:12 UTC
22 points
0 comments · 5 min read · LW link

New US Senate Bill on X-Risk Mitigation [Linkpost]

Evan R. Murphy · 4 Jul 2022 1:25 UTC
35 points
12 comments · 1 min read · LW link
(www.hsgac.senate.gov)

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · 12 May 2022 20:01 UTC
58 points
0 comments · 59 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · 12 May 2022 19:59 UTC
16 points
0 comments · 8 min read · LW link

[Question] What is a training “step” vs. “episode” in machine learning?

Evan R. Murphy · 28 Apr 2022 21:53 UTC
10 points
4 comments · 1 min read · LW link

Action: Help expand funding for AI Safety by coordinating on NSF response

Evan R. Murphy · 19 Jan 2022 22:47 UTC
23 points
8 comments · 3 min read · LW link

Promising posts on AF that have fallen through the cracks

Evan R. Murphy · 4 Jan 2022 15:39 UTC
34 points
6 comments · 2 min read · LW link