
Evan R. Murphy

Karma: 1,195


I’m doing research and other work focused on AI safety/security, governance and risk reduction. Currently my top projects are (last updated Feb 26, 2025):

General areas of interest for me are AI safety strategy, comparative AI alignment research, prioritizing technical alignment work, analyzing the published alignment plans of major AI labs, interpretability, deconfusion research and other AI safety-related topics.

Research that I’ve authored or co-authored:

Before getting into AI safety, I was a software engineer for 11 years at Google and various startups. You can find details about my previous work on my LinkedIn.

While I’m not always great at responding, I’m happy to connect with other researchers or people interested in AI alignment and effective altruism. Feel free to send me a private message!

AI Risk: Can We Thread the Needle? [Recorded Talk from EA Summit Vancouver ’25]

Evan R. Murphy · 2 Oct 2025 19:08 UTC
6 points
0 comments · 2 min read · LW link

[Question] Does the Universal Geometry of Embeddings paper have big implications for interpretability?

Evan R. Murphy · 26 May 2025 18:20 UTC
43 points
6 comments · 1 min read · LW link

Evan R. Murphy’s Shortform

Evan R. Murphy · 28 Feb 2025 0:56 UTC
6 points
6 comments · 1 min read · LW link

Steven Pinker on ChatGPT and AGI (Feb 2023)

Evan R. Murphy · 5 Mar 2023 21:34 UTC
11 points
8 comments · 1 min read · LW link
(news.harvard.edu)

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments · 10 min read · LW link

Paper: Large Language Models Can Self-improve [Linkpost]

Evan R. Murphy · 2 Oct 2022 1:29 UTC
53 points
15 comments · 1 min read · LW link
(openreview.net)

Google AI integrates PaLM with robotics: SayCan update [Linkpost]

Evan R. Murphy · 24 Aug 2022 20:54 UTC
25 points
0 comments · 1 min read · LW link
(sites.research.google)

Surprised by ELK report’s counterexample to Debate, IDA

Evan R. Murphy · 4 Aug 2022 2:12 UTC
22 points
0 comments · 5 min read · LW link

New US Senate Bill on X-Risk Mitigation [Linkpost]

Evan R. Murphy · 4 Jul 2022 1:25 UTC
35 points
12 comments · 1 min read · LW link
(www.hsgac.senate.gov)

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · 12 May 2022 20:01 UTC
58 points
0 comments · 59 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · 12 May 2022 19:59 UTC
16 points
0 comments · 8 min read · LW link

[Question] What is a training “step” vs. “episode” in machine learning?

Evan R. Murphy · 28 Apr 2022 21:53 UTC
10 points
4 comments · 1 min read · LW link

Action: Help expand funding for AI Safety by coordinating on NSF response

Evan R. Murphy · 19 Jan 2022 22:47 UTC
23 points
8 comments · 3 min read · LW link

Promising posts on AF that have fallen through the cracks

Evan R. Murphy · 4 Jan 2022 15:39 UTC
34 points
6 comments · 2 min read · LW link