
Joe Carlsmith

Karma: 5,350

Senior advisor at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Controlling the options AIs can pursue

Joe Carlsmith · 29 Sep 2025 17:23 UTC
15 points
0 comments · 35 min read · LW link

Video and transcript of talk on giving AIs safe motivations

Joe Carlsmith · 22 Sep 2025 16:43 UTC
12 points
0 comments · 50 min read · LW link

Giving AIs safe motivations

Joe Carlsmith · 18 Aug 2025 18:00 UTC
36 points
3 comments · 51 min read · LW link

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith · 17 Jul 2025 17:54 UTC
98 points
19 comments · 34 min read · LW link
(joecarlsmith.substack.com)

Video and transcript of talk on AI welfare

Joe Carlsmith · 22 May 2025 16:15 UTC
24 points
1 comment · 28 min read · LW link
(joecarlsmith.substack.com)

The stakes of AI moral status

Joe Carlsmith · 21 May 2025 18:20 UTC
79 points
64 comments · 14 min read · LW link
(joecarlsmith.substack.com)

Video and transcript of talk on automating alignment research

Joe Carlsmith · 30 Apr 2025 17:43 UTC
27 points
0 comments · 24 min read · LW link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe Carlsmith · 30 Apr 2025 17:37 UTC
47 points
29 comments · 48 min read · LW link
(joecarlsmith.com)

AI for AI safety

Joe Carlsmith · 14 Mar 2025 15:00 UTC
78 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Paths and waystations in AI safety

Joe Carlsmith · 11 Mar 2025 18:52 UTC
42 points
1 comment · 11 min read · LW link
(joecarlsmith.substack.com)

When should we worry about AI power-seeking?

Joe Carlsmith · 19 Feb 2025 19:44 UTC
22 points
0 comments · 18 min read · LW link
(joecarlsmith.substack.com)

What is it to solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:42 UTC
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

How do we solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:27 UTC
63 points
9 comments · 7 min read · LW link
(joecarlsmith.substack.com)

Fake thinking and real thinking

Joe Carlsmith · 28 Jan 2025 20:05 UTC
112 points
17 comments · 38 min read · LW link

Takes on “Alignment Faking in Large Language Models”

Joe Carlsmith · 18 Dec 2024 18:22 UTC
105 points
7 comments · 62 min read · LW link

Incentive design and capability elicitation

Joe Carlsmith · 12 Nov 2024 20:56 UTC
31 points
0 comments · 12 min read · LW link

Option control

Joe Carlsmith · 4 Nov 2024 17:54 UTC
28 points
0 comments · 54 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
45 points
8 comments · 52 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
54 points
5 comments · 32 min read · LW link

Video and transcript of presentation on Otherness and control in the age of AGI

Joe Carlsmith · 8 Oct 2024 22:30 UTC
35 points
1 comment · 27 min read · LW link