
Joe Carlsmith

Karma: 5,350

Senior advisor at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

Controlling the options AIs can pursue

Joe Carlsmith · 29 Sep 2025 17:23 UTC
15 points
0 comments · 35 min read · LW link

Video and transcript of talk on giving AIs safe motivations

Joe Carlsmith · 22 Sep 2025 16:43 UTC
12 points
0 comments · 50 min read · LW link

Giving AIs safe motivations

Joe Carlsmith · 18 Aug 2025 18:00 UTC
36 points
3 comments · 51 min read · LW link

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith · 17 Jul 2025 17:54 UTC
98 points
19 comments · 34 min read · LW link
(joecarlsmith.substack.com)

Video and transcript of talk on AI welfare

Joe Carlsmith · 22 May 2025 16:15 UTC
24 points
1 comment · 28 min read · LW link
(joecarlsmith.substack.com)

The stakes of AI moral status

Joe Carlsmith · 21 May 2025 18:20 UTC
79 points
64 comments · 14 min read · LW link
(joecarlsmith.substack.com)

Video and transcript of talk on automating alignment research

Joe Carlsmith · 30 Apr 2025 17:43 UTC
27 points
0 comments · 24 min read · LW link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe Carlsmith · 30 Apr 2025 17:37 UTC
47 points
29 comments · 48 min read · LW link
(joecarlsmith.com)

AI for AI safety

Joe Carlsmith · 14 Mar 2025 15:00 UTC
78 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Paths and waystations in AI safety

Joe Carlsmith · 11 Mar 2025 18:52 UTC
42 points
1 comment · 11 min read · LW link
(joecarlsmith.substack.com)

When should we worry about AI power-seeking?

Joe Carlsmith · 19 Feb 2025 19:44 UTC
22 points
0 comments · 18 min read · LW link
(joecarlsmith.substack.com)

What is it to solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:42 UTC
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

How do we solve the alignment problem?

Joe Carlsmith · 13 Feb 2025 18:27 UTC
63 points
9 comments · 7 min read · LW link
(joecarlsmith.substack.com)

Fake thinking and real thinking

Joe Carlsmith · 28 Jan 2025 20:05 UTC
112 points
17 comments · 38 min read · LW link

Takes on “Alignment Faking in Large Language Models”

Joe Carlsmith · 18 Dec 2024 18:22 UTC
105 points
7 comments · 62 min read · LW link

Incentive design and capability elicitation

Joe Carlsmith · 12 Nov 2024 20:56 UTC
31 points
0 comments · 12 min read · LW link

Option control

Joe Carlsmith · 4 Nov 2024 17:54 UTC
28 points
0 comments · 54 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
45 points
8 comments · 52 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
54 points
5 comments · 32 min read · LW link

Video and transcript of presentation on Otherness and control in the age of AGI

Joe Carlsmith · 8 Oct 2024 22:30 UTC
35 points
1 comment · 27 min read · LW link