Joe Carlsmith

Karma: 5,158

Senior research analyst at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.

The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”)

Joe Carlsmith · Dec 2, 2023, 3:20 PM
8 points
1 comment · 15 min read · LW link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)

Joe Carlsmith · Dec 1, 2023, 2:51 PM
10 points
1 comment · 7 min read · LW link

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)

Joe Carlsmith · Nov 30, 2023, 4:43 PM
8 points
0 comments · 6 min read · LW link

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)

Joe Carlsmith · Nov 29, 2023, 4:32 PM
29 points
1 comment · 11 min read · LW link

Two sources of beyond-episode goals (Section 2.2.2 of “Scheming AIs”)

Joe Carlsmith · Nov 28, 2023, 1:49 PM
11 points
1 comment · 15 min read · LW link

Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)

Joe Carlsmith · Nov 27, 2023, 6:01 PM
19 points
1 comment · 13 min read · LW link

Situational awareness (Section 2.1 of “Scheming AIs”)

Joe Carlsmith · Nov 26, 2023, 11:00 PM
10 points
5 comments · 8 min read · LW link

On “slack” in training (Section 1.5 of “Scheming AIs”)

Joe Carlsmith · Nov 25, 2023, 5:51 PM
1 point
0 comments · 5 min read · LW link

Why focus on schemers in particular (Sections 1.3 and 1.4 of “Scheming AIs”)

Joe Carlsmith · Nov 24, 2023, 7:18 PM
8 points
0 comments · 22 min read · LW link

A taxonomy of non-schemer models (Section 1.2 of “Scheming AIs”)

Joe Carlsmith · Nov 22, 2023, 3:24 PM
13 points
0 comments · 13 min read · LW link

Varieties of fake alignment (Section 1.1 of “Scheming AIs”)

Joe Carlsmith · Nov 21, 2023, 3:00 PM
15 points
0 comments · 12 min read · LW link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith · Nov 15, 2023, 5:16 PM
81 points
28 comments · 30 min read · LW link · 1 review

Superforecasting the premises in “Is power-seeking AI an existential risk?”

Joe Carlsmith · Oct 18, 2023, 8:23 PM
31 points
3 comments · 5 min read · LW link

In memory of Louise Glück

Joe Carlsmith · Oct 15, 2023, 2:59 AM
41 points
1 comment · 8 min read · LW link

The “no sandbagging on checkable tasks” hypothesis

Joe Carlsmith · Jul 31, 2023, 11:06 PM
61 points
14 comments · 9 min read · LW link

Predictable updating about AI risk

Joe Carlsmith · May 8, 2023, 9:53 PM
294 points
25 comments · 36 min read · LW link · 1 review

[Linkpost] Shorter version of report on existential risk from power-seeking AI

Joe Carlsmith · Mar 22, 2023, 6:09 PM
7 points
0 comments · 1 min read · LW link

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)

Joe Carlsmith · Feb 21, 2023, 5:26 PM
38 points
16 comments · 1 min read · LW link

Seeing more whole

Joe Carlsmith · Feb 17, 2023, 5:12 AM
31 points
1 comment · 26 min read · LW link

Why should ethical anti-realists do ethics?

Joe Carlsmith · Feb 16, 2023, 4:27 PM
38 points
7 comments · 27 min read · LW link