Shallow review of live agendas in alignment & safety

27 Nov 2023 11:10 UTC
307 points
69 comments · 29 min read · LW link

Social Dark Matter

[DEACTIVATED] Duncan Sabien · 16 Nov 2023 20:00 UTC
282 points
112 comments · 34 min read · LW link

OpenAI: The Battle of the Board

Zvi · 22 Nov 2023 17:30 UTC
277 points
82 comments · 11 min read · LW link
(thezvi.wordpress.com)

OpenAI: Facts from a Weekend

Zvi · 20 Nov 2023 15:30 UTC
264 points
158 comments · 9 min read · LW link
(thezvi.wordpress.com)

The 6D effect: When companies take risks, one email can be very powerful.

scasper · 4 Nov 2023 20:08 UTC
261 points
40 comments · 3 min read · LW link

AI Timelines

10 Nov 2023 5:28 UTC
255 points
74 comments · 51 min read · LW link

The 101 Space You Will Always Have With You

Screwtape · 29 Nov 2023 4:56 UTC
245 points
20 comments · 6 min read · LW link

What are the results of more parental supervision and less outdoor play?

juliawise · 25 Nov 2023 12:52 UTC
215 points
30 comments · 5 min read · LW link

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

So8res · 24 Nov 2023 17:37 UTC
202 points
82 comments · 5 min read · LW link

Sam Altman fired from OpenAI

LawrenceC · 17 Nov 2023 20:42 UTC
192 points
75 comments · 1 min read · LW link
(openai.com)

Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk

1a3orn · 2 Nov 2023 18:20 UTC
191 points
79 comments · 23 min read · LW link

The other side of the tidal wave

KatjaGrace · 3 Nov 2023 5:40 UTC
185 points
79 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Thinking By The Clock

Screwtape · 8 Nov 2023 7:40 UTC
185 points
27 comments · 8 min read · LW link

Vote on Interesting Disagreements

Ben Pace · 7 Nov 2023 21:35 UTC
159 points
129 comments · 1 min read · LW link

My thoughts on the social response to AI risk

Matthew Barnett · 1 Nov 2023 21:17 UTC
157 points
36 comments · 10 min read · LW link

You can just spontaneously call people you haven’t met in years

lc · 13 Nov 2023 5:21 UTC
154 points
19 comments · 1 min read · LW link

Does davidad’s uploading moonshot work?

3 Nov 2023 2:21 UTC
145 points
32 comments · 25 min read · LW link

Loudly Give Up, Don’t Quietly Fade

Screwtape · 13 Nov 2023 23:30 UTC
138 points
11 comments · 6 min read · LW link

Moral Reality Check (a short story)

jessicata · 26 Nov 2023 5:03 UTC
137 points
44 comments · 21 min read · LW link
(unstableontology.com)

EA orgs’ legal structure inhibits risk taking and information sharing on the margin

Elizabeth · 5 Nov 2023 19:13 UTC
135 points
17 comments · 4 min read · LW link

Integrity in AI Governance and Advocacy

3 Nov 2023 19:52 UTC
134 points
57 comments · 23 min read · LW link

How to (hopefully ethically) make money off of AGI

6 Nov 2023 23:35 UTC
127 points
75 comments · 32 min read · LW link

Apocalypse insurance, and the hardline libertarian take on AI risk

So8res · 28 Nov 2023 2:09 UTC
122 points
36 comments · 7 min read · LW link

8 examples informing my pessimism on uploading without reverse engineering

Steven Byrnes · 3 Nov 2023 20:03 UTC
111 points
12 comments · 12 min read · LW link

Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn · 15 Nov 2023 15:40 UTC
109 points
4 comments · 18 min read · LW link

How much to update on recent AI governance moves?

16 Nov 2023 23:46 UTC
109 points
4 comments · 29 min read · LW link

New LessWrong feature: Dialogue Matching

jacobjacob · 16 Nov 2023 21:27 UTC
106 points
22 comments · 3 min read · LW link

Stuxnet, not Skynet: Humanity’s disempowerment by AI

Roko · 4 Nov 2023 22:23 UTC
106 points
23 comments · 6 min read · LW link

Picking Mentors For Research Programmes

Raymond D · 10 Nov 2023 13:01 UTC
105 points
8 comments · 4 min read · LW link

Deception Chess: Game #1

3 Nov 2023 21:13 UTC
104 points
19 comments · 8 min read · LW link

My techno-optimism [By Vitalik Buterin]

habryka · 27 Nov 2023 23:53 UTC
102 points
16 comments · 2 min read · LW link
(www.lesswrong.com)

One Day Sooner

Screwtape · 2 Nov 2023 19:00 UTC
98 points
5 comments · 8 min read · LW link

On the Executive Order

Zvi · 1 Nov 2023 14:20 UTC
94 points
3 comments · 30 min read · LW link
(thezvi.wordpress.com)

Learning-theoretic agenda reading list

Vanessa Kosoy · 9 Nov 2023 17:25 UTC
91 points
0 comments · 2 min read · LW link

Kids or No kids

Kids or no kids · 14 Nov 2023 18:37 UTC
91 points
10 comments · 13 min read · LW link

The Soul Key

Richard_Ngo · 4 Nov 2023 17:51 UTC
91 points
9 comments · 8 min read · LW link
(www.narrativeark.xyz)

Public Call for Interest in Mathematical Alignment

Davidmanheim · 22 Nov 2023 13:22 UTC
89 points
9 comments · 1 min read · LW link

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM · 15 Nov 2023 16:36 UTC
89 points
8 comments · 2 min read · LW link
(arxiv.org)

Growth and Form in a Toy Model of Superposition

8 Nov 2023 11:08 UTC
87 points
5 comments · 14 min read · LW link

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy

Zac Hatfield-Dodds · 1 Nov 2023 18:10 UTC
85 points
1 comment · 4 min read · LW link
(www.anthropic.com)

Coup probes: Catching catastrophes with probes trained off-policy

Fabien Roger · 17 Nov 2023 17:58 UTC
84 points
7 comments · 14 min read · LW link

Saying the quiet part out loud: trading off x-risk for personal immortality

disturbance · 2 Nov 2023 17:43 UTC
82 points
89 comments · 5 min read · LW link

Bostrom Goes Unheard

Zvi · 13 Nov 2023 14:11 UTC
81 points
9 comments · 18 min read · LW link

Agent Boundaries Aren’t Markov Blankets. [Unless they’re non-causal; see comments.]

abramdemski · 20 Nov 2023 18:23 UTC
81 points
8 comments · 2 min read · LW link

Untrusted smart models and trusted dumb models

Buck · 4 Nov 2023 3:06 UTC
80 points
12 comments · 6 min read · LW link

Self-Referential Probabilistic Logic Admits the Payor’s Lemma

Yudhister Kumar · 28 Nov 2023 10:27 UTC
80 points
13 comments · 4 min read · LW link

Announcing Athena - Women in AI Alignment Research

Claire Short · 7 Nov 2023 21:46 UTC
80 points
2 comments · 3 min read · LW link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith · 15 Nov 2023 17:16 UTC
79 points
26 comments · 30 min read · LW link

Thomas Kwa’s research journal

23 Nov 2023 5:11 UTC
79 points
1 comment · 6 min read · LW link

Spaciousness In Partner Dance: A Naturalism Demo

LoganStrohl · 19 Nov 2023 7:00 UTC
78 points
5 comments · 19 min read · LW link