
Language models seem to be much better than humans at next-token prediction

Buck, Fabien and LawrenceC
11 Aug 2022 17:45 UTC
128 points
52 comments, 13 min read, LW link

«Boundaries», Part 1: a key missing concept from utility theory

Andrew_Critch
26 Jul 2022 23:03 UTC
113 points
16 comments, 7 min read, LW link

Changing the world through slack & hobbies

Steven Byrnes
21 Jul 2022 18:11 UTC
228 points
14 comments, 10 min read, LW link

What should you change in response to an “emergency”? And AI risk

AnnaSalamon
18 Jul 2022 1:11 UTC
290 points
59 comments, 6 min read, LW link

Humans provide an untapped wealth of evidence about alignment

TurnTrout and Quintin Pope
14 Jul 2022 2:31 UTC
168 points
92 comments, 10 min read, LW link

On how various plans miss the hard bits of the alignment challenge

So8res
12 Jul 2022 2:49 UTC
248 points
81 comments, 29 min read, LW link

ITT-passing and civility are good; “charity” is bad; steelmanning is niche

Rob Bensinger
5 Jul 2022 0:15 UTC
137 points
32 comments, 6 min read, LW link

Looking back on my alignment PhD

TurnTrout
1 Jul 2022 3:19 UTC
283 points
58 comments, 11 min read, LW link

It’s Probably Not Lithium

Natália Coelho Mendonça
28 Jun 2022 21:24 UTC
412 points
179 comments, 28 min read, LW link

What Are You Tracking In Your Head?

johnswentworth
28 Jun 2022 19:30 UTC
215 points
73 comments, 4 min read, LW link

Nonprofit Boards are Weird

HoldenKarnofsky
23 Jun 2022 14:40 UTC
149 points
25 comments, 20 min read, LW link
(www.cold-takes.com)

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

elspood
21 Jun 2022 23:55 UTC
312 points
40 comments, 7 min read, LW link

Where I agree and disagree with Eliezer

paulfchristiano
19 Jun 2022 19:15 UTC
738 points
202 comments, 20 min read, LW link

Humans are very reliable agents

alyssavance
16 Jun 2022 22:02 UTC
249 points
35 comments, 3 min read, LW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky
5 Jun 2022 22:05 UTC
681 points
638 comments, 30 min read, LW link

Public beliefs vs. Private beliefs

Eli Tyre
1 Jun 2022 21:33 UTC
132 points
25 comments, 5 min read, LW link

Six Dimensions of Operational Adequacy in AGI Projects

Eliezer Yudkowsky
30 May 2022 17:00 UTC
263 points
65 comments, 13 min read, LW link

Benign Boundary Violations

Duncan_Sabien
26 May 2022 6:48 UTC
197 points
85 comments, 18 min read, LW link

Visible Homelessness in SF: A Quick Breakdown of Causes

alyssavance
25 May 2022 1:40 UTC
194 points
40 comments, 2 min read, LW link

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA

Steven Byrnes
17 May 2022 15:11 UTC
81 points
11 comments, 14 min read, LW link