Sticky goals: a concrete experiment for understanding deceptive alignment

evhub 2 Sep 2022 21:57 UTC
39 points
13 comments 3 min read LW link

Agency engineering: is AI-alignment “to human intent” enough?

catubc 2 Sep 2022 18:14 UTC
9 points
10 comments 6 min read LW link

Hanover, Germany - ACX Meetups Everywhere 2022

eikowagenknecht 2 Sep 2022 17:31 UTC
2 points
0 comments 1 min read LW link

Laziness in AI

Richard Henage 2 Sep 2022 17:04 UTC
13 points
5 comments 1 min read LW link

Exporting Hangouts History

jefftk 2 Sep 2022 15:00 UTC
20 points
0 comments 2 min read LW link
(www.jefftk.com)

Simulators

janus 2 Sep 2022 12:45 UTC
594 points
161 comments 41 min read LW link 8 reviews
(generative.ink)

Levelling Up in AI Safety Research Engineering

Gabriel Mukobi 2 Sep 2022 4:59 UTC
57 points
9 comments 17 min read LW link

Stop Discouraging Microwave Formula Preparation

jefftk 2 Sep 2022 2:10 UTC
68 points
12 comments 2 min read LW link
(www.jefftk.com)

A Richly Interactive AGI Alignment Chart

lisperati 2 Sep 2022 0:44 UTC
14 points
6 comments 1 min read LW link

Appendix: How to run a successful Hamming circle

CFAR!Duncan 2 Sep 2022 0:22 UTC
35 points
6 comments 7 min read LW link

Replacement for PONR concept

Daniel Kokotajlo 2 Sep 2022 0:09 UTC
58 points
6 comments 2 min read LW link

AI coordination needs clear wins

evhub 1 Sep 2022 23:41 UTC
146 points
16 comments 2 min read LW link 1 review

Short story speculating on possible ramifications of AI on the art world

Yitz 1 Sep 2022 21:15 UTC
30 points
8 comments 3 min read LW link
(archiveofourown.org)

Why was progress so slow in the past?

jasoncrawford 1 Sep 2022 20:26 UTC
54 points
31 comments 6 min read LW link
(rootsofprogress.org)

AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022

Sam Bowman 1 Sep 2022 19:15 UTC
76 points
2 comments 7 min read LW link

Gradient Hacker Design Principles From Biology

johnswentworth 1 Sep 2022 19:03 UTC
60 points
13 comments 3 min read LW link

Book review: Put Your Ass Where Your Heart Wants to Be

Ruhul 1 Sep 2022 18:21 UTC
1 point
2 comments 10 min read LW link

A Survey of Foundational Methods in Inverse Reinforcement Learning

adamk 1 Sep 2022 18:21 UTC
19 points
0 comments 12 min read LW link

I Tripped and Became GPT! (And How This Updated My Timelines)

Frankophone 1 Sep 2022 17:56 UTC
31 points
0 comments 4 min read LW link

[Question] Fixed point theory (locally (α,β,ψ) dominated contractive condition)

muzammil 1 Sep 2022 17:56 UTC
0 points
3 comments 1 min read LW link

Alignment is hard. Communicating that, might be harder

Eleni Angelou 1 Sep 2022 16:57 UTC
7 points
8 comments 3 min read LW link

Covid 9/1/22: Meet the New Booster

Zvi 1 Sep 2022 14:00 UTC
41 points
6 comments 14 min read LW link
(thezvi.wordpress.com)

A Starter-kit for Rationality Space

Jesse Hoogland 1 Sep 2022 13:04 UTC
41 points
0 comments 1 min read LW link
(github.com)

Pondering the paucity of volcanic profanity post Pompeii perusal

CraigMichael 1 Sep 2022 9:29 UTC
21 points
2 comments 15 min read LW link

Infra-Exercises, Part 1

1 Sep 2022 5:06 UTC
56 points
10 comments 1 min read LW link

Strategy For Conditioning Generative Models

1 Sep 2022 4:34 UTC
31 points
4 comments 18 min read LW link

Safety Committee Resources

jefftk 1 Sep 2022 2:30 UTC
22 points
2 comments 1 min read LW link
(www.jefftk.com)

Progress links and tweets, 2022-08-31

jasoncrawford 31 Aug 2022 21:54 UTC
13 points
4 comments 1 min read LW link
(rootsofprogress.org)

Enantiodromia

ChristianKl 31 Aug 2022 21:13 UTC
38 points
7 comments 3 min read LW link

[Question] Supposing Europe is headed for a serious energy crisis this winter, what can/should one do as an individual to prepare?

Erich_Grunewald 31 Aug 2022 19:28 UTC
18 points
13 comments 1 min read LW link

New 80,000 Hours problem profile on existential risks from AI

Benjamin Hilton 31 Aug 2022 17:36 UTC
28 points
6 comments 7 min read LW link
(80000hours.org)

Grand Theft Education

Zvi 31 Aug 2022 11:50 UTC
66 points
18 comments 20 min read LW link
(thezvi.wordpress.com)

How much impact can any one man have?

GregorDeVillain 31 Aug 2022 10:26 UTC
9 points
3 comments 4 min read LW link

[Question] How might we make better use of AI capabilities research for alignment purposes?

ghostwheel 31 Aug 2022 4:19 UTC
11 points
4 comments 1 min read LW link

[Question] AI Box Experiment: Are people still interested?

Double 31 Aug 2022 3:04 UTC
30 points
13 comments 1 min read LW link

OC ACX/LW in Newport Beach

Michael Michalchik 31 Aug 2022 2:56 UTC
1 point
1 comment 1 min read LW link

Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible

Sam Bowman 31 Aug 2022 1:39 UTC
92 points
6 comments 2 min read LW link

And the word was “God”

pchvykov 30 Aug 2022 21:13 UTC
−22 points
4 comments 3 min read LW link

Worlds Where Iterative Design Fails

johnswentworth 30 Aug 2022 20:48 UTC
190 points
30 comments 10 min read LW link 1 review

Inner Alignment via Superpowers

30 Aug 2022 20:01 UTC
37 points
13 comments 4 min read LW link

ML Model Attribution Challenge [Linkpost]

aogara 30 Aug 2022 19:34 UTC
11 points
0 comments 1 min read LW link
(mlmac.io)

How likely is deceptive alignment?

evhub 30 Aug 2022 19:34 UTC
103 points
28 comments 60 min read LW link

Built-In Bundling For Faster Loading

jefftk 30 Aug 2022 19:20 UTC
15 points
0 comments 2 min read LW link
(www.jefftk.com)

[Question] A bayesian updating on expert opinions

amarai 30 Aug 2022 11:56 UTC
1 point
1 comment 1 min read LW link

Any Utilitarianism Makes Sense As Policy

George3d6 30 Aug 2022 9:55 UTC
6 points
6 comments 7 min read LW link
(www.epistem.ink)

A gentle primer on caring, including in strange senses, with applications

Kaarel 30 Aug 2022 8:05 UTC
9 points
4 comments 18 min read LW link

Modified Guess Culture

konstell 30 Aug 2022 2:30 UTC
5 points
5 comments 1 min read LW link
(konstell.com)

[Question] What is the best critique of AI existential risk arguments?

joshc 30 Aug 2022 2:18 UTC
6 points
11 comments 1 min read LW link

How to plan for a radically uncertain future?

Kerry 30 Aug 2022 2:14 UTC
57 points
35 comments 1 min read LW link

EA & LW Forums Weekly Summary (21 Aug - 27 Aug 22')

Zoe Williams 30 Aug 2022 1:42 UTC
57 points
4 comments 12 min read LW link