All I know is Goodhart

Stuart_Armstrong
21 Oct 2019 12:12 UTC
11 points
1 comment · 3 min read · LW link

Defining Myopia

abramdemski
19 Oct 2019 21:32 UTC
18 points
5 comments · 8 min read · LW link

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

rohinmshah
19 Oct 2019 0:30 UTC
53 points
12 comments · 15 min read · LW link
(mailchi.mp)

Technical AGI safety research outside AI

ricraz
18 Oct 2019 15:00 UTC
33 points
0 comments · 3 min read · LW link

Random Thoughts on Predict-O-Matic

abramdemski
17 Oct 2019 23:39 UTC
23 points
2 comments · 9 min read · LW link

The Dualist Predict-O-Matic ($100 prize)

John_Maxwell
17 Oct 2019 6:45 UTC
17 points
23 comments · 5 min read · LW link

Full toy model for preference learning

Stuart_Armstrong
16 Oct 2019 11:06 UTC
11 points
0 comments · 12 min read · LW link

Gradient hacking

evhub
16 Oct 2019 0:53 UTC
47 points
8 comments · 3 min read · LW link

The Parable of Predict-O-Matic

abramdemski
15 Oct 2019 0:49 UTC
117 points
9 comments · 14 min read · LW link

Impact measurement and value-neutrality verification

evhub
15 Oct 2019 0:06 UTC
35 points
7 comments · 6 min read · LW link

[AN #68]: The attainable utility theory of impact

rohinmshah
14 Oct 2019 17:00 UTC
19 points
0 comments · 8 min read · LW link
(mailchi.mp)

AI alignment landscape

paulfchristiano
13 Oct 2019 2:10 UTC
39 points
1 comment · 1 min read · LW link
(ai-alignment.com)

Thoughts on “Human-Compatible”

TurnTrout
10 Oct 2019 5:24 UTC
53 points
35 comments · 5 min read · LW link

Misconceptions about continuous takeoff

Matthew Barnett
8 Oct 2019 21:31 UTC
53 points
27 comments · 4 min read · LW link

Characterizing Real-World Agents as a Research Meta-Strategy

johnswentworth
8 Oct 2019 15:32 UTC
24 points
4 comments · 5 min read · LW link

What’s the dream for giving natural language commands to AI?

Charlie Steiner
8 Oct 2019 13:42 UTC
9 points
2 comments · 7 min read · LW link

AI Alignment Writing Day Roundup #2

Ben Pace
7 Oct 2019 23:36 UTC
35 points
2 comments · 3 min read · LW link

Occam’s Razor May Be Sufficient to Infer the Preferences of Irrational Agents: A reply to Armstrong & Mindermann

Daniel Kokotajlo
7 Oct 2019 19:52 UTC
47 points
21 comments · 7 min read · LW link

[AN #67]: Creating environments in which to study inner alignment failures

rohinmshah
7 Oct 2019 17:10 UTC
17 points
0 comments · 8 min read · LW link
(mailchi.mp)

The Gears of Impact

TurnTrout
7 Oct 2019 14:44 UTC
32 points
1 comment · 1 min read · LW link