
[Question] What are the most plausible “AI Safety warning shot” scenarios?

Daniel Kokotajlo
26 Mar 2020 20:59 UTC
32 points
14 comments, 1 min read, LW link

How important are MDPs for AGI (Safety)?

michaelcohen
26 Mar 2020 20:32 UTC
13 points
7 comments, 2 min read, LW link

[AN #92]: Learning good representations with contrastive predictive coding

rohinmshah
25 Mar 2020 17:20 UTC
19 points
1 comment, 10 min read, LW link
(mailchi.mp)

Deconfusing Human Values Research Agenda v1

G Gordon Worley III
23 Mar 2020 16:25 UTC
18 points
11 comments, 4 min read, LW link

[Question] [Meta] Do you want AIS Webinars?

Linda Linsefors
21 Mar 2020 16:01 UTC
17 points
7 comments, 1 min read, LW link

Mediation From a Distance

johnswentworth
20 Mar 2020 22:02 UTC
11 points
0 comments, 2 min read, LW link

Thinking About Filtered Evidence Is (Very!) Hard

abramdemski
19 Mar 2020 23:20 UTC
73 points
13 comments, 14 min read, LW link

Alignment as Translation

johnswentworth
19 Mar 2020 21:40 UTC
29 points
23 comments, 4 min read, LW link

Abstraction = Information at a Distance

johnswentworth
19 Mar 2020 0:19 UTC
21 points
0 comments, 3 min read, LW link

[AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement

rohinmshah
18 Mar 2020 17:10 UTC
16 points
10 comments, 13 min read, LW link
(mailchi.mp)

What is Interpretability?

17 Mar 2020 20:23 UTC
27 points
0 comments, 11 min read, LW link

AI Alignment Podcast: On Lethal Autonomous Weapons with Paul Scharre

Palus Astra
16 Mar 2020 23:00 UTC
11 points
0 comments, 48 min read, LW link

[Question] Positive Feedback → Optimization?

johnswentworth
16 Mar 2020 18:48 UTC
19 points
6 comments, 1 min read, LW link

[Question] What are some exercises for building/generating intuitions about key disagreements in AI alignment?

riceissa
16 Mar 2020 7:41 UTC
8 points
2 comments, 1 min read, LW link

Trace README

johnswentworth
11 Mar 2020 21:08 UTC
31 points
1 comment, 8 min read, LW link

[AN #90]: How search landscapes can contain self-reinforcing feedback loops

rohinmshah
11 Mar 2020 17:30 UTC
12 points
6 comments, 8 min read, LW link
(mailchi.mp)

Zoom In: An Introduction to Circuits

evhub
10 Mar 2020 19:36 UTC
64 points
10 comments, 2 min read, LW link
(distill.pub)

Vulnerabilities in CDT and TI-unaware agents

10 Mar 2020 14:14 UTC
5 points
1 comment, 4 min read, LW link

[Question] Name of Problem?

johnswentworth
9 Mar 2020 20:15 UTC
9 points
29 comments, 1 min read, LW link

Subjective implication decision theory in critical agentialism

jessicata
5 Mar 2020 23:30 UTC
12 points
16 comments, 5 min read, LW link
(unstableontology.com)