
Research Agenda in reverse: what *would* a solution look like?

Stuart_Armstrong
25 Jun 2019 13:52 UTC
35 points
10 comments · 1 min read · LW link

Machine Learning Projects on IDA

Owain_Evans
24 Jun 2019 18:38 UTC
39 points
2 comments · 2 min read · LW link

[AN #58] Mesa optimization: what it is, and why we should care

rohinmshah
24 Jun 2019 16:10 UTC
45 points
8 comments · 8 min read · LW link
(mailchi.mp)

Modeling AGI Safety Frameworks with Causal Influence Diagrams

xrchz
21 Jun 2019 12:50 UTC
40 points
3 comments · 1 min read · LW link
(arxiv.org)

Research Agenda v0.9: Synthesising a human’s preferences into a utility function

Stuart_Armstrong
17 Jun 2019 17:46 UTC
49 points
10 comments · 32 min read · LW link

Preference conditional on circumstances and past preference satisfaction

Stuart_Armstrong
17 Jun 2019 15:30 UTC
11 points
1 comment · 1 min read · LW link

Let’s talk about “Convergent Rationality”

capybaralet
12 Jun 2019 21:53 UTC
21 points
12 comments · 6 min read · LW link

Problems with Counterfactual Oracles

Michaël Trazzi
11 Jun 2019 18:10 UTC
6 points
11 comments · 3 min read · LW link

AGI will drastically increase economies of scale

Wei_Dai
7 Jun 2019 23:17 UTC
44 points
19 comments · 2 min read · LW link

Risks from Learned Optimization: Conclusion and Related Work

evhub
7 Jun 2019 19:53 UTC
52 points
0 comments · 6 min read · LW link

For the past, in some ways only, we are moral degenerates

Stuart_Armstrong
7 Jun 2019 15:57 UTC
29 points
14 comments · 2 min read · LW link

[AN #57] Why we should focus on robustness in AI safety, and the analogous problems in programming

rohinmshah
5 Jun 2019 23:20 UTC
26 points
14 comments · 7 min read · LW link
(mailchi.mp)

Deceptive Alignment

evhub
5 Jun 2019 20:16 UTC
55 points
4 comments · 17 min read · LW link

The Inner Alignment Problem

evhub
4 Jun 2019 1:20 UTC
60 points
13 comments · 13 min read · LW link

To first order, moral realism and moral anti-realism are the same thing

Stuart_Armstrong
3 Jun 2019 15:04 UTC
17 points
5 comments · 3 min read · LW link

Conditional meta-preferences

Stuart_Armstrong
3 Jun 2019 14:09 UTC
6 points
0 comments · 1 min read · LW link

Does Bayes Beat Goodhart?

abramdemski
3 Jun 2019 2:31 UTC
36 points
26 comments · 7 min read · LW link

Selection vs Control

abramdemski
2 Jun 2019 7:01 UTC
91 points
14 comments · 11 min read · LW link

Conditions for Mesa-Optimization

evhub
1 Jun 2019 20:52 UTC
48 points
27 comments · 12 min read · LW link

Risks from Learned Optimization: Introduction

evhub
31 May 2019 23:44 UTC
101 points
27 comments · 12 min read · LW link