
Corrigibility as Constrained Optimisation

Henrik Åslund
11 Apr 2019 20:09 UTC
13 points
3 comments · 5 min read · LW link

[Question] Best reasons for pessimism about impact of impact measures?

TurnTrout
10 Apr 2019 17:22 UTC
75 points
37 comments · 3 min read · LW link

Alignment Newsletter One Year Retrospective

rohinmshah
10 Apr 2019 6:58 UTC
92 points
30 comments · 21 min read · LW link

Value Learning is only Asymptotically Safe

michaelcohen
8 Apr 2019 9:45 UTC
7 points
19 comments · 1 min read · LW link

Reinforcement learning with imperceptible rewards

Vanessa Kosoy
7 Apr 2019 10:27 UTC
18 points
1 comment · 29 min read · LW link

Alignment Newsletter #52

rohinmshah
6 Apr 2019 1:20 UTC
19 points
1 comment · 8 min read · LW link

Defeating Goodhart and the “closest unblocked strategy” problem

Stuart_Armstrong
3 Apr 2019 14:46 UTC
22 points
12 comments · 6 min read · LW link

Alignment Newsletter #51

rohinmshah
3 Apr 2019 4:10 UTC
28 points
2 comments · 15 min read · LW link

Learning “known” information when the information is not actually known

Stuart_Armstrong
1 Apr 2019 17:56 UTC
13 points
0 comments · 1 min read · LW link

Relative exchange rate between preferences

Stuart_Armstrong
29 Mar 2019 11:46 UTC
12 points
1 comment · 1 min read · LW link

Being wrong in ethics

Stuart_Armstrong
29 Mar 2019 11:28 UTC
22 points
0 comments · 3 min read · LW link

Models of preferences in distant situations

Stuart_Armstrong
29 Mar 2019 10:42 UTC
11 points
0 comments · 2 min read · LW link

Alignment Newsletter #50

rohinmshah
28 Mar 2019 18:10 UTC
16 points
2 comments · 10 min read · LW link

The low cost of human preference incoherence

Stuart_Armstrong
27 Mar 2019 11:58 UTC
19 points
5 comments · 2 min read · LW link

Unsolved research problems vs. real-world threat models

catherio
26 Mar 2019 22:10 UTC
19 points
2 comments · 1 min read · LW link
(medium.com)

A Concrete Proposal for Adversarial IDA

evhub
26 Mar 2019 19:50 UTC
16 points
5 comments · 5 min read · LW link

“Moral” as a preference label

Stuart_Armstrong
26 Mar 2019 10:30 UTC
14 points
1 comment · 1 min read · LW link

The Game Theory of Blackmail

Linda Linsefors
22 Mar 2019 17:44 UTC
21 points
13 comments · 4 min read · LW link

The Main Sources of AI Risk?

Wei_Dai
21 Mar 2019 18:28 UTC
59 points
15 comments · 2 min read · LW link

[Question] What’s wrong with these analogies for understanding Informed Oversight and IDA?

Wei_Dai
20 Mar 2019 9:11 UTC
37 points
3 comments · 1 min read · LW link