
Superrational Agents Kelly Bet Influence!

abramdemski · 16 Apr 2021 22:08 UTC
34 points
4 comments · 5 min read · LW link

Computing Natural Abstractions: Linear Approximation

johnswentworth · 15 Apr 2021 17:47 UTC
33 points
22 comments · 7 min read · LW link

[AN #146]: Plausible stories of how we might fail to avert an existential catastrophe

rohinmshah · 14 Apr 2021 17:30 UTC
15 points
1 comment · 8 min read · LW link
(mailchi.mp)

Intermittent Distillations #2

Mark Xu · 14 Apr 2021 6:47 UTC
23 points
4 comments · 9 min read · LW link

Identifiability Problem for Superrational Decision Theories

Bunthut · 9 Apr 2021 20:33 UTC
17 points
13 comments · 2 min read · LW link

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

9 Apr 2021 19:19 UTC
109 points
10 comments · 102 min read · LW link

My Current Take on Counterfactuals

abramdemski · 9 Apr 2021 17:51 UTC
49 points
13 comments · 24 min read · LW link

[AN #145]: Our three year anniversary!

rohinmshah · 9 Apr 2021 17:48 UTC
19 points
0 comments · 8 min read · LW link
(mailchi.mp)

Why unriggable *almost* implies uninfluenceable

Stuart_Armstrong · 9 Apr 2021 17:07 UTC
11 points
0 comments · 4 min read · LW link

AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes

DanielFilan · 8 Apr 2021 21:20 UTC
23 points
3 comments · 59 min read · LW link

A possible preference algorithm

Stuart_Armstrong · 8 Apr 2021 18:25 UTC
22 points
0 comments · 4 min read · LW link

If you don’t design for extrapolation, you’ll extrapolate poorly—possibly fatally

Stuart_Armstrong · 8 Apr 2021 18:10 UTC
17 points
0 comments · 4 min read · LW link

Solving the whole AGI control problem, version 0.0001

Steven Byrnes · 8 Apr 2021 15:14 UTC
41 points
4 comments · 26 min read · LW link

Another (outer) alignment failure story

paulfchristiano · 7 Apr 2021 20:12 UTC
127 points
19 comments · 12 min read · LW link

Which counterfactuals should an AI follow?

Stuart_Armstrong · 7 Apr 2021 16:47 UTC
19 points
5 comments · 7 min read · LW link

Alignment Newsletter Three Year Retrospective

rohinmshah · 7 Apr 2021 14:39 UTC
54 points
0 comments · 5 min read · LW link

Testing The Natural Abstraction Hypothesis: Project Intro

johnswentworth · 6 Apr 2021 21:24 UTC
106 points
15 comments · 6 min read · LW link

Reflective Bayesianism

abramdemski · 6 Apr 2021 19:48 UTC
48 points
27 comments · 13 min read · LW link

The Many Faces of Infra-Beliefs

Diffractor · 6 Apr 2021 10:43 UTC
15 points
0 comments · 63 min read · LW link

[Question] How do scaling laws work for fine-tuning?

Daniel Kokotajlo · 4 Apr 2021 12:18 UTC
24 points
10 comments · 1 min read · LW link