
Problems in AI Alignment that philosophers could potentially contribute to

Wei_Dai
17 Aug 2019 17:38 UTC
53 points
8 comments · 2 min read · LW link

Clarifying some key hypotheses in AI alignment

Ben Cottier
15 Aug 2019 21:29 UTC
63 points
2 comments · 9 min read · LW link

Distance Functions are Hard

Grue_Slinky
13 Aug 2019 17:33 UTC
40 points
13 comments · 6 min read · LW link

Mesa-Optimizers and Over-optimization Failure (Optimizing and Goodhart Effects, Clarifying Thoughts—Part 4)

Davidmanheim
12 Aug 2019 8:07 UTC
11 points
3 comments · 4 min read · LW link

Verification and Transparency

DanielFilan
8 Aug 2019 1:50 UTC
35 points
4 comments · 2 min read · LW link
(danielfilan.com)

Toy model piece #2: Combining short and long range partial preferences

Stuart_Armstrong
8 Aug 2019 0:11 UTC
15 points
0 comments · 4 min read · LW link

Four Ways An Impact Measure Could Help Alignment

Matthew Barnett
8 Aug 2019 0:10 UTC
18 points
1 comment · 8 min read · LW link

Understanding Recent Impact Measures

Matthew Barnett
7 Aug 2019 4:57 UTC
17 points
6 comments · 7 min read · LW link

Project Proposal: Considerations for trading off capabilities and safety impacts of AI research

capybaralet
6 Aug 2019 22:22 UTC
30 points
11 comments · 2 min read · LW link

New paper: Corrigibility with Utility Preservation

Koen.Holtman
6 Aug 2019 19:04 UTC
36 points
11 comments · 2 min read · LW link

A Survey of Early Impact Measures

Matthew Barnett
6 Aug 2019 1:22 UTC
21 points
0 comments · 8 min read · LW link

Preferences as an (instinctive) stance

Stuart_Armstrong
6 Aug 2019 0:43 UTC
20 points
4 comments · 4 min read · LW link

AI Alignment Open Thread August 2019

habryka
4 Aug 2019 22:09 UTC
37 points
85 comments · 1 min read · LW link

Practical consequences of impossibility of value learning

Stuart_Armstrong
2 Aug 2019 23:06 UTC
23 points
13 comments · 3 min read · LW link

Very different, very adequate outcomes

Stuart_Armstrong
2 Aug 2019 20:31 UTC
12 points
10 comments · 1 min read · LW link

Why Subagents?

johnswentworth
1 Aug 2019 22:17 UTC
79 points
10 comments · 7 min read · LW link

Toy model piece #1: Partial preferences revisited

Stuart_Armstrong
29 Jul 2019 16:35 UTC
12 points
18 comments · 5 min read · LW link

Applying Overoptimization to Selection vs. Control (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 3)

Davidmanheim
28 Jul 2019 9:32 UTC
19 points
4 comments · 3 min read · LW link

What does Optimization Mean, Again? (Optimizing and Goodhart Effects—Clarifying Thoughts, Part 2)

Davidmanheim
28 Jul 2019 9:30 UTC
29 points
7 comments · 4 min read · LW link

The Artificial Intentional Stance

Charlie Steiner
27 Jul 2019 7:00 UTC
12 points
0 comments · 4 min read · LW link