How safe “safe” AI development?

Gordon Seidoh Worley · 28 Feb 2018 23:21 UTC
9 points
1 comment · 1 min read · LW link

Beyond algorithmic equivalence: self-modelling

Stuart_Armstrong · 28 Feb 2018 16:55 UTC
10 points
3 comments · 1 min read · LW link

Beyond algorithmic equivalence: algorithmic noise

Stuart_Armstrong · 28 Feb 2018 16:55 UTC
10 points
4 comments · 2 min read · LW link

Using the universal prior for logical uncertainty (retracted)

cousin_it · 28 Feb 2018 13:07 UTC
15 points
13 comments · 2 min read · LW link

2/27/08 Update – Frontpage 3.0

Raemon · 28 Feb 2018 6:26 UTC
15 points
21 comments · 1 min read · LW link

TDT for Humans

alkjash · 28 Feb 2018 5:40 UTC
26 points
7 comments · 5 min read · LW link
(radimentary.wordpress.com)

Set Up for Success: Insights from ‘Naïve Set Theory’

TurnTrout · 28 Feb 2018 2:01 UTC
30 points
40 comments · 3 min read · LW link

Intuition should be applied at the lowest possible level

Rafael Harth · 27 Feb 2018 22:58 UTC
10 points
9 comments · 1 min read · LW link

The sad state of Rationality Zürich - Effective Altruism Zürich included

roland · 27 Feb 2018 14:51 UTC
−8 points
50 comments · 3 min read · LW link

The worst trolley problem in the world

CronoDAS · 27 Feb 2018 3:56 UTC
1 point
1 comment · 1 min read · LW link

Categories of Sacredness

Zvi · 27 Feb 2018 2:00 UTC
21 points
35 comments · 8 min read · LW link
(thezvi.wordpress.com)

More on the Linear Utility Hypothesis and the Leverage Prior

AlexMennen · 26 Feb 2018 23:53 UTC
16 points
4 comments · 9 min read · LW link

Goal Factoring

alkjash · 26 Feb 2018 23:30 UTC
27 points
4 comments · 2 min read · LW link
(radimentary.wordpress.com)

Inconvenience Is Qualitatively Bad

Alicorn · 26 Feb 2018 23:27 UTC
82 points
52 comments · 2 min read · LW link

The Hamming Problem of Group Rationality

PDV · 26 Feb 2018 18:59 UTC
6 points
36 comments · 1 min read · LW link

Focusing

alkjash · 26 Feb 2018 6:10 UTC
20 points
21 comments · 3 min read · LW link
(radimentary.wordpress.com)

Mapping the Archipelago

alkjash · 26 Feb 2018 5:09 UTC
14 points
24 comments · 1 min read · LW link

Experimental Open Threads

Chris_Leong · 26 Feb 2018 3:13 UTC
22 points
5 comments · 1 min read · LW link

Walkthrough of ‘Formalizing Convergent Instrumental Goals’

TurnTrout · 26 Feb 2018 2:20 UTC
10 points
2 comments · 10 min read · LW link

Will AI See Sudden Progress?

KatjaGrace · 26 Feb 2018 0:41 UTC
27 points
11 comments · 1 min read · LW link · 1 review

Self-regulation of safety in AI research

Gordon Seidoh Worley · 25 Feb 2018 23:17 UTC
12 points
6 comments · 2 min read · LW link

The abruptness of nuclear weapons

paulfchristiano · 25 Feb 2018 17:40 UTC
47 points
35 comments · 2 min read · LW link

Likelihood of discontinuous progress around the development of AGI

vedevazz · 25 Feb 2018 15:13 UTC
4 points
2 comments · 1 min read · LW link
(aiimpacts.org)

Open-Source Monasticism

Nathan Rosquist · 25 Feb 2018 13:52 UTC
25 points
7 comments · 4 min read · LW link

Passing Troll Bridge

Diffractor · 25 Feb 2018 8:21 UTC
11 points
2 comments · 10 min read · LW link

Three Miniatures

alkjash · 25 Feb 2018 5:40 UTC
22 points
11 comments · 3 min read · LW link
(radimentary.wordpress.com)

Arguments about fast takeoff

paulfchristiano · 25 Feb 2018 4:53 UTC
89 points
65 comments · 2 min read · LW link · 1 review
(sideways-view.com)

Meta-tations on Moderation: Towards Public Archipelago

Raemon · 25 Feb 2018 3:59 UTC
78 points
176 comments · 23 min read · LW link

Lessons from the Cold War on Information Hazards: Why Internal Communication is Critical

Gentzel · 24 Feb 2018 23:34 UTC
47 points
10 comments · 4 min read · LW link

What we talk about when we talk about maximising utility

Richard_Ngo · 24 Feb 2018 22:33 UTC
14 points
18 comments · 4 min read · LW link

Links with underscores

ShardPhoenix · 24 Feb 2018 11:32 UTC
2 points
3 comments · 1 min read · LW link

A useful level distinction

Charlie Steiner · 24 Feb 2018 6:39 UTC
8 points
4 comments · 2 min read · LW link

CoZE 2

alkjash · 24 Feb 2018 5:40 UTC
16 points
7 comments · 2 min read · LW link
(radimentary.wordpress.com)

On Building Theories of History

Samo Burja · 23 Feb 2018 23:40 UTC
29 points
20 comments · 5 min read · LW link

Mythic Mode

Valentine · 23 Feb 2018 22:45 UTC
65 points
81 comments · 9 min read · LW link

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

Gordon Seidoh Worley · 23 Feb 2018 21:42 UTC
5 points
8 comments · 1 min read · LW link
(arxiv.org)

Two types of mathematician

drossbucket · 23 Feb 2018 19:26 UTC
62 points
41 comments · 4 min read · LW link

June 2012: 0/33 Turing Award winners predict computers beating humans at go within next 10 years.

betterthanwell · 23 Feb 2018 11:25 UTC
18 points
13 comments · 2 min read · LW link

Design 2

alkjash · 23 Feb 2018 6:20 UTC
18 points
17 comments · 3 min read · LW link
(radimentary.wordpress.com)

AI Alignment and Phenomenal Consciousness

Gordon Seidoh Worley · 23 Feb 2018 1:21 UTC
9 points
0 comments · 6 min read · LW link
(mapandterritory.org)

Explanation vs Rationalization

abramdemski · 22 Feb 2018 23:46 UTC
16 points
11 comments · 4 min read · LW link

The map has gears. They don’t always turn.

abramdemski · 22 Feb 2018 20:16 UTC
23 points
0 comments · 1 min read · LW link

The Intelligent Social Web

Valentine · 22 Feb 2018 18:55 UTC
224 points
112 comments · 12 min read · LW link · 2 reviews

The Three Stages Of Model Development

katerinjo · 22 Feb 2018 14:33 UTC
17 points
7 comments · 2 min read · LW link

Pain, fear, sex, and higher order preferences

Stuart_Armstrong · 22 Feb 2018 11:30 UTC
5 points
3 comments · 1 min read · LW link

TAPs 2

alkjash · 22 Feb 2018 5:10 UTC
25 points
5 comments · 3 min read · LW link
(radimentary.wordpress.com)

Robustness to Scale

Scott Garrabrant · 21 Feb 2018 22:55 UTC
128 points
23 comments · 2 min read · LW link · 1 review

Don’t Condition on no Catastrophes

Scott Garrabrant · 21 Feb 2018 21:50 UTC
32 points
7 comments · 2 min read · LW link

The Logic of Science: 2.2

mpr · 21 Feb 2018 17:28 UTC
9 points
3 comments · 1 min read · LW link
(pulsarcoffee.com)

Yoda Timers 2

alkjash · 21 Feb 2018 7:40 UTC
28 points
26 comments · 3 min read · LW link
(radimentary.wordpress.com)