Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck · 2 Jun 2022 23:48 UTC
42 points
0 comments · 3 min read · LW link

The prototypical catastrophic AI action is getting root access to its datacenter

Buck · 2 Jun 2022 23:46 UTC
181 points
13 comments · 2 min read · LW link · 1 review

Confused why a “capabilities research is good for alignment progress” position isn’t discussed more

Kaj_Sotala · 2 Jun 2022 21:41 UTC
131 points
27 comments · 4 min read · LW link

Announcing a contest: EA Criticism and Red Teaming

fin · 2 Jun 2022 20:27 UTC
17 points
1 comment · 14 min read · LW link
(forum.effectivealtruism.org)

Fact post: project-based learning

dominicq · 2 Jun 2022 20:18 UTC
12 points
4 comments · 3 min read · LW link

The case for using the term ‘steelmanning’ instead of ‘principle of charity’

ChristianKl · 2 Jun 2022 19:24 UTC
26 points
7 comments · 3 min read · LW link

Covid 6/2/22: Declining to Respond

Zvi · 2 Jun 2022 13:50 UTC
55 points
10 comments · 7 min read · LW link
(thezvi.wordpress.com)

The horror of what must, yet cannot, be true

Kaj_Sotala · 2 Jun 2022 10:20 UTC
52 points
18 comments · 2 min read · LW link
(kajsotala.fi)

Paradigms of AI alignment: components and enablers

Vika · 2 Jun 2022 6:19 UTC
53 points
4 comments · 8 min read · LW link

The Bio Anchors Forecast

Ansh Radhakrishnan · 2 Jun 2022 1:32 UTC
13 points
0 comments · 3 min read · LW link

**Venue Changed** ACX Montreal Meetup Jun 18 2022

E · 2 Jun 2022 0:43 UTC
10 points
0 comments · 1 min read · LW link

Public beliefs vs. Private beliefs

Eli Tyre · 1 Jun 2022 21:33 UTC
146 points
30 comments · 5 min read · LW link

[Question] Probability that the President would win election against a random adult citizen?

Daniel Kokotajlo · 1 Jun 2022 20:38 UTC
15 points
26 comments · 1 min read · LW link

Revisiting “Why Global Poverty”

jefftk · 1 Jun 2022 20:20 UTC
20 points
2 comments · 3 min read · LW link
(www.jefftk.com)

[Question] What will happen when an all-reaching AGI starts attempting to fix human character flaws?

Michael Bright · 1 Jun 2022 18:45 UTC
1 point
6 comments · 1 min read · LW link

[Question] Any prior work on multiagent dynamics for continuous distributions over agents?

Quintin Pope · 1 Jun 2022 18:12 UTC
15 points
2 comments · 1 min read · LW link

[Question] Formation via nucleation of boltzmann brains

Zeruel017 · 1 Jun 2022 18:05 UTC
0 points
9 comments · 1 min read · LW link

Halifax Rationality / EA Coworking Day

1 Jun 2022 17:47 UTC
9 points
0 comments · 1 min read · LW link

Machines vs Memes Part 3: Imitation and Memes

ceru23 · 1 Jun 2022 13:36 UTC
7 points
0 comments · 7 min read · LW link

Rationalism in an Age of Egregores

David Udell · 1 Jun 2022 7:29 UTC
14 points
11 comments · 2 min read · LW link

Wielding civilization

dominicq · 1 Jun 2022 7:11 UTC
29 points
2 comments · 2 min read · LW link

Machines vs. Memes 2: Memetically-Motivated Model Extensions

naterush · 31 May 2022 22:03 UTC
6 points
0 comments · 4 min read · LW link

Machines vs Memes Part 1: AI Alignment and Memetics

Harriet Farlow · 31 May 2022 22:03 UTC
19 points
1 comment · 6 min read · LW link

The Hard Intelligence Hypothesis and Its Bearing on Succession Induced Foom

DragonGod · 31 May 2022 19:04 UTC
10 points
7 comments · 4 min read · LW link

Paper: Teaching GPT3 to express uncertainty in words

Owain_Evans · 31 May 2022 13:27 UTC
97 points
7 comments · 4 min read · LW link

Effective Altruism Virtual Programs Jul-Aug 2022

Yve Nichols-Evans · 31 May 2022 12:56 UTC
1 point
0 comments · 1 min read · LW link

[Question] What is the state of Chinese AI research?

Ratios · 31 May 2022 10:05 UTC
34 points
16 comments · 1 min read · LW link

The Brain That Builds Itself

Jan · 31 May 2022 9:42 UTC
57 points
6 comments · 8 min read · LW link
(universalprior.substack.com)

[Question] Is there any formal argument that climate change leads to more extreme weather events?

ChristianKl · 31 May 2022 9:01 UTC
8 points
8 comments · 1 min read · LW link

Progress links and tweets, 2022-05-30

jasoncrawford · 30 May 2022 23:20 UTC
18 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

The Reverse Basilisk

Dunning K. · 30 May 2022 23:10 UTC
17 points
23 comments · 2 min read · LW link

Deliberate Grieving

Raemon · 30 May 2022 20:49 UTC
191 points
16 comments · 9 min read · LW link · 2 reviews

Perform Tractable Research While Avoiding Capabilities Externalities [Pragmatic AI Safety #4]

30 May 2022 20:25 UTC
51 points
3 comments · 25 min read · LW link

[Question] A terrifying variant of Boltzmann’s brains problem

Zeruel017 · 30 May 2022 20:08 UTC
5 points
12 comments · 4 min read · LW link

Ceiling Air Purifier

jefftk · 30 May 2022 19:20 UTC
88 points
11 comments · 2 min read · LW link
(www.jefftk.com)

Notion template for personal predictions

Arjun Yadav · 30 May 2022 17:47 UTC
1 point
0 comments · 1 min read · LW link

Six Dimensions of Operational Adequacy in AGI Projects

Eliezer Yudkowsky · 30 May 2022 17:00 UTC
317 points
66 comments · 13 min read · LW link · 1 review

My SERI MATS Application

Daniel Paleka · 30 May 2022 2:04 UTC
16 points
0 comments · 8 min read · LW link

Reshaping the AI Industry

Thane Ruthenis · 29 May 2022 22:54 UTC
148 points
35 comments · 21 min read · LW link

The Unbearable Lightness of Web Vulnerabilities

aiiixiii · 29 May 2022 21:13 UTC
29 points
2 comments · 1 min read · LW link
(www.theoreticalstructures.io)

Finding the Right Problem

tobot · 29 May 2022 17:52 UTC
8 points
0 comments · 2 min read · LW link

The impact you might have working on AI safety

Fabien Roger · 29 May 2022 16:31 UTC
5 points
1 comment · 4 min read · LW link

The Problem With The Current State of AGI Definitions

Yitz · 29 May 2022 13:58 UTC
40 points
22 comments · 8 min read · LW link

[Question] Request for nice questions to think about while trying to sleep

oh54321 · 29 May 2022 13:47 UTC
9 points
2 comments · 1 min read · LW link

Will working here advance AGI? Help us not destroy the world!

Yonatan Cale · 29 May 2022 11:42 UTC
30 points
46 comments · 1 min read · LW link

Passable Puppet

burmesetheater · 29 May 2022 11:07 UTC
6 points
1 comment · 3 min read · LW link

Multiple AIs in boxes, evaluating each other’s alignment

Moebius314 · 29 May 2022 8:36 UTC
8 points
0 comments · 14 min read · LW link

[Question] How would you build Dath Ilan on earth?

Yair Halberstadt · 29 May 2022 7:26 UTC
35 points
29 comments · 1 min read · LW link

Distributed Decisions

johnswentworth · 29 May 2022 2:43 UTC
66 points
6 comments · 6 min read · LW link

Distilled—AGI Safety from First Principles

Harrison G · 29 May 2022 0:57 UTC
11 points
1 comment · 14 min read · LW link