Newsletters

Tag

Re­cent up­dates to gw­ern.net (2011)

gwern26 Nov 2011 1:58 UTC
45 points
18 comments1 min readLW link

Re­cent up­dates to gw­ern.net (2013-2014)

gwern8 Jul 2014 1:44 UTC
38 points
32 comments4 min readLW link

An­nounc­ing LessWrong Digest

Evan_Gaensbauer23 Feb 2015 10:41 UTC
35 points
18 comments1 min readLW link

Re­cent up­dates to gw­ern.net (2014-2015)

gwern2 Nov 2015 0:06 UTC
34 points
3 comments3 min readLW link

Re­cent up­dates to gw­ern.net (2015-2016)

gwern26 Aug 2016 19:22 UTC
42 points
6 comments1 min readLW link

Bi-Weekly Ra­tional Feed

sapphire24 Jun 2017 0:07 UTC
35 points
3 comments12 min readLW link

Bi-weekly Ra­tional Feed

sapphire8 Aug 2017 13:56 UTC
29 points
4 comments13 min readLW link

Ra­tion­al­ity Feed: Last Month’s Best Posts

sapphire12 Feb 2018 13:18 UTC
23 points
1 comment3 min readLW link

Ra­tion­al­ity Feed: Last Month’s Best Posts

sapphire21 Mar 2018 14:12 UTC
20 points
2 comments2 min readLW link

An­nounc­ing Ra­tional Newsletter

Alexey Lapitsky1 Apr 2018 14:37 UTC
10 points
8 comments1 min readLW link

The Align­ment Newslet­ter #1: 04/​09/​18

Rohin Shah9 Apr 2018 16:00 UTC
12 points
3 comments4 min readLW link

The Align­ment Newslet­ter #2: 04/​16/​18

Rohin Shah16 Apr 2018 16:00 UTC
8 points
0 comments5 min readLW link

March gw­ern.net link roundup

gwern20 Apr 2018 19:09 UTC
10 points
1 comment1 min readLW link
(www.gwern.net)

The Align­ment Newslet­ter #3: 04/​23/​18

Rohin Shah23 Apr 2018 16:00 UTC
9 points
0 comments6 min readLW link

The Align­ment Newslet­ter #4: 04/​30/​18

Rohin Shah30 Apr 2018 16:00 UTC
8 points
0 comments3 min readLW link

Ra­tional Feed: Last Month’s Best Posts

sapphire2 May 2018 18:19 UTC
16 points
0 comments2 min readLW link

The Align­ment Newslet­ter #5: 05/​07/​18

Rohin Shah7 May 2018 16:00 UTC
8 points
0 comments7 min readLW link

The Align­ment Newslet­ter #6: 05/​14/​18

Rohin Shah14 May 2018 16:00 UTC
8 points
0 comments2 min readLW link

The Align­ment Newslet­ter #7: 05/​21/​18

Rohin Shah21 May 2018 16:00 UTC
8 points
0 comments5 min readLW link

The Align­ment Newslet­ter #8: 05/​28/​18

Rohin Shah28 May 2018 16:00 UTC
8 points
0 comments6 min readLW link

May gw­ern.net newsletter

gwern1 Jun 2018 14:47 UTC
24 points
3 comments1 min readLW link
(www.gwern.net)

The Align­ment Newslet­ter #9: 06/​04/​18

Rohin Shah4 Jun 2018 16:00 UTC
8 points
0 comments2 min readLW link

The Align­ment Newslet­ter #10: 06/​11/​18

Rohin Shah11 Jun 2018 16:00 UTC
16 points
0 comments9 min readLW link

The Align­ment Newslet­ter #11: 06/​18/​18

Rohin Shah18 Jun 2018 16:00 UTC
8 points
0 comments10 min readLW link

The Align­ment Newslet­ter #12: 06/​25/​18

Rohin Shah25 Jun 2018 16:00 UTC
15 points
0 comments3 min readLW link

Align­ment Newslet­ter #13: 07/​02/​18

Rohin Shah2 Jul 2018 16:10 UTC
70 points
12 comments8 min readLW link
(mailchi.mp)

June gw­ern.net newsletter

gwern4 Jul 2018 22:59 UTC
34 points
0 comments1 min readLW link
(www.gwern.net)

Align­ment Newslet­ter #14

Rohin Shah9 Jul 2018 16:20 UTC
14 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #15: 07/​16/​18

Rohin Shah16 Jul 2018 16:10 UTC
42 points
0 comments15 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #16: 07/​23/​18

Rohin Shah23 Jul 2018 16:20 UTC
42 points
0 comments12 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #17

Rohin Shah30 Jul 2018 16:10 UTC
32 points
0 comments13 min readLW link
(mailchi.mp)

July gw­ern.net newsletter

gwern2 Aug 2018 13:42 UTC
24 points
0 comments1 min readLW link
(www.gwern.net)

Align­ment Newslet­ter #18

Rohin Shah6 Aug 2018 16:00 UTC
17 points
0 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #19

Rohin Shah14 Aug 2018 2:10 UTC
18 points
0 comments13 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #20

Rohin Shah20 Aug 2018 16:00 UTC
12 points
2 comments6 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #21

Rohin Shah27 Aug 2018 16:20 UTC
25 points
0 comments7 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #22

Rohin Shah3 Sep 2018 16:10 UTC
18 points
0 comments6 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #23

Rohin Shah10 Sep 2018 17:10 UTC
16 points
0 comments7 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #24

Rohin Shah17 Sep 2018 16:20 UTC
10 points
6 comments12 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #25

Rohin Shah24 Sep 2018 16:10 UTC
18 points
3 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #26

Rohin Shah2 Oct 2018 16:10 UTC
13 points
0 comments7 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #27

Rohin Shah9 Oct 2018 1:10 UTC
16 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #28

Rohin Shah15 Oct 2018 21:20 UTC
11 points
0 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #29

Rohin Shah22 Oct 2018 16:20 UTC
15 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #30

Rohin Shah29 Oct 2018 16:10 UTC
29 points
2 comments6 min readLW link
(mailchi.mp)

Oc­to­ber gw­ern.net links

gwern1 Nov 2018 1:11 UTC
29 points
8 comments1 min readLW link
(www.gwern.net)

Align­ment Newslet­ter #31

Rohin Shah5 Nov 2018 23:50 UTC
17 points
0 comments12 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #32

Rohin Shah12 Nov 2018 17:20 UTC
18 points
0 comments12 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #33

Rohin Shah19 Nov 2018 17:20 UTC
23 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #34

Rohin Shah26 Nov 2018 23:10 UTC
24 points
0 comments10 min readLW link
(mailchi.mp)

Novem­ber 2018 gw­ern.net newsletter

gwern1 Dec 2018 13:57 UTC
35 points
0 comments1 min readLW link
(www.gwern.net)

Align­ment Newslet­ter #35

Rohin Shah4 Dec 2018 1:10 UTC
15 points
0 comments6 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #36

Rohin Shah12 Dec 2018 1:10 UTC
21 points
0 comments11 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #37

Rohin Shah17 Dec 2018 19:10 UTC
25 points
4 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #38

Rohin Shah25 Dec 2018 16:10 UTC
9 points
0 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #39

Rohin Shah1 Jan 2019 8:10 UTC
32 points
2 comments5 min readLW link
(mailchi.mp)

De­cem­ber gw­ern.net newsletter

gwern2 Jan 2019 15:13 UTC
20 points
0 comments1 min readLW link
(www.gwern.net)

Align­ment Newslet­ter #40

Rohin Shah8 Jan 2019 20:10 UTC
21 points
2 comments5 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #41

Rohin Shah17 Jan 2019 8:10 UTC
22 points
6 comments10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #42

Rohin Shah22 Jan 2019 2:00 UTC
20 points
1 comment10 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #43

Rohin Shah29 Jan 2019 21:10 UTC
14 points
2 comments13 min readLW link
(mailchi.mp)

Jan­uary 2019 gw­ern.net newsletter

gwern4 Feb 2019 15:53 UTC
15 points
0 comments1 min readLW link
(www.gwern.net)

Align­ment Newslet­ter #44

Rohin Shah6 Feb 2019 8:30 UTC
18 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #45

Rohin Shah14 Feb 2019 2:10 UTC
25 points
2 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #46

Rohin Shah22 Feb 2019 0:10 UTC
12 points
0 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #47

Rohin Shah4 Mar 2019 4:30 UTC
18 points
0 comments8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #48

Rohin Shah11 Mar 2019 21:10 UTC
29 points
14 comments9 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #49

Rohin Shah20 Mar 2019 4:20 UTC
23 points
1 comment11 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #50

Rohin Shah28 Mar 2019 18:10 UTC
15 points
2 comments10 min readLW link
(mailchi.mp)

March 2019 gw­ern.net newsletter

gwern2 Apr 2019 14:17 UTC
19 points
9 comments1 min readLW link
(www.gwern.net)

Align­ment Newslet­ter #51

Rohin Shah3 Apr 2019 4:10 UTC
25 points
2 comments15 min readLW link
(mailchi.mp)

Align­ment Newslet­ter #52

Rohin Shah6 Apr 2019 1:20 UTC
19 points
1 comment8 min readLW link
(mailchi.mp)

Align­ment Newslet­ter One Year Retrospective

Rohin Shah10 Apr 2019 6:58 UTC
94 points
31 comments21 min readLW link

Align­ment Newslet­ter #53

Rohin Shah18 Apr 2019 17:20 UTC
20 points
0 comments8 min readLW link
(mailchi.mp)

[AN #54] Box­ing a finite-hori­zon AI sys­tem to keep it unambitious

Rohin Shah28 Apr 2019 5:20 UTC
20 points
0 comments8 min readLW link
(mailchi.mp)

[AN #55] Reg­u­la­tory mar­kets and in­ter­na­tional stan­dards as a means of en­sur­ing benefi­cial AI

Rohin Shah5 May 2019 2:20 UTC
17 points
2 comments8 min readLW link
(mailchi.mp)

[AN #56] Should ML re­searchers stop run­ning ex­per­i­ments be­fore mak­ing hy­pothe­ses?

Rohin Shah21 May 2019 2:20 UTC
21 points
8 comments9 min readLW link
(mailchi.mp)

May gw­ern.net newsletter

gwern1 Jun 2019 17:25 UTC
17 points
0 comments1 min readLW link
(www.gwern.net)

[AN #57] Why we should fo­cus on ro­bust­ness in AI safety, and the analo­gous prob­lems in programming

Rohin Shah5 Jun 2019 23:20 UTC
26 points
15 comments7 min readLW link
(mailchi.mp)

[AN #58] Mesa op­ti­miza­tion: what it is, and why we should care

Rohin Shah24 Jun 2019 16:10 UTC
55 points
10 comments8 min readLW link
(mailchi.mp)

June 2019 gw­ern.net newsletter

gwern1 Jul 2019 14:35 UTC
29 points
0 comments1 min readLW link
(www.gwern.net)

[AN #59] How ar­gu­ments for AI risk have changed over time

Rohin Shah8 Jul 2019 17:20 UTC
43 points
4 comments7 min readLW link
(mailchi.mp)

[AN #60] A new AI challenge: Minecraft agents that as­sist hu­man play­ers in cre­ative mode

Rohin Shah22 Jul 2019 17:00 UTC
23 points
6 comments9 min readLW link
(mailchi.mp)

July 2019 gw­ern.net newsletter

gwern1 Aug 2019 16:19 UTC
23 points
0 comments1 min readLW link
(www.gwern.net)

[AN #61] AI policy and gov­er­nance, from two peo­ple in the field

Rohin Shah5 Aug 2019 17:00 UTC
12 points
2 comments9 min readLW link
(mailchi.mp)

Call for con­trib­u­tors to the Align­ment Newsletter

Rohin Shah21 Aug 2019 18:21 UTC
39 points
0 comments4 min readLW link

[AN #62] Are ad­ver­sar­ial ex­am­ples caused by real but im­per­cep­ti­ble fea­tures?

Rohin Shah22 Aug 2019 17:10 UTC
28 points
10 comments9 min readLW link
(mailchi.mp)

Rus­sian x-risks newslet­ter, sum­mer 2019

avturchin7 Sep 2019 9:50 UTC
39 points
5 comments4 min readLW link

[AN #63] How ar­chi­tec­ture search, meta learn­ing, and en­vi­ron­ment de­sign could lead to gen­eral intelligence

Rohin Shah10 Sep 2019 19:10 UTC
21 points
12 comments8 min readLW link
(mailchi.mp)

[AN #64]: Us­ing Deep RL and Re­ward Uncer­tainty to In­cen­tivize Prefer­ence Learning

Rohin Shah16 Sep 2019 17:10 UTC
11 points
8 comments7 min readLW link
(mailchi.mp)

[AN #65]: Learn­ing use­ful skills by watch­ing hu­mans “play”

Rohin Shah23 Sep 2019 17:30 UTC
11 points
0 comments9 min readLW link
(mailchi.mp)

[AN #66]: De­com­pos­ing ro­bust­ness into ca­pa­bil­ity ro­bust­ness and al­ign­ment robustness

Rohin Shah30 Sep 2019 18:00 UTC
12 points
1 comment7 min readLW link
(mailchi.mp)

Septem­ber 2019 gw­ern.net newsletter

gwern4 Oct 2019 16:44 UTC
21 points
0 comments1 min readLW link
(www.gwern.net)

[AN #67]: Creat­ing en­vi­ron­ments in which to study in­ner al­ign­ment failures

Rohin Shah7 Oct 2019 17:10 UTC
17 points
0 comments8 min readLW link
(mailchi.mp)

[AN #68]: The at­tain­able util­ity the­ory of impact

Rohin Shah14 Oct 2019 17:00 UTC
17 points
0 comments8 min readLW link
(mailchi.mp)

[AN #69] Stu­art Rus­sell’s new book on why we need to re­place the stan­dard model of AI

Rohin Shah19 Oct 2019 0:30 UTC
60 points
12 comments15 min readLW link
(mailchi.mp)

[AN #70]: Agents that help hu­mans who are still learn­ing about their own preferences

Rohin Shah23 Oct 2019 17:10 UTC
16 points
0 comments9 min readLW link
(mailchi.mp)

[AN #71]: Avoid­ing re­ward tam­per­ing through cur­rent-RF optimization

Rohin Shah30 Oct 2019 17:10 UTC
12 points
0 comments7 min readLW link
(mailchi.mp)

[AN #72]: Align­ment, ro­bust­ness, method­ol­ogy, and sys­tem build­ing as re­search pri­ori­ties for AI safety

Rohin Shah6 Nov 2019 18:10 UTC
26 points
4 comments10 min readLW link
(mailchi.mp)

[AN #73]: De­tect­ing catas­trophic failures by learn­ing how agents tend to break

Rohin Shah13 Nov 2019 18:10 UTC
11 points
0 comments7 min readLW link
(mailchi.mp)

[AN #74]: Separat­ing benefi­cial AI into com­pe­tence, al­ign­ment, and cop­ing with impacts

Rohin Shah20 Nov 2019 18:20 UTC
19 points
0 comments7 min readLW link
(mailchi.mp)

[AN #75]: Solv­ing Atari and Go with learned game mod­els, and thoughts from a MIRI employee

Rohin Shah27 Nov 2019 18:10 UTC
38 points
1 comment10 min readLW link
(mailchi.mp)

Rus­sian x-risks newslet­ter #2, fall 2019

avturchin3 Dec 2019 16:54 UTC
22 points
0 comments3 min readLW link

[AN #76]: How dataset size af­fects ro­bust­ness, and bench­mark­ing safe ex­plo­ra­tion by mea­sur­ing con­straint violations

Rohin Shah4 Dec 2019 18:10 UTC
14 points
6 comments9 min readLW link
(mailchi.mp)

[AN #77]: Dou­ble de­scent: a unifi­ca­tion of statis­ti­cal the­ory and mod­ern ML practice

Rohin Shah18 Dec 2019 18:30 UTC
21 points
4 comments14 min readLW link
(mailchi.mp)

[AN #78] For­mal­iz­ing power and in­stru­men­tal con­ver­gence, and the end-of-year AI safety char­ity comparison

Rohin Shah26 Dec 2019 1:10 UTC
26 points
10 comments9 min readLW link
(mailchi.mp)

[AN #79]: Re­cur­sive re­ward mod­el­ing as an al­ign­ment tech­nique in­te­grated with deep RL

Rohin Shah1 Jan 2020 18:00 UTC
13 points
0 comments12 min readLW link
(mailchi.mp)

[AN #80]: Why AI risk might be solved with­out ad­di­tional in­ter­ven­tion from longtermists

Rohin Shah2 Jan 2020 18:20 UTC
36 points
95 comments10 min readLW link
(mailchi.mp)

Dec 2019 gw­ern.net newsletter

gwern4 Jan 2020 20:48 UTC
17 points
2 comments1 min readLW link
(www.gwern.net)

[AN #81]: Univer­sal­ity as a po­ten­tial solu­tion to con­cep­tual difficul­ties in in­tent alignment

Rohin Shah8 Jan 2020 18:00 UTC
32 points
4 comments11 min readLW link
(mailchi.mp)

[AN #82]: How OpenAI Five dis­tributed their train­ing computation

Rohin Shah15 Jan 2020 18:20 UTC
19 points
0 comments8 min readLW link
(mailchi.mp)

[AN #83]: Sam­ple-effi­cient deep learn­ing with ReMixMatch

Rohin Shah22 Jan 2020 18:10 UTC
15 points
4 comments11 min readLW link
(mailchi.mp)

[AN #84] Re­view­ing AI al­ign­ment work in 2018-19

Rohin Shah29 Jan 2020 18:30 UTC
23 points
0 comments6 min readLW link
(mailchi.mp)

Jan­uary 2020 gw­ern.net newsletter

gwern31 Jan 2020 18:04 UTC
19 points
0 comments1 min readLW link
(www.gwern.net)

[AN #85]: The nor­ma­tive ques­tions we should be ask­ing for AI al­ign­ment, and a sur­pris­ingly good chatbot

Rohin Shah5 Feb 2020 18:20 UTC
14 points
2 comments7 min readLW link
(mailchi.mp)

[AN #86]: Im­prov­ing de­bate and fac­tored cog­ni­tion through hu­man experiments

Rohin Shah12 Feb 2020 18:10 UTC
15 points
0 comments9 min readLW link
(mailchi.mp)

[AN #87]: What might hap­pen as deep learn­ing scales even fur­ther?

Rohin Shah19 Feb 2020 18:20 UTC
28 points
0 comments4 min readLW link
(mailchi.mp)

[AN #88]: How the prin­ci­pal-agent liter­a­ture re­lates to AI risk

Rohin Shah27 Feb 2020 9:10 UTC
18 points
0 comments9 min readLW link
(mailchi.mp)

[AN #89]: A unify­ing for­mal­ism for prefer­ence learn­ing algorithms

Rohin Shah4 Mar 2020 18:20 UTC
16 points
0 comments9 min readLW link
(mailchi.mp)

Fe­bru­ary 2020 gw­ern.net newsletter

gwern4 Mar 2020 19:05 UTC
15 points
0 comments1 min readLW link
(www.gwern.net)

[AN #90]: How search land­scapes can con­tain self-re­in­forc­ing feed­back loops

Rohin Shah11 Mar 2020 17:30 UTC
11 points
6 comments8 min readLW link
(mailchi.mp)

[AN #91]: Con­cepts, im­ple­men­ta­tions, prob­lems, and a bench­mark for im­pact measurement

Rohin Shah18 Mar 2020 17:10 UTC
15 points
10 comments13 min readLW link
(mailchi.mp)

[AN #92]: Learn­ing good rep­re­sen­ta­tions with con­trastive pre­dic­tive coding

Rohin Shah25 Mar 2020 17:20 UTC
18 points
1 comment10 min readLW link
(mailchi.mp)

[AN #93]: The Precipice we’re stand­ing at, and how we can back away from it

Rohin Shah1 Apr 2020 17:10 UTC
24 points
0 comments7 min readLW link
(mailchi.mp)

March 2020 gw­ern.net newsletter

gwern3 Apr 2020 2:16 UTC
13 points
1 comment1 min readLW link
(www.gwern.net)

[AN #94]: AI al­ign­ment as trans­la­tion be­tween hu­mans and machines

Rohin Shah8 Apr 2020 17:10 UTC
11 points
0 comments7 min readLW link
(mailchi.mp)

[AN #95]: A frame­work for think­ing about how to make AI go well

Rohin Shah15 Apr 2020 17:10 UTC
20 points
2 comments10 min readLW link
(mailchi.mp)

[AN #96]: Buck and I dis­cuss/​ar­gue about AI Alignment

Rohin Shah22 Apr 2020 17:20 UTC
17 points
4 comments10 min readLW link
(mailchi.mp)

[AN #97]: Are there his­tor­i­cal ex­am­ples of large, ro­bust dis­con­ti­nu­ities?

Rohin Shah29 Apr 2020 17:30 UTC
15 points
0 comments10 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter: April 2020

NunoSempere30 Apr 2020 16:41 UTC
22 points
3 comments6 min readLW link

April 2020 gw­ern.net newsletter

gwern1 May 2020 20:47 UTC
11 points
0 comments1 min readLW link
(www.gwern.net)

[AN #98]: Un­der­stand­ing neu­ral net train­ing by see­ing which gra­di­ents were helpful

Rohin Shah6 May 2020 17:10 UTC
22 points
3 comments9 min readLW link
(mailchi.mp)

[AN #99]: Dou­bling times for the effi­ciency of AI algorithms

Rohin Shah13 May 2020 17:20 UTC
29 points
0 comments10 min readLW link
(mailchi.mp)

[AN #100]: What might go wrong if you learn a re­ward func­tion while acting

Rohin Shah20 May 2020 17:30 UTC
33 points
2 comments12 min readLW link
(mailchi.mp)

[AN #101]: Why we should rigor­ously mea­sure and fore­cast AI progress

Rohin Shah27 May 2020 17:20 UTC
15 points
0 comments10 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter: May 2020.

NunoSempere31 May 2020 12:35 UTC
9 points
1 comment20 min readLW link

May Gw­ern.net newslet­ter (w/​GPT-3 com­men­tary)

gwern2 Jun 2020 15:40 UTC
32 points
7 comments1 min readLW link
(www.gwern.net)

[AN #102]: Meta learn­ing by GPT-3, and a list of full pro­pos­als for AI alignment

Rohin Shah3 Jun 2020 17:20 UTC
38 points
6 comments10 min readLW link
(mailchi.mp)

[AN #103]: ARCHES: an agenda for ex­is­ten­tial safety, and com­bin­ing nat­u­ral lan­guage with deep RL

Rohin Shah10 Jun 2020 17:20 UTC
29 points
0 comments10 min readLW link
(mailchi.mp)

[AN #104]: The per­ils of in­ac­cessible in­for­ma­tion, and what we can learn about AI al­ign­ment from COVID

Rohin Shah18 Jun 2020 17:10 UTC
19 points
5 comments8 min readLW link
(mailchi.mp)

[AN #105]: The eco­nomic tra­jec­tory of hu­man­ity, and what we might mean by optimization

Rohin Shah24 Jun 2020 17:30 UTC
24 points
3 comments11 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter. June 2020.

NunoSempere1 Jul 2020 9:46 UTC
27 points
0 comments8 min readLW link

[AN #106]: Eval­u­at­ing gen­er­al­iza­tion abil­ity of learned re­ward models

Rohin Shah1 Jul 2020 17:20 UTC
14 points
2 comments11 min readLW link
(mailchi.mp)

June 2020 gw­ern.net newsletter

gwern2 Jul 2020 14:19 UTC
16 points
0 comments1 min readLW link
(www.gwern.net)

Null-box­ing New­comb’s Problem

Yitz13 Jul 2020 16:32 UTC
33 points
9 comments4 min readLW link

[AN #108]: Why we should scru­ti­nize ar­gu­ments for AI risk

Rohin Shah16 Jul 2020 6:47 UTC
19 points
6 comments12 min readLW link
(mailchi.mp)

[AN #107]: The con­ver­gent in­stru­men­tal sub­goals of goal-di­rected agents

Rohin Shah16 Jul 2020 6:47 UTC
13 points
1 comment8 min readLW link
(mailchi.mp)

[AN #109]: Teach­ing neu­ral nets to gen­er­al­ize the way hu­mans would

Rohin Shah22 Jul 2020 17:10 UTC
17 points
3 comments9 min readLW link
(mailchi.mp)

[AN #110]: Learn­ing fea­tures from hu­man feed­back to en­able re­ward learning

Rohin Shah29 Jul 2020 17:20 UTC
13 points
2 comments10 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter: July 2020.

NunoSempere1 Aug 2020 17:08 UTC
21 points
4 comments22 min readLW link

[AN #111]: The Cir­cuits hy­pothe­ses for deep learning

Rohin Shah5 Aug 2020 17:40 UTC
23 points
0 comments9 min readLW link
(mailchi.mp)

[AN #112]: Eng­ineer­ing a Safer World

Rohin Shah13 Aug 2020 17:20 UTC
25 points
2 comments12 min readLW link
(mailchi.mp)

[AN #113]: Check­ing the eth­i­cal in­tu­itions of large lan­guage models

Rohin Shah19 Aug 2020 17:10 UTC
23 points
0 comments9 min readLW link
(mailchi.mp)

July 2020 gw­ern.net newsletter

gwern20 Aug 2020 16:39 UTC
29 points
0 comments1 min readLW link
(www.gwern.net)

[AN #114]: The­ory-in­spired safety solu­tions for pow­er­ful Bayesian RL agents

Rohin Shah26 Aug 2020 17:20 UTC
21 points
3 comments8 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter: Au­gust 2020.

NunoSempere1 Sep 2020 11:38 UTC
16 points
1 comment6 min readLW link

Rus­sian x-risks newslet­ter Sum­mer 2020

avturchin1 Sep 2020 14:06 UTC
22 points
6 comments1 min readLW link

Au­gust 2020 gw­ern.net newsletter

gwern1 Sep 2020 21:04 UTC
25 points
4 comments1 min readLW link
(www.gwern.net)

[AN #115]: AI safety re­search prob­lems in the AI-GA framework

Rohin Shah2 Sep 2020 17:10 UTC
19 points
16 comments6 min readLW link
(mailchi.mp)

[AN #116]: How to make ex­pla­na­tions of neu­rons compositional

Rohin Shah9 Sep 2020 17:20 UTC
21 points
2 comments9 min readLW link
(mailchi.mp)

[AN #118]: Risks, solu­tions, and pri­ori­ti­za­tion in a world with many AI systems

Rohin Shah23 Sep 2020 18:20 UTC
15 points
6 comments10 min readLW link
(mailchi.mp)

Septem­ber 2020 gw­ern.net newsletter

gwern26 Oct 2020 13:38 UTC
17 points
1 comment1 min readLW link
(www.gwern.net)

[AN #123]: In­fer­ring what is valuable in or­der to al­ign recom­mender systems

Rohin Shah28 Oct 2020 17:00 UTC
20 points
1 comment8 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter: Oc­to­ber 2020.

NunoSempere1 Nov 2020 13:09 UTC
11 points
0 comments4 min readLW link

[AN #125]: Neu­ral net­work scal­ing laws across mul­ti­ple modalities

Rohin Shah11 Nov 2020 18:20 UTC
25 points
7 comments9 min readLW link
(mailchi.mp)

[AN #127]: Re­think­ing agency: Carte­sian frames as a for­mal­iza­tion of ways to carve up the world into an agent and its environment

Rohin Shah2 Dec 2020 18:20 UTC
53 points
0 comments13 min readLW link
(mailchi.mp)

Novem­ber 2020 gw­ern.net newsletter

gwern3 Dec 2020 22:47 UTC
14 points
5 comments1 min readLW link
(www.gwern.net)

[AN #129]: Ex­plain­ing dou­ble de­scent by mea­sur­ing bias and variance

Rohin Shah16 Dec 2020 18:10 UTC
14 points
1 comment7 min readLW link
(mailchi.mp)

[AN #133]: Build­ing ma­chines that can co­op­er­ate (with hu­mans, in­sti­tu­tions, or other ma­chines)

Rohin Shah13 Jan 2021 18:10 UTC
14 points
0 comments9 min readLW link
(mailchi.mp)

[AN #136]: How well will GPT-N perform on down­stream tasks?

Rohin Shah3 Feb 2021 18:10 UTC
21 points
2 comments9 min readLW link
(mailchi.mp)

[AN #145]: Our three year an­niver­sary!

Rohin Shah9 Apr 2021 17:48 UTC
19 points
0 comments8 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter: April 2021

NunoSempere1 May 2021 16:07 UTC
9 points
0 comments10 min readLW link

[AN #166]: Is it crazy to claim we’re in the most im­por­tant cen­tury?

Rohin Shah8 Oct 2021 17:30 UTC
52 points
5 comments8 min readLW link
(mailchi.mp)

[AN #167]: Con­crete ML safety prob­lems and their rele­vance to x-risk

Rohin Shah20 Oct 2021 17:10 UTC
19 points
4 comments9 min readLW link
(mailchi.mp)

[AN #170]: An­a­lyz­ing the ar­gu­ment for risk from power-seek­ing AI

Rohin Shah8 Dec 2021 18:10 UTC
21 points
1 comment7 min readLW link
(mailchi.mp)

Fore­cast­ing Newslet­ter: Jan­uary 2022

NunoSempere3 Feb 2022 19:22 UTC
17 points
0 comments6 min readLW link

Fore­cast­ing Newslet­ter: Fe­bru­ary 2022

NunoSempere5 Mar 2022 19:30 UTC
36 points
0 comments9 min readLW link

[AN #172] Sorry for the long hi­a­tus!

Rohin Shah5 Jul 2022 6:20 UTC
54 points
0 comments3 min readLW link
(mailchi.mp)

[AN #173] Re­cent lan­guage model re­sults from DeepMind

Rohin Shah21 Jul 2022 2:30 UTC
37 points
9 comments8 min readLW link
(mailchi.mp)

EA & LW Fo­rums Weekly Sum­mary (21 Aug − 27 Aug 22′)

Zoe Williams30 Aug 2022 1:42 UTC
57 points
4 comments12 min readLW link

EA & LW Fo­rums Weekly Sum­mary (28 Aug − 3 Sep 22’)

Zoe Williams6 Sep 2022 11:06 UTC
51 points
2 comments14 min readLW link

Quintin’s al­ign­ment pa­pers roundup—week 1

Quintin Pope10 Sep 2022 6:39 UTC
120 points
6 comments9 min readLW link

EA & LW Fo­rums Weekly Sum­mary (5 − 11 Sep 22′)

Zoe Williams12 Sep 2022 23:24 UTC
24 points
0 comments13 min readLW link

Quintin’s al­ign­ment pa­pers roundup—week 2

Quintin Pope19 Sep 2022 13:41 UTC
67 points
2 comments10 min readLW link

QAPR 3: in­ter­pretabil­ity-guided train­ing of neu­ral nets

Quintin Pope28 Sep 2022 16:02 UTC
58 points
2 comments10 min readLW link

QAPR 4: In­duc­tive biases

Quintin Pope10 Oct 2022 22:08 UTC
67 points
2 comments18 min readLW link

EA & LW Fo­rums Weekly Sum­mary (26 Sep − 9 Oct 22′)

Zoe Williams10 Oct 2022 23:58 UTC
13 points
2 comments1 min readLW link

[MLSN #6]: Trans­parency sur­vey, prov­able ro­bust­ness, ML mod­els that pre­dict the future

Dan H12 Oct 2022 20:56 UTC
27 points
0 comments6 min readLW link

EA & LW Fo­rums Weekly Sum­mary (10 − 16 Oct 22′)

Zoe Williams17 Oct 2022 22:51 UTC
12 points
4 comments1 min readLW link

Newslet­ter for Align­ment Re­search: The ML Safety Updates

Esben Kran22 Oct 2022 16:17 UTC
25 points
0 comments1 min readLW link

EA & LW Fo­rums Weekly Sum­mary (17 − 23 Oct 22′)

Zoe Williams25 Oct 2022 2:57 UTC
10 points
0 comments1 min readLW link

EA & LW Fo­rums Weekly Sum­mary (24 − 30th Oct 22′)

Zoe Williams1 Nov 2022 2:58 UTC
13 points
1 comment1 min readLW link

EA & LW Fo­rums Weekly Sum­mary (31st Oct − 6th Nov 22′)

Zoe Williams8 Nov 2022 3:58 UTC
12 points
1 comment1 min readLW link

EA & LW Fo­rums Weekly Sum­mary (7th Nov − 13th Nov 22′)

Zoe Williams16 Nov 2022 3:04 UTC
19 points
0 comments1 min readLW link

[Question] What AI newslet­ters or sub­stacks about AI do you recom­mend?

wunan25 Nov 2022 19:29 UTC
6 points
1 comment1 min readLW link

EA & LW Fo­rums Weekly Sum­mary (14th Nov − 27th Nov 22′)

Zoe Williams29 Nov 2022 23:00 UTC
21 points
1 comment1 min readLW link

NeurIPS Safety & ChatGPT. MLAISU W48

2 Dec 2022 15:50 UTC
3 points
0 comments4 min readLW link
(newsletter.apartresearch.com)

ML Safety at NeurIPS & Paradig­matic AI Safety? MLAISU W49

9 Dec 2022 10:38 UTC
19 points
0 comments4 min readLW link
(newsletter.apartresearch.com)

EA & LW Fo­rums Weekly Sum­mary (5th Dec − 11th Dec 22′)

Zoe Williams13 Dec 2022 2:53 UTC
7 points
0 comments1 min readLW link

Will Machines Ever Rule the World? MLAISU W50

Esben Kran16 Dec 2022 11:03 UTC
12 points
7 comments4 min readLW link
(newsletter.apartresearch.com)

EA & LW Fo­rums Weekly Sum­mary (12th Dec − 18th Dec 22′)

Zoe Williams20 Dec 2022 9:49 UTC
10 points
0 comments1 min readLW link

AI im­prov­ing AI [MLAISU W01!]

Esben Kran6 Jan 2023 11:13 UTC
5 points
0 comments4 min readLW link
(newsletter.apartresearch.com)

[MLSN #7]: an ex­am­ple of an emer­gent in­ter­nal optimizer

9 Jan 2023 19:39 UTC
28 points
0 comments6 min readLW link

Ro­bust­ness & Evolu­tion [MLAISU W02]

Esben Kran13 Jan 2023 15:47 UTC
10 points
0 comments3 min readLW link
(newsletter.apartresearch.com)

EA & LW Fo­rum Sum­maries (9th Jan to 15th Jan 23′)

Zoe Williams18 Jan 2023 7:29 UTC
17 points
0 comments1 min readLW link

Gen­er­al­iz­abil­ity & Hope for AI [MLAISU W03]

Esben Kran20 Jan 2023 10:06 UTC
5 points
2 comments2 min readLW link
(newsletter.apartresearch.com)

EA & LW Fo­rum Weekly Sum­mary (16th − 22nd Jan ’23)

Zoe Williams23 Jan 2023 3:46 UTC
13 points
0 comments1 min readLW link

EA & LW Fo­rum Weekly Sum­mary (23rd − 29th Jan ’23)

Zoe Williams31 Jan 2023 0:36 UTC
12 points
0 comments1 min readLW link

EA & LW Fo­rum Weekly Sum­mary (30th Jan − 5th Feb 2023)

Zoe Williams7 Feb 2023 2:13 UTC
3 points
3 comments1 min readLW link

[MLSN #8] Mechanis­tic in­ter­pretabil­ity, us­ing law to in­form AI al­ign­ment, scal­ing laws for proxy gaming

20 Feb 2023 15:54 UTC
20 points
0 comments4 min readLW link
(newsletter.mlsafety.org)

EA & LW Fo­rum Weekly Sum­mary (27th Feb − 5th Mar 2023)

Zoe Williams6 Mar 2023 3:18 UTC
12 points
0 comments1 min readLW link

EA & LW Fo­rum Weekly Sum­mary (6th − 12th March 2023)

Zoe Williams14 Mar 2023 3:01 UTC
7 points
0 comments1 min readLW link

AI Safety − 7 months of dis­cus­sion in 17 minutes

Zoe Williams15 Mar 2023 23:41 UTC
25 points
0 comments1 min readLW link

EA & LW Fo­rum Weekly Sum­mary (13th − 19th March 2023)

Zoe Williams20 Mar 2023 4:18 UTC
13 points
0 comments1 min readLW link

EA & LW Fo­rum Weekly Sum­mary (20th − 26th March 2023)

Zoe Williams27 Mar 2023 20:46 UTC
4 points
0 comments1 min readLW link

AI #6: Agents of Change

Zvi6 Apr 2023 14:00 UTC
79 points
13 comments47 min readLW link
(thezvi.wordpress.com)

AI Safety Newslet­ter #1 [CAIS Linkpost]

10 Apr 2023 20:18 UTC
45 points
0 comments4 min readLW link
(newsletter.safe.ai)

[MLSN #9] Ver­ify­ing large train­ing runs, se­cu­rity risks from LLM ac­cess to APIs, why nat­u­ral se­lec­tion may fa­vor AIs over humans

11 Apr 2023 16:03 UTC
11 points
0 comments6 min readLW link
(newsletter.mlsafety.org)

AI #7: Free Agency

Zvi13 Apr 2023 16:20 UTC
33 points
12 comments47 min readLW link
(thezvi.wordpress.com)

Nav­i­gat­ing AI Risks (NAIR) #1: Slow­ing Down AI

simeon_c14 Apr 2023 14:35 UTC
11 points
3 comments1 min readLW link
(navigatingairisks.substack.com)

AI Im­pacts Quar­terly Newslet­ter, Jan-Mar 2023

Harlan17 Apr 2023 22:10 UTC
5 points
0 comments3 min readLW link
(blog.aiimpacts.org)

AI Safety Newslet­ter #2: ChaosGPT, Nat­u­ral Selec­tion, and AI Safety in the Media

18 Apr 2023 18:44 UTC
30 points
0 comments4 min readLW link
(newsletter.safe.ai)

Sum­maries of top fo­rum posts (17th − 23rd April 2023)

Zoe Williams24 Apr 2023 4:13 UTC
18 points
0 comments1 min readLW link

AI Safety Newslet­ter #3: AI policy pro­pos­als and a new challenger approaches

ozhang25 Apr 2023 16:15 UTC
33 points
0 comments1 min readLW link

AI #9: The Merge and the Million Tokens

Zvi27 Apr 2023 14:20 UTC
36 points
8 comments53 min readLW link
(thezvi.wordpress.com)

Sum­maries of top fo­rum posts (24th − 30th April 2023)

Zoe Williams2 May 2023 2:30 UTC
12 points
1 comment1 min readLW link

AI Safety Newslet­ter #4: AI and Cy­ber­se­cu­rity, Per­sua­sive AIs, Weaponiza­tion, and Ge­offrey Hin­ton talks AI risks

2 May 2023 18:41 UTC
32 points
0 comments5 min readLW link
(newsletter.safe.ai)

AI #10: Code In­ter­preter and Ge­off Hinton

Zvi4 May 2023 14:00 UTC
80 points
7 comments78 min readLW link
(thezvi.wordpress.com)

Reg­u­late or Com­pete? The China Fac­tor in U.S. AI Policy (NAIR #2)

charles_m5 May 2023 17:43 UTC
2 points
1 comment7 min readLW link
(navigatingairisks.substack.com)

Sum­maries of top fo­rum posts (1st to 7th May 2023)

Zoe Williams9 May 2023 9:30 UTC
21 points
0 comments1 min readLW link

AI Safety Newslet­ter #5: Ge­offrey Hin­ton speaks out on AI risk, the White House meets with AI labs, and Tro­jan at­tacks on lan­guage models

9 May 2023 15:26 UTC
28 points
1 comment4 min readLW link
(newsletter.safe.ai)

AI #11: In Search of a Moat

Zvi11 May 2023 15:40 UTC
67 points
28 comments81 min readLW link
(thezvi.wordpress.com)

AI Safety Newslet­ter #6: Ex­am­ples of AI safety progress, Yoshua Ben­gio pro­poses a ban on AI agents, and les­sons from nu­clear arms control

16 May 2023 15:14 UTC
31 points
0 comments6 min readLW link
(newsletter.safe.ai)

Progress links and tweets, 2023-05-16

jasoncrawford16 May 2023 20:54 UTC
14 points
0 comments1 min readLW link
(rootsofprogress.org)

Hi­a­tus: EA and LW post summaries

Zoe Williams17 May 2023 17:17 UTC
14 points
0 comments1 min readLW link

AI Safety Newslet­ter #7: Dis­in­for­ma­tion, Gover­nance Recom­men­da­tions for AI labs, and Se­nate Hear­ings on AI

23 May 2023 21:47 UTC
25 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #13: Po­ten­tial Al­gorith­mic Improvements

Zvi25 May 2023 15:40 UTC
45 points
4 comments67 min readLW link
(thezvi.wordpress.com)

AI Safety Newslet­ter #8: Rogue AIs, how to screen for AI risks, and grants for re­search on demo­cratic gov­er­nance of AI

30 May 2023 11:52 UTC
20 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #14: A Very Good Sentence

Zvi1 Jun 2023 21:30 UTC
118 points
30 comments65 min readLW link
(thezvi.wordpress.com)

AISN #9: State­ment on Ex­tinc­tion Risks, Com­pet­i­tive Pres­sures, and When Will AI Reach Hu­man-Level?

6 Jun 2023 16:10 UTC
12 points
0 comments7 min readLW link
(newsletter.safe.ai)

AI #15: The Prin­ci­ple of Charity

Zvi8 Jun 2023 12:10 UTC
73 points
16 comments44 min readLW link
(thezvi.wordpress.com)

AI #16: AI in the UK

Zvi15 Jun 2023 13:20 UTC
46 points
20 comments54 min readLW link
(thezvi.wordpress.com)

AI #17: The Litany

Zvi22 Jun 2023 14:30 UTC
95 points
31 comments56 min readLW link
(thezvi.wordpress.com)

AISN #12: Policy Pro­pos­als from NTIA’s Re­quest for Com­ment and Re­con­sid­er­ing In­stru­men­tal Convergence

Dan H27 Jun 2023 17:20 UTC
6 points
0 comments1 min readLW link

AI #18: The Great De­bate Debate

Zvi29 Jun 2023 16:20 UTC
47 points
9 comments52 min readLW link
(thezvi.wordpress.com)

Monthly Roundup #8: July 2023

Zvi3 Jul 2023 13:20 UTC
40 points
4 comments46 min readLW link
(thezvi.wordpress.com)

AISN #13: An in­ter­dis­ci­plinary per­spec­tive on AI proxy failures, new com­peti­tors to ChatGPT, and prompt­ing lan­guage mod­els to misbehave

Dan H5 Jul 2023 15:33 UTC
13 points
0 comments1 min readLW link

AI #19: Hofs­tadter, Sutskever, Leike

Zvi6 Jul 2023 12:50 UTC
60 points
16 comments40 min readLW link
(thezvi.wordpress.com)

Progress Stud­ies Fel­low­ship look­ing for members

jay ram6 Jul 2023 17:41 UTC
3 points
0 comments1 min readLW link

AISN#14: OpenAI’s ‘Su­per­al­ign­ment’ team, Musk’s xAI launches, and de­vel­op­ments in mil­i­tary AI use

Dan H12 Jul 2023 16:58 UTC
16 points
0 comments1 min readLW link

AI #20: Code In­ter­preter and Claude 2.0 for Everyone

Zvi13 Jul 2023 14:00 UTC
60 points
9 comments56 min readLW link
(thezvi.wordpress.com)

AI Im­pacts Quar­terly Newslet­ter, Apr-Jun 2023

18 Jul 2023 17:14 UTC
6 points
0 comments3 min readLW link
(blog.aiimpacts.org)

AISN#15: China and the US take ac­tion to reg­u­late AI, re­sults from a tour­na­ment fore­cast­ing AI risk, up­dates on xAI’s plan, and Meta re­leases its open-source and com­mer­cially available Llama 2

19 Jul 2023 13:01 UTC
16 points
0 comments6 min readLW link
(newsletter.safe.ai)

Progress links and tweets, 2023-07-20: “A god­dess en­throned on a car”

jasoncrawford20 Jul 2023 18:28 UTC
12 points
4 comments2 min readLW link
(rootsofprogress.org)

AISN #16: White House Se­cures Vol­un­tary Com­mit­ments from Lead­ing AI Labs and Les­sons from Oppenheimer

25 Jul 2023 16:58 UTC
6 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #22: Into the Weeds

Zvi27 Jul 2023 17:40 UTC
49 points
8 comments84 min readLW link
(thezvi.wordpress.com)

AISN #16: White House Se­cures Vol­un­tary Com­mit­ments from Lead­ing AI Labs and Les­sons from Oppenheimer

1 Aug 2023 15:39 UTC
1 point
0 comments6 min readLW link
(newsletter.safe.ai)

AISN #17: Au­to­mat­i­cally Cir­cum­vent­ing LLM Guardrails, the Fron­tier Model Fo­rum, and Se­nate Hear­ing on AI Oversight

1 Aug 2023 15:40 UTC
8 points
0 comments8 min readLW link
(newsletter.safe.ai)

AI #23: Fun­da­men­tal Prob­lems with RLHF

Zvi3 Aug 2023 12:50 UTC
59 points
9 comments41 min readLW link
(thezvi.wordpress.com)

Man­i­fund: What we’re fund­ing (weeks 2-4)

Austin Chen4 Aug 2023 16:00 UTC
44 points
2 comments1 min readLW link
(manifund.substack.com)

AISN #18: Challenges of Re­in­force­ment Learn­ing from Hu­man Feed­back, Microsoft’s Se­cu­rity Breach, and Con­cep­tual Re­search on AI Safety

aogara8 Aug 2023 15:52 UTC
13 points
0 comments1 min readLW link
(newsletter.safe.ai)

Progress links di­gest, 2023-08-09: US adds new nu­clear, Katalin Kar­ikó in­ter­view, and more

jasoncrawford9 Aug 2023 19:22 UTC
18 points
0 comments3 min readLW link
(rootsofprogress.org)

AI #24: Week of the Podcast

Zvi10 Aug 2023 15:00 UTC
49 points
5 comments44 min readLW link
(thezvi.wordpress.com)

AISN #19: US-China Com­pe­ti­tion on AI Chips, Mea­sur­ing Lan­guage Agent Devel­op­ments, Eco­nomic Anal­y­sis of Lan­guage Model Pro­pa­ganda, and White House AI Cy­ber Challenge

15 Aug 2023 16:10 UTC
21 points
0 comments5 min readLW link
(newsletter.safe.ai)

AI #26: Fine Tun­ing Time

Zvi24 Aug 2023 15:30 UTC
49 points
6 comments33 min readLW link
(thezvi.wordpress.com)

AISN #20: LLM Pro­lifer­a­tion, AI De­cep­tion, and Con­tin­u­ing Drivers of AI Capabilities

29 Aug 2023 15:07 UTC
12 points
0 comments8 min readLW link
(newsletter.safe.ai)

AI #27: Por­tents of Gemini

Zvi31 Aug 2023 12:40 UTC
54 points
37 comments47 min readLW link
(thezvi.wordpress.com)

AISN #21: Google Deep­Mind’s GPT-4 Com­peti­tor, Mili­tary In­vest­ments in Au­tonomous Drones, The UK AI Safety Sum­mit, and Case Stud­ies in AI Policy

5 Sep 2023 15:03 UTC
15 points
0 comments5 min readLW link
(newsletter.safe.ai)

MLSN: #10 Ad­ver­sar­ial At­tacks Against Lan­guage and Vi­sion Models, Im­prov­ing LLM Hon­esty, and Trac­ing the In­fluence of LLM Train­ing Data

13 Sep 2023 18:03 UTC
15 points
1 comment5 min readLW link
(newsletter.mlsafety.org)

AISN #22: The Land­scape of US AI Leg­is­la­tion - Hear­ings, Frame­works, Bills, and Laws

19 Sep 2023 14:44 UTC
20 points
0 comments5 min readLW link
(newsletter.safe.ai)

AISN #23: New OpenAI Models, News from An­thropic, and Rep­re­sen­ta­tion Engineering

4 Oct 2023 17:37 UTC
15 points
2 comments5 min readLW link
(newsletter.safe.ai)

AISN #24: Kiss­inger Urges US-China Co­op­er­a­tion on AI, China’s New AI Law, US Ex­port Con­trols, In­ter­na­tional In­sti­tu­tions, and Open Source AI

18 Oct 2023 17:06 UTC
14 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI Align­ment [In­cre­men­tal Progress Units] this Week (10/​22/​23)

Logan Zoellner23 Oct 2023 20:32 UTC
22 points
0 comments6 min readLW link
(midwitalignment.substack.com)

AISN #25: White House Ex­ec­u­tive Order on AI, UK AI Safety Sum­mit, and Progress on Vol­un­tary Eval­u­a­tions of AI Risks

31 Oct 2023 19:34 UTC
35 points
1 comment6 min readLW link
(newsletter.safe.ai)

What I’ve been read­ing, Novem­ber 2023

jasoncrawford7 Nov 2023 13:37 UTC
23 points
1 comment5 min readLW link
(rootsofprogress.org)

AISN #26: Na­tional In­sti­tu­tions for AI Safety, Re­sults From the UK Sum­mit, and New Re­leases From OpenAI and xAI

15 Nov 2023 16:07 UTC
12 points
0 comments6 min readLW link
(newsletter.safe.ai)

OpenAI: Facts from a Weekend

Zvi20 Nov 2023 15:30 UTC
264 points
158 comments9 min readLW link
(thezvi.wordpress.com)

AI #41: Bring in the Other Gemini

Zvi7 Dec 2023 15:10 UTC
46 points
16 comments52 min readLW link
(thezvi.wordpress.com)

AISN #27: Defen­sive Ac­cel­er­a­tionism, A Ret­ro­spec­tive On The OpenAI Board Saga, And A New AI Bill From Se­na­tors Thune And Klobuchar

7 Dec 2023 15:59 UTC
9 points
0 comments6 min readLW link
(newsletter.safe.ai)

AISN #28: Cen­ter for AI Safety 2023 Year in Review

23 Dec 2023 21:31 UTC
30 points
1 comment5 min readLW link
(newsletter.safe.ai)

AISN #29: Progress on the EU AI Act Plus, the NY Times sues OpenAI for Copy­right In­fringe­ment, and Con­gres­sional Ques­tions about Re­search Stan­dards in AI Safety

4 Jan 2024 16:09 UTC
6 points
0 comments6 min readLW link
(newsletter.safe.ai)

AISN #30: In­vest­ments in Com­pute and Mili­tary AI Plus, Ja­pan and Sin­ga­pore’s Na­tional AI Safety Institutes

24 Jan 2024 19:38 UTC
23 points
1 comment6 min readLW link
(newsletter.safe.ai)

AISN #31: A New AI Policy Bill in Cal­ifor­nia Plus, Prece­dents for AI Gover­nance and The EU AI Office

21 Feb 2024 21:58 UTC
9 points
0 comments6 min readLW link
(newsletter.safe.ai)

AI #53: One More Leap

Zvi29 Feb 2024 16:10 UTC
45 points
0 comments38 min readLW link
(thezvi.wordpress.com)

AISN #32: Mea­sur­ing and Re­duc­ing Hazardous Knowl­edge in LLMs Plus, Fore­cast­ing the Fu­ture with LLMs, and Reg­u­la­tory Markets

7 Mar 2024 16:39 UTC
6 points
0 comments8 min readLW link
(newsletter.safe.ai)