Newsletters

Forecasting Newsletter. June 2020.

NunoSempere1 Jul 2020 9:46 UTC

27 points

0 comments8 min readLW link

QAPR 4: Inductive biases

Quintin Pope10 Oct 2022 22:08 UTC

67 points

2 comments18 min readLW link

[MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming

Dan H and TW123

20 Feb 2023 15:54 UTC

20 points

0 comments4 min readLW link

(newsletter.mlsafety.org)

Quintin’s alignment papers roundup—week 1

Quintin Pope10 Sep 2022 6:39 UTC

122 points

6 comments9 min readLW link

[AN #115]: AI safety research problems in the AI-GA framework

Rohin Shah2 Sep 2020 17:10 UTC

19 points

16 comments6 min readLW link

(mailchi.mp)

[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment

Rohin Shah3 Jun 2020 17:20 UTC

38 points

6 comments10 min readLW link

(mailchi.mp)

Fun With Veo 3 and Media Generation

Zvi28 May 2025 18:30 UTC

29 points

0 comments5 min readLW link

(thezvi.wordpress.com)

AI Impacts Quarterly Newsletter, Jan-Mar 2023

Harlan17 Apr 2023 22:10 UTC

5 points

0 comments3 min readLW link

(blog.aiimpacts.org)

GPT-5s Are Alive: Outside Reactions, the Router and the Resurrection of GPT-4o

Zvi12 Aug 2025 12:40 UTC

35 points

9 comments29 min readLW link

(thezvi.wordpress.com)

RTFB: The RAISE Act

Zvi16 Jun 2025 12:50 UTC

97 points

8 comments8 min readLW link

(thezvi.wordpress.com)

AI #7: Free Agency

Zvi13 Apr 2023 16:20 UTC

33 points

12 comments47 min readLW link

(thezvi.wordpress.com)

AISN #33: Reassessing AI and Biorisk Plus, Consolidation in the Corporate AI Landscape, and National Investments in AI

Corin Katzke, Alexa Pan and Dan H

12 Apr 2024 16:10 UTC

13 points

0 comments9 min readLW link

(newsletter.safe.ai)

Summaries of top forum posts (17th − 23rd April 2023)

Zoe Williams24 Apr 2023 4:13 UTC

18 points

0 comments8 min readLW link

Llama Llama-3-405B?

Zvi24 Jul 2024 19:40 UTC

51 points

9 comments30 min readLW link

(thezvi.wordpress.com)

We’re in Deep Research

Zvi4 Feb 2025 17:20 UTC

45 points

3 comments20 min readLW link

(thezvi.wordpress.com)

Monthly Roundup #8: July 2023

Zvi3 Jul 2023 13:20 UTC

40 points

4 comments46 min readLW link

(thezvi.wordpress.com)

AI #17: The Litany

Zvi22 Jun 2023 14:30 UTC

95 points

34 comments56 min readLW link

(thezvi.wordpress.com)

The Paris AI Anti-Safety Summit

Zvi12 Feb 2025 14:00 UTC

129 points

21 comments21 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (17 − 23 Oct 22′)

Zoe Williams25 Oct 2022 2:57 UTC

10 points

0 comments13 min readLW link

AI #16: AI in the UK

Zvi15 Jun 2023 13:20 UTC

46 points

20 comments54 min readLW link

(thezvi.wordpress.com)

Alignment Newsletter #36

Rohin Shah12 Dec 2018 1:10 UTC

21 points

0 comments11 min readLW link

(mailchi.mp)

DeepSeek Panic at the App Store

Zvi28 Jan 2025 19:30 UTC

51 points

14 comments33 min readLW link

(thezvi.wordpress.com)

Regarding South Africa

Zvi16 May 2025 16:10 UTC

71 points

5 comments11 min readLW link

(thezvi.wordpress.com)

AI Safety Newsletter #3: AI policy proposals and a new challenger approaches

ozhang25 Apr 2023 16:15 UTC

33 points

0 comments4 min readLW link

(newsletter.safe.ai)

AI #87: Staying in Character

Zvi29 Oct 2024 7:10 UTC

57 points

3 comments33 min readLW link

(thezvi.wordpress.com)

AI #121 Part 1: New Connections

Zvi19 Jun 2025 13:00 UTC

32 points

12 comments39 min readLW link

(thezvi.wordpress.com)

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data

Dan H and Corin Katzke

16 May 2024 14:29 UTC

2 points

3 comments6 min readLW link

(newsletter.safe.ai)

AI #15: The Principle of Charity

Zvi8 Jun 2023 12:10 UTC

73 points

16 comments44 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (5th Dec − 11th Dec 22′)

Zoe Williams13 Dec 2022 2:53 UTC

7 points

0 comments18 min readLW link

AI #89: Trump Card

Zvi7 Nov 2024 16:30 UTC

42 points

12 comments42 min readLW link

(thezvi.wordpress.com)

AISN #12: Policy Proposals from NTIA’s Request for Comment and Reconsidering Instrumental Convergence

Dan H27 Jun 2023 17:20 UTC

6 points

0 comments7 min readLW link

(newsletter.safe.ai)

Claude 4 You: The Quest for Mundane Utility

Zvi26 May 2025 13:01 UTC

36 points

0 comments17 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (21 Aug − 27 Aug 22′)

Zoe Williams30 Aug 2022 1:42 UTC

57 points

4 comments12 min readLW link

AI #23: Fundamental Problems with RLHF

Zvi3 Aug 2023 12:50 UTC

59 points

9 comments41 min readLW link

(thezvi.wordpress.com)

AI #20: Code Interpreter and Claude 2.0 for Everyone

Zvi13 Jul 2023 14:00 UTC

60 points

9 comments56 min readLW link

(thezvi.wordpress.com)

AISN #28: Center for AI Safety 2023 Year in Review

Dan H23 Dec 2023 21:31 UTC

30 points

1 comment5 min readLW link

(newsletter.safe.ai)

AISN #9: Statement on Extinction Risks, Competitive Pressures, and When Will AI Reach Human-Level?

Dan H6 Jun 2023 16:10 UTC

12 points

0 comments7 min readLW link

(newsletter.safe.ai)

QAPR 3: interpretability-guided training of neural nets

Quintin Pope28 Sep 2022 16:02 UTC

58 points

2 comments10 min readLW link

AI #19: Hofstadter, Sutskever, Leike

Zvi6 Jul 2023 12:50 UTC

60 points

16 comments40 min readLW link

(thezvi.wordpress.com)

AI #26: Fine Tuning Time

Zvi24 Aug 2023 15:30 UTC

49 points

6 comments33 min readLW link

(thezvi.wordpress.com)

AI #76: Six Shorts Stories About OpenAI

Zvi8 Aug 2024 13:50 UTC

53 points

10 comments48 min readLW link

(thezvi.wordpress.com)

Meta Pivots on Content Moderation

Zvi17 Jan 2025 14:20 UTC

47 points

3 comments10 min readLW link

(thezvi.wordpress.com)

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Corin Katzke and Dan H

25 Jul 2023 16:58 UTC

6 points

0 comments6 min readLW link

(newsletter.safe.ai)

AI Safety Newsletter #6: Examples of AI safety progress, Yoshua Bengio proposes a ban on AI agents, and lessons from nuclear arms control

Dan H and Orpheus16

16 May 2023 15:14 UTC

31 points

0 comments6 min readLW link

(newsletter.safe.ai)

EA & LW Forum Weekly Summary (20th − 26th March 2023)

Zoe Williams27 Mar 2023 20:46 UTC

4 points

0 comments6 min readLW link

Sentinel minutes #10/2025: Trump tariffs, US/China tensions, Claude code reward hacking.

NunoSempere10 Mar 2025 19:00 UTC

25 points

0 comments10 min readLW link

(blog.sentinel-team.org)

[AN #166]: Is it crazy to claim we’re in the most important century?

Rohin Shah8 Oct 2021 17:30 UTC

52 points

5 comments8 min readLW link

(mailchi.mp)

AI #95: o1 Joins the API

Zvi19 Dec 2024 15:10 UTC

58 points

1 comment41 min readLW link

(thezvi.wordpress.com)

AI #53: One More Leap

Zvi29 Feb 2024 16:10 UTC

45 points

0 comments38 min readLW link

(thezvi.wordpress.com)

Childhood and Education #13: College

Zvi5 Aug 2025 15:00 UTC

39 points

5 comments22 min readLW link

(thezvi.wordpress.com)

AI #120: While o3 Turned Pro

Zvi12 Jun 2025 15:30 UTC

51 points

3 comments53 min readLW link

(thezvi.wordpress.com)

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Dan H and Corin Katzke

1 Aug 2023 15:39 UTC

3 points

0 comments6 min readLW link

(newsletter.safe.ai)

GPT-4o Sycophancy Post Mortem

Zvi5 May 2025 16:00 UTC

55 points

1 comment16 min readLW link

(thezvi.wordpress.com)

Stargate AI-1

Zvi24 Jan 2025 15:20 UTC

85 points

1 comment18 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (31st Oct − 6th Nov 22′)

Zoe Williams8 Nov 2022 3:58 UTC

12 points

1 comment18 min readLW link

AI #58: Stargate AGI

Zvi4 Apr 2024 13:10 UTC

49 points

9 comments60 min readLW link

(thezvi.wordpress.com)

Summaries of top forum posts (24th − 30th April 2023)

Zoe Williams2 May 2023 2:30 UTC

12 points

1 comment10 min readLW link

EA & LW Forum Weekly Summary (27th Feb − 5th Mar 2023)

Zoe Williams6 Mar 2023 3:18 UTC

12 points

0 comments11 min readLW link

Dwarkesh Patel on Continual Learning

Zvi9 Jun 2025 14:50 UTC

34 points

1 comment20 min readLW link

(thezvi.wordpress.com)

Medical Roundup #3

Zvi9 Jul 2024 13:10 UTC

39 points

4 comments19 min readLW link

(thezvi.wordpress.com)

AISN #55: Trump Administration Rescinds AI Diffusion Rule, Allows Chip Sales to Gulf States

Corin Katzke and Dan H

20 May 2025 16:21 UTC

6 points

1 comment4 min readLW link

(forum.effectivealtruism.org)

Global Risks Weekly Roundup #18/2025: US tariff shortages, military policing, Gaza famine.

NunoSempere6 May 2025 10:39 UTC

31 points

2 comments3 min readLW link

(blog.sentinel-team.org)

GPT-4o Is An Absurd Sycophant

Zvi28 Apr 2025 19:00 UTC

81 points

7 comments19 min readLW link

(thezvi.wordpress.com)

EA & LW Forum Weekly Summary (6th − 12th March 2023)

Zoe Williams14 Mar 2023 3:01 UTC

7 points

0 comments12 min readLW link

Give Me a Reason(ing Model)

Zvi10 Jun 2025 15:10 UTC

55 points

6 comments5 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (14th Nov − 27th Nov 22′)

Zoe Williams29 Nov 2022 23:00 UTC

21 points

1 comment20 min readLW link

[AN #118]: Risks, solutions, and prioritization in a world with many AI systems

Rohin Shah23 Sep 2020 18:20 UTC

15 points

6 comments10 min readLW link

(mailchi.mp)

[MLSN #6]: Transparency survey, provable robustness, ML models that predict the future

Dan H12 Oct 2022 20:56 UTC

27 points

0 comments6 min readLW link

AI Safety Newsletter #1 [CAIS Linkpost]

Orpheus16, Dan H and ozhang

10 Apr 2023 20:18 UTC

45 points

0 comments4 min readLW link

(newsletter.safe.ai)

[AN #129]: Explaining double descent by measuring bias and variance

Rohin Shah16 Dec 2020 18:10 UTC

14 points

1 comment7 min readLW link

(mailchi.mp)

AI #10: Code Interpreter and Geoff Hinton

Zvi4 May 2023 14:00 UTC

80 points

7 comments78 min readLW link

(thezvi.wordpress.com)

AI #14: A Very Good Sentence

Zvi1 Jun 2023 21:30 UTC

118 points

30 comments65 min readLW link

(thezvi.wordpress.com)

AI #75: Math is Easier

Zvi1 Aug 2024 13:40 UTC

46 points

25 comments72 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (7th Nov − 13th Nov 22′)

Zoe Williams16 Nov 2022 3:04 UTC

19 points

0 comments14 min readLW link

AI #24: Week of the Podcast

Zvi10 Aug 2023 15:00 UTC

49 points

5 comments44 min readLW link

(thezvi.wordpress.com)

AI #103: Show Me the Money

Zvi13 Feb 2025 15:20 UTC

30 points

9 comments58 min readLW link

(thezvi.wordpress.com)

October 2024 Progress in Guaranteed Safe AI

Quinn28 Oct 2024 23:34 UTC

7 points

0 comments1 min readLW link

(gsai.substack.com)

Progress links and tweets, 2023-07-20: “A goddess enthroned on a car”

jasoncrawford20 Jul 2023 18:28 UTC

12 points

4 comments2 min readLW link

(rootsofprogress.org)

AI #100: Meet the New Boss

Zvi23 Jan 2025 15:40 UTC

50 points

4 comments69 min readLW link

(thezvi.wordpress.com)

AI #86: Just Think of the Potential

Zvi17 Oct 2024 15:10 UTC

58 points

8 comments57 min readLW link

(thezvi.wordpress.com)

AI #13: Potential Algorithmic Improvements

Zvi25 May 2023 15:40 UTC

45 points

4 comments67 min readLW link

(thezvi.wordpress.com)

AI #121 Part 2: The OpenAI Files

Zvi20 Jun 2025 14:50 UTC

37 points

9 comments41 min readLW link

(thezvi.wordpress.com)

Navigating AI Risks (NAIR) #1: Slowing Down AI

simeon_c14 Apr 2023 14:35 UTC

11 points

3 comments1 min readLW link

(navigatingairisks.substack.com)

[AN #112]: Engineering a Safer World

Rohin Shah13 Aug 2020 17:20 UTC

26 points

2 comments12 min readLW link

(mailchi.mp)

o3 Turns Pro

Zvi17 Jun 2025 13:50 UTC

30 points

1 comment14 min readLW link

(thezvi.wordpress.com)

AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave

Dan H5 Jul 2023 15:33 UTC

13 points

0 comments9 min readLW link

(newsletter.safe.ai)

[AN #167]: Concrete ML safety problems and their relevance to x-risk

Rohin Shah20 Oct 2021 17:10 UTC

21 points

4 comments9 min readLW link

(mailchi.mp)

AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes

Dan H and Corin Katzke

24 Jan 2024 19:38 UTC

27 points

1 comment6 min readLW link

(newsletter.safe.ai)

AI Safety − 7 months of discussion in 17 minutes

Zoe Williams15 Mar 2023 23:41 UTC

25 points

0 comments17 min readLW link

AISN#14: OpenAI’s ‘Superalignment’ team, Musk’s xAI launches, and developments in military AI use

Dan H12 Jul 2023 16:58 UTC

16 points

0 comments4 min readLW link

(newsletter.safe.ai)

AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs Plus, Forecasting the Future with LLMs, and Regulatory Markets

Corin Katzke and Dan H

7 Mar 2024 16:39 UTC

8 points

0 comments8 min readLW link

(newsletter.safe.ai)

European Links (18.05.25)

Martin Sustrik18 May 2025 4:20 UTC

16 points

5 comments2 min readLW link

(250bpm.substack.com)

AI Safety Newsletter #8: Rogue AIs, how to screen for AI risks, and grants for research on democratic governance of AI

Dan H and Orpheus16

30 May 2023 11:52 UTC

20 points

0 comments6 min readLW link

(newsletter.safe.ai)

[Question] What AI newsletters or substacks about AI do you recommend?

wunan25 Nov 2022 19:29 UTC

6 points

1 comment1 min readLW link

Childhood and Education #8: Dealing with the Internet

Zvi6 Jan 2025 14:00 UTC

37 points

7 comments13 min readLW link

(thezvi.wordpress.com)

AI #115: The Evil Applications Division

Zvi8 May 2025 13:40 UTC

32 points

3 comments62 min readLW link

(thezvi.wordpress.com)

Dating Roundup #5: Opening Day

Zvi27 May 2025 13:10 UTC

27 points

8 comments27 min readLW link

(thezvi.wordpress.com)

EA & LW Forum Weekly Summary (30th Jan − 5th Feb 2023)

Zoe Williams7 Feb 2023 2:13 UTC

3 points

3 comments14 min readLW link

On Dwarkesh’s Podcast with Leopold Aschenbrenner

Zvi10 Jun 2024 12:40 UTC

102 points

7 comments59 min readLW link

(thezvi.wordpress.com)

AI Safety at the Frontier: Paper Highlights, January ’25

gasteigerjo11 Feb 2025 16:14 UTC

7 points

0 comments8 min readLW link

(aisafetyfrontier.substack.com)

AI #114: Liars, Sycophants and Cheaters

Zvi1 May 2025 14:00 UTC

40 points

6 comments63 min readLW link

(thezvi.wordpress.com)

Alignment Newsletter #47

Rohin Shah4 Mar 2019 4:30 UTC

18 points

0 comments8 min readLW link

(mailchi.mp)

AI #132 Part 1: Improved AI Detection

Zvi4 Sep 2025 15:31 UTC

33 points

4 comments32 min readLW link

(thezvi.wordpress.com)

Forecasting newsletter #2/2025: Forecasting meetup network

NunoSempere9 Feb 2025 18:07 UTC

13 points

0 comments4 min readLW link

(forecasting.substack.com)

Progress links and tweets, 2023-05-16

jasoncrawford16 May 2023 20:54 UTC

14 points

0 comments1 min readLW link

(rootsofprogress.org)

EA & LW Forums Weekly Summary (10 − 16 Oct 22′)

Zoe Williams17 Oct 2022 22:51 UTC

12 points

4 comments16 min readLW link

Forecasting Newsletter: April 2021

NunoSempere1 May 2021 16:07 UTC

9 points

0 comments10 min readLW link

EA & LW Forum Weekly Summary (23rd − 29th Jan ’23)

Zoe Williams31 Jan 2023 0:36 UTC

12 points

0 comments13 min readLW link

Monthly Roundup #31: June 2025

Zvi13 Jun 2025 16:20 UTC

37 points

3 comments50 min readLW link

(thezvi.wordpress.com)

DeepSeek v3.1 Is Not Having a Moment

Zvi22 Aug 2025 15:50 UTC

40 points

2 comments3 min readLW link

(thezvi.wordpress.com)

AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety

Dan H8 Aug 2023 15:52 UTC

13 points

0 comments5 min readLW link

(newsletter.safe.ai)

ML Safety at NeurIPS & Paradigmatic AI Safety? MLAISU W49

Esben Kran and Steinthal

9 Dec 2022 10:38 UTC

19 points

0 comments4 min readLW link

(newsletter.apartresearch.com)

Startup Roundup #2

Zvi6 Aug 2024 13:30 UTC

45 points

0 comments32 min readLW link

(thezvi.wordpress.com)

EA & LW Forum Summaries (9th Jan to 15th Jan 23′)

Zoe Williams18 Jan 2023 7:29 UTC

17 points

0 comments13 min readLW link

AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo11 Jan 2025 22:54 UTC

7 points

2 comments7 min readLW link

(aisafetyfrontier.substack.com)

On DeepSeek’s r1

Zvi22 Jan 2025 19:50 UTC

55 points

2 comments35 min readLW link

(thezvi.wordpress.com)

AI #99: Farewell to Biden

Zvi16 Jan 2025 14:20 UTC

54 points

5 comments58 min readLW link

(thezvi.wordpress.com)

AI #116: If Anyone Builds It, Everyone Dies

Zvi15 May 2025 15:10 UTC

47 points

5 comments42 min readLW link

(thezvi.wordpress.com)

AI #22: Into the Weeds

Zvi27 Jul 2023 17:40 UTC

49 points

8 comments84 min readLW link

(thezvi.wordpress.com)

AISN #29: Progress on the EU AI Act Plus, the NY Times sues OpenAI for Copyright Infringement, and Congressional Questions about Research Standards in AI Safety

Dan H and Corin Katzke

4 Jan 2024 16:09 UTC

8 points

0 comments6 min readLW link

(newsletter.safe.ai)

DeepSeek: Lemon, It’s Wednesday

Zvi29 Jan 2025 15:00 UTC

33 points

0 comments33 min readLW link

(thezvi.wordpress.com)

AI #11: In Search of a Moat

Zvi11 May 2023 15:40 UTC

67 points

28 comments81 min readLW link

(thezvi.wordpress.com)

EA & LW Forum Weekly Summary (16th − 22nd Jan ’23)

Zoe Williams23 Jan 2023 3:46 UTC

13 points

0 comments9 min readLW link

US credit rating downgraded, $1T in Gulf state investments in the US, Kurdistan Workers’ Party disbanded | Sentinel Global Risks Weekly Roundup #20/2025

NunoSempere19 May 2025 17:59 UTC

22 points

0 comments10 min readLW link

(blog.sentinel-team.org)

AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI

Dan H and Corin Katzke

18 Oct 2023 17:06 UTC

14 points

0 comments6 min readLW link

(newsletter.safe.ai)

AI #41: Bring in the Other Gemini

Zvi7 Dec 2023 15:10 UTC

46 points

16 comments52 min readLW link

(thezvi.wordpress.com)

AI #74: GPT-4o Mini Me and Llama 3

Zvi25 Jul 2024 13:50 UTC

30 points

6 comments36 min readLW link

(thezvi.wordpress.com)

Monthly Roundup #30: May 2025

Zvi13 May 2025 14:10 UTC

14 points

2 comments38 min readLW link

(thezvi.wordpress.com)

AI #18: The Great Debate Debate

Zvi29 Jun 2023 16:20 UTC

47 points

9 comments52 min readLW link

(thezvi.wordpress.com)

AI Safety Newsletter #4: AI and Cybersecurity, Persuasive AIs, Weaponization, and Geoffrey Hinton talks AI risks

ozhang, Dan H and Orpheus16

2 May 2023 18:41 UTC

32 points

0 comments5 min readLW link

(newsletter.safe.ai)

Claude 4 You: Safety and Alignment

Zvi25 May 2025 14:00 UTC

86 points

8 comments63 min readLW link

(thezvi.wordpress.com)

Occupational Licensing Roundup #1

Zvi30 Oct 2024 11:00 UTC

65 points

11 comments11 min readLW link

(thezvi.wordpress.com)

Monthly Roundup #25: December 2024

Zvi23 Dec 2024 14:20 UTC

18 points

3 comments26 min readLW link

(thezvi.wordpress.com)

AISN #31: A New AI Policy Bill in California Plus, Precedents for AI Governance and The EU AI Office

Dan H21 Feb 2024 21:58 UTC

17 points

0 comments6 min readLW link

(newsletter.safe.ai)

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

Dan H31 Oct 2023 19:34 UTC

35 points

1 comment6 min readLW link

(newsletter.safe.ai)

EA & LW Forum Weekly Summary (13th − 19th March 2023)

Zoe Williams20 Mar 2023 4:18 UTC

13 points

0 comments14 min readLW link

AI #9: The Merge and the Million Tokens

Zvi27 Apr 2023 14:20 UTC

36 points

8 comments53 min readLW link

(thezvi.wordpress.com)

AI #88: Thanks for the Memos

Zvi31 Oct 2024 15:00 UTC

46 points

5 comments77 min readLW link

(thezvi.wordpress.com)

Housing Roundup #10

Zvi29 Oct 2024 13:50 UTC

32 points

2 comments32 min readLW link

(thezvi.wordpress.com)

AI Alignment [Incremental Progress Units] this Week (10/22/23)

Logan Zoellner23 Oct 2023 20:32 UTC

22 points

0 comments6 min readLW link

(midwitalignment.substack.com)

Summaries of top forum posts (1st to 7th May 2023)

Zoe Williams9 May 2023 9:30 UTC

21 points

0 comments11 min readLW link

[AN #145]: Our three year anniversary!

Rohin Shah9 Apr 2021 17:48 UTC

19 points

0 comments8 min readLW link

(mailchi.mp)

In Which I Make the Mistake of Fully Covering an Episode of the All-In Podcast

Zvi3 Jun 2025 15:50 UTC

42 points

2 comments28 min readLW link

(thezvi.wordpress.com)

AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight

Dan H1 Aug 2023 15:40 UTC

8 points

0 comments8 min readLW link

(newsletter.safe.ai)

AI #105: Hey There Alexa

Zvi27 Feb 2025 14:30 UTC

31 points

3 comments40 min readLW link

(thezvi.wordpress.com)

AI Safety at the Frontier: Paper Highlights, April ’25

gasteigerjo6 May 2025 14:22 UTC

4 points

0 comments7 min readLW link

(aisafetyfrontier.substack.com)

AI #119: Goodbye AISI?

Zvi5 Jun 2025 14:00 UTC

42 points

8 comments60 min readLW link

(thezvi.wordpress.com)

On Dwarkesh Patel’s 4th Podcast With Tyler Cowen

Zvi10 Jan 2025 13:50 UTC

44 points

7 comments27 min readLW link

(thezvi.wordpress.com)

Forecasting Newsletter: January 2022

NunoSempere3 Feb 2022 19:22 UTC

17 points

0 comments6 min readLW link

OpenAI #10: Reflections

Zvi7 Jan 2025 17:00 UTC

149 points

7 comments11 min readLW link

(thezvi.wordpress.com)

AI #117: OpenAI Buys Device Maker IO

Zvi22 May 2025 13:40 UTC

37 points

9 comments62 min readLW link

(thezvi.wordpress.com)

AI #110: Of Course You Know…

Zvi3 Apr 2025 13:10 UTC

51 points

9 comments44 min readLW link

(thezvi.wordpress.com)

[MLSN #9] Verifying large training runs, security risks from LLM access to APIs, why natural selection may favor AIs over humans

Dan H and TW123

11 Apr 2023 16:03 UTC

11 points

0 comments6 min readLW link

(newsletter.mlsafety.org)

Progress links digest, 2023-08-09: US adds new nuclear, Katalin Karikó interview, and more

jasoncrawford9 Aug 2023 19:22 UTC

18 points

0 comments3 min readLW link

(rootsofprogress.org)

Quintin’s alignment papers roundup—week 2

Quintin Pope19 Sep 2022 13:41 UTC

67 points

2 comments10 min readLW link

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar

Dan H, Corin Katzke and allison huang

7 Dec 2023 15:59 UTC

13 points

0 comments6 min readLW link

(newsletter.safe.ai)

AI Safety Newsletter #5: Geoffrey Hinton speaks out on AI risk, the White House meets with AI labs, and Trojan attacks on language models

Dan H and Orpheus16

9 May 2023 15:26 UTC

28 points

1 comment4 min readLW link

(newsletter.safe.ai)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo31 Oct 2024 0:09 UTC

3 points

0 comments9 min readLW link

(aisafetyfrontier.substack.com)

AISN #34: New Military AI Systems Plus, AI Labs Fail to Uphold Voluntary Commitments to UK AI Safety Institute, and New AI Policy Proposals in the US Senate

Corin Katzke and Dan H

2 May 2024 16:12 UTC

6 points

0 comments8 min readLW link

(newsletter.safe.ai)

European Links (30.04.25)

Martin Sustrik30 Apr 2025 15:40 UTC

15 points

1 comment8 min readLW link

(250bpm.substack.com)

AI #6: Agents of Change

Zvi6 Apr 2023 14:00 UTC

79 points

13 comments47 min readLW link

(thezvi.wordpress.com)

DeepSeek-r1-0528 Did Not Have a Moment

Zvi6 Jun 2025 15:40 UTC

30 points

2 comments15 min readLW link

(thezvi.wordpress.com)

AI #98: World Ends With Six Word Story

Zvi9 Jan 2025 16:30 UTC

36 points

2 comments38 min readLW link

(thezvi.wordpress.com)

[AN #170]: Analyzing the argument for risk from power-seeking AI

Rohin Shah8 Dec 2021 18:10 UTC

21 points

1 comment7 min readLW link

(mailchi.mp)

Hiatus: EA and LW post summaries

Zoe Williams17 May 2023 17:17 UTC

14 points

0 comments1 min readLW link

Global Risks Weekly Roundup #19/2025: India/Pakistan ceasefire, US/China tariffs deal & OpenAI nonprofit control

NunoSempere12 May 2025 17:08 UTC

10 points

1 comment13 min readLW link

(blog.sentinel-team.org)

Monthly Roundup #34: September 2025

Zvi15 Sep 2025 12:30 UTC

42 points

4 comments53 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (24 − 30th Oct 22′)

Zoe Williams1 Nov 2022 2:58 UTC

13 points

1 comment14 min readLW link

o3-mini Early Days

Zvi3 Feb 2025 14:20 UTC

45 points

0 comments15 min readLW link

(thezvi.wordpress.com)

Forecasting Newsletter: February 2022

NunoSempere5 Mar 2022 19:30 UTC

36 points

0 comments9 min readLW link

AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI

Corin Katzke, allison huang and Dan H

15 Nov 2023 16:07 UTC

13 points

0 comments6 min readLW link

(newsletter.safe.ai)

Operator

Zvi28 Jan 2025 20:00 UTC

35 points

1 comment11 min readLW link

(thezvi.wordpress.com)

AI #132 Part 2: Actively Making It Worse

Zvi5 Sep 2025 11:50 UTC

28 points

9 comments28 min readLW link

(thezvi.wordpress.com)

AI Safety at the Frontier: Paper Highlights, May ’25

gasteigerjo17 Jun 2025 17:16 UTC

6 points

0 comments8 min readLW link

(aisafetyfrontier.substack.com)

AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering

Dan H4 Oct 2023 17:37 UTC

15 points

2 comments5 min readLW link

(newsletter.safe.ai)

Gemini 2.5 Pro: From 0506 to 0605

Zvi18 Jun 2025 19:10 UTC

33 points

0 comments8 min readLW link

(thezvi.wordpress.com)

AI #83: The Mask Comes Off

Zvi26 Sep 2024 12:00 UTC

82 points

20 comments36 min readLW link

(thezvi.wordpress.com)

AI #118: Claude Ascendant

Zvi29 May 2025 14:10 UTC

45 points

8 comments57 min readLW link

(thezvi.wordpress.com)

AI #59: Model Updates

Zvi11 Apr 2024 14:20 UTC

30 points

2 comments63 min readLW link

(thezvi.wordpress.com)

[AN #173] Recent language model results from DeepMind

Rohin Shah21 Jul 2022 2:30 UTC

37 points

9 comments8 min readLW link

(mailchi.mp)

AISN#15: China and the US take action to regulate AI, results from a tournament forecasting AI risk, updates on xAI’s plan, and Meta releases its open-source and commercially available Llama 2

Corin Katzke and Dan H

19 Jul 2023 13:01 UTC

16 points

0 comments6 min readLW link

(newsletter.safe.ai)

AI Safety Newsletter #7: Disinformation, Governance Recommendations for AI labs, and Senate Hearings on AI

Dan H and Orpheus16

23 May 2023 21:47 UTC

25 points

0 comments6 min readLW link

(newsletter.safe.ai)

AI #102: Made in America

Zvi6 Feb 2025 14:20 UTC

26 points

18 comments67 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (5 − 11 Sep 22′)

Zoe Williams12 Sep 2022 23:24 UTC

24 points

0 comments13 min readLW link

AI #101: The Shallow End

Zvi30 Jan 2025 14:50 UTC

39 points

1 comment59 min readLW link

(thezvi.wordpress.com)

AI #72: Denying the Future

Zvi11 Jul 2024 15:00 UTC

45 points

8 comments41 min readLW link

(thezvi.wordpress.com)

AISN #50: AI Action Plan Responses

Corin Katzke and Dan H

31 Mar 2025 20:13 UTC

6 points

0 comments6 min readLW link

(newsletter.safe.ai)

Dec 2019 gwern.net newsletter

gwern4 Jan 2020 20:48 UTC

17 points

2 comments1 min readLW link

(www.gwern.net)

AI improving AI [MLAISU W01!]

Esben Kran6 Jan 2023 11:13 UTC

5 points

0 comments4 min readLW link

(newsletter.apartresearch.com)

AISN #54: OpenAI Updates Restructure Plan

Corin Katzke and Dan H

13 May 2025 16:59 UTC

8 points

1 comment4 min readLW link

(newsletter.safe.ai)

Rationality Feed: Last Month’s Best Posts

sapphire21 Mar 2018 14:12 UTC

20 points

2 comments2 min readLW link

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

Corin Katzke, Julius, andrewz and Dan H

11 Sep 2024 19:14 UTC

5 points

1 comment5 min readLW link

(newsletter.safe.ai)

Recent updates to gwern.net (2014-2015)

gwern2 Nov 2015 0:06 UTC

34 points

3 comments3 min readLW link

[AN #113]: Checking the ethical intuitions of large language models

Rohin Shah19 Aug 2020 17:10 UTC

23 points

0 comments9 min readLW link

(mailchi.mp)

Alignment Newsletter #52

Rohin Shah6 Apr 2019 1:20 UTC

19 points

1 comment8 min readLW link

(mailchi.mp)

Bi-weekly Rational Feed

sapphire8 Aug 2017 13:56 UTC

29 points

4 comments13 min readLW link

Alignment Newsletter #51

Rohin Shah3 Apr 2019 4:10 UTC

25 points

2 comments15 min readLW link

(mailchi.mp)

AISN #57: The RAISE Act

Corin Katzke and Dan H

17 Jun 2025 18:02 UTC

6 points

0 comments3 min readLW link

(newsletter.safe.ai)

[AN #61] AI policy and governance, from two people in the field

Rohin Shah5 Aug 2019 17:00 UTC

12 points

2 comments9 min readLW link

(mailchi.mp)

The Alignment Newsletter #12: 06/25/18

Rohin Shah25 Jun 2018 16:00 UTC

15 points

0 comments3 min readLW link

AISN #63: California’s SB-53 Passes the Legislature

Corin Katzke and Dan H

24 Sep 2025 17:02 UTC

6 points

0 comments4 min readLW link

(newsletter.safe.ai)

[AN #94]: AI alignment as translation between humans and machines

Rohin Shah8 Apr 2020 17:10 UTC

11 points

0 comments7 min readLW link

(mailchi.mp)

Alignment Newsletter #29

Rohin Shah22 Oct 2018 16:20 UTC

15 points

0 comments9 min readLW link

(mailchi.mp)

Alignment Newsletter #35

Rohin Shah4 Dec 2018 1:10 UTC

15 points

0 comments6 min readLW link

(mailchi.mp)

Alignment Newsletter #49

Rohin Shah20 Mar 2019 4:20 UTC

23 points

1 comment11 min readLW link

(mailchi.mp)

March 2019 gwern.net newsletter

gwern2 Apr 2019 14:17 UTC

19 points

9 comments1 min readLW link

(www.gwern.net)

Announcing Rational Newsletter

Alexey Lapitsky1 Apr 2018 14:37 UTC

10 points

9 comments1 min readLW link

July 2019 gwern.net newsletter

gwern1 Aug 2019 16:19 UTC

23 points

0 comments1 min readLW link

(www.gwern.net)

[AN #78] Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison

Rohin Shah26 Dec 2019 1:10 UTC

26 points

10 comments9 min readLW link

(mailchi.mp)

AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry

Corin Katzke, Alexa Pan, Julius and Dan H

9 Jul 2024 19:28 UTC

5 points

0 comments5 min readLW link

(newsletter.safe.ai)

Call for contributors to the Alignment Newsletter

Rohin Shah21 Aug 2019 18:21 UTC

39 points

0 comments4 min readLW link

AI Safety Newsletter #37: US Launches Antitrust Investigations Plus, recent criticisms of OpenAI and Anthropic, and a summary of Situational Awareness

Corin Katzke, Alexa Pan, Julius and Dan H

18 Jun 2024 18:07 UTC

8 points

0 comments5 min readLW link

(newsletter.safe.ai)

Alignment Newsletter #24

Rohin Shah17 Sep 2018 16:20 UTC

10 points

6 comments12 min readLW link

(mailchi.mp)

Launching Adjacent News

Lucas Kohorst16 Oct 2024 17:58 UTC

24 points

0 comments4 min readLW link

AI #27: Portents of Gemini

Zvi31 Aug 2023 12:40 UTC

54 points

37 comments47 min readLW link

(thezvi.wordpress.com)

[AN #57] Why we should focus on robustness in AI safety, and the analogous problems in programming

Rohin Shah5 Jun 2019 23:20 UTC

26 points

15 comments7 min readLW link

(mailchi.mp)

Weekly newsletter for AI safety events and training programs

Bryce Robertson3 May 2024 0:33 UTC

29 points

0 comments1 min readLW link

[AN #63] How architecture search, meta learning, and environment design could lead to general intelligence

Rohin Shah10 Sep 2019 19:10 UTC

21 points

12 comments8 min readLW link

(mailchi.mp)

EA & LW Forums Weekly Summary (12th Dec − 18th Dec 22′)

Zoe Williams20 Dec 2022 9:49 UTC

10 points

0 comments17 min readLW link

AISN #51: AI Frontiers

Corin Katzke and Dan H

15 Apr 2025 16:01 UTC

8 points

1 comment5 min readLW link

(newsletter.safe.ai)

Alignment Newsletter #23

Rohin Shah10 Sep 2018 17:10 UTC

16 points

0 comments7 min readLW link

(mailchi.mp)

The Alignment Newsletter #4: 04/30/18

Rohin Shah30 Apr 2018 16:00 UTC

8 points

0 comments3 min readLW link

The Alignment Newsletter #5: 05/07/18

Rohin Shah7 May 2018 16:00 UTC

8 points

0 comments7 min readLW link

AISN #21: Google DeepMind’s GPT-4 Competitor, Military Investments in Autonomous Drones, The UK AI Safety Summit, and Case Studies in AI Policy

Dan H5 Sep 2023 15:03 UTC

15 points

0 comments5 min readLW link

(newsletter.safe.ai)

Forecasting Newsletter: April 2020

NunoSempere30 Apr 2020 16:41 UTC

22 points

3 comments6 min readLW link

What I’ve been reading, November 2023

jasoncrawford7 Nov 2023 13:37 UTC

23 points

1 comment5 min readLW link

(rootsofprogress.org)

[AN #105]: The economic trajectory of humanity, and what we might mean by optimization

Rohin Shah24 Jun 2020 17:30 UTC

24 points

3 comments11 min readLW link

(mailchi.mp)

[AN #98]: Understanding neural net training by seeing which gradients were helpful

Rohin Shah6 May 2020 17:10 UTC

22 points

3 comments9 min readLW link

(mailchi.mp)

The Alignment Newsletter #3: 04/23/18

Rohin Shah23 Apr 2018 16:00 UTC

9 points

0 comments6 min readLW link

[AN #66]: Decomposing robustness into capability robustness and alignment robustness

Rohin Shah30 Sep 2019 18:00 UTC

12 points

1 comment7 min readLW link

(mailchi.mp)

Alignment Newsletter #30

Rohin Shah29 Oct 2018 16:10 UTC

29 points

2 comments6 min readLW link

(mailchi.mp)

November 2020 gwern.net newsletter

gwern3 Dec 2020 22:47 UTC

14 points

5 comments1 min readLW link

(www.gwern.net)

Recent updates to gwern.net (2013-2014)

gwern8 Jul 2014 1:44 UTC

38 points

32 comments4 min readLW link

Alignment Newsletter #34

Rohin Shah26 Nov 2018 23:10 UTC

24 points

0 comments10 min readLW link

(mailchi.mp)

[AN #116]: How to make explanations of neurons compositional

Rohin Shah9 Sep 2020 17:20 UTC

21 points

2 comments9 min readLW link

(mailchi.mp)

MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data

aog and Dan H

13 Sep 2023 18:03 UTC

15 points

1 comment5 min readLW link

(newsletter.mlsafety.org)

Robustness & Evolution [MLAISU W02]

Esben Kran13 Jan 2023 15:47 UTC

10 points

0 comments3 min readLW link

(newsletter.apartresearch.com)

AISN #49: Superintelligence Strategy

Corin Katzke and Dan H

6 Mar 2025 17:46 UTC

6 points

1 comment5 min readLW link

(newsletter.safe.ai)

Alignment Newsletter #42

Rohin Shah22 Jan 2019 2:00 UTC

20 points

1 comment10 min readLW link

(mailchi.mp)

Announcing LessWrong Digest

Evan_Gaensbauer23 Feb 2015 10:41 UTC

35 points

18 comments1 min readLW link

Alignment Newsletter #17

Rohin Shah30 Jul 2018 16:10 UTC

32 points

0 comments13 min readLW link

(mailchi.mp)

The Alignment Newsletter #1: 04/09/18

Rohin Shah9 Apr 2018 16:00 UTC

12 points

3 comments4 min readLW link

January 2019 gwern.net newsletter

gwern4 Feb 2019 15:53 UTC

15 points

0 comments1 min readLW link

(www.gwern.net)

AI #122: Paying The Market Price

Zvi26 Jun 2025 18:10 UTC

36 points

2 comments40 min readLW link

(thezvi.wordpress.com)

EA & LW Forums Weekly Summary (26 Sep − 9 Oct 22′)

Zoe Williams10 Oct 2022 23:58 UTC

13 points

2 comments14 min readLW link

Forecasting Newsletter: May 2020.

NunoSempere31 May 2020 12:35 UTC

9 points

1 comment20 min readLW link

July 2020 gwern.net newsletter

gwern20 Aug 2020 16:39 UTC

29 points

0 comments1 min readLW link

(www.gwern.net)

[AN #84] Reviewing AI alignment work in 2018-19

Rohin Shah29 Jan 2020 18:30 UTC

23 points

0 comments6 min readLW link

(mailchi.mp)

[AN #58] Mesa optimization: what it is, and why we should care

Rohin Shah24 Jun 2019 16:10 UTC

55 points

10 comments8 min readLW link

(mailchi.mp)

Rational Feed: Last Month’s Best Posts

sapphire2 May 2018 18:19 UTC

16 points

0 comments2 min readLW link

AISN #59: EU Publishes General-Purpose AI Code of Practice

Corin Katzke and Dan H

15 Jul 2025 18:59 UTC

10 points

0 comments4 min readLW link

(aisafety.substack.com)

Alignment Newsletter #13: 07/02/18

Rohin Shah2 Jul 2018 16:10 UTC

70 points

12 comments8 min readLW link

(mailchi.mp)

[AN #59] How arguments for AI risk have changed over time

Rohin Shah8 Jul 2019 17:20 UTC

43 points

4 comments7 min readLW link

(mailchi.mp)

[AN #133]: Building machines that can cooperate (with humans, institutions, or other machines)

Rohin Shah13 Jan 2021 18:10 UTC

14 points

0 comments9 min readLW link

(mailchi.mp)

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran22 Oct 2022 16:17 UTC

26 points

0 comments7 min readLW link

[AN #86]: Improving debate and factored cognition through human experiments

Rohin Shah12 Feb 2020 18:10 UTC

15 points

0 comments9 min readLW link

(mailchi.mp)

[AN #71]: Avoiding reward tampering through current-RF optimization

Rohin Shah30 Oct 2019 17:10 UTC

12 points

0 comments7 min readLW link

(mailchi.mp)

May gwern.net newsletter

gwern1 Jun 2018 14:47 UTC

24 points

3 comments1 min readLW link

(www.gwern.net)

[AN #81]: Universality as a potential solution to conceptual difficulties in intent alignment

Rohin Shah8 Jan 2020 18:00 UTC

32 points

4 comments11 min readLW link

(mailchi.mp)

AISN #22: The Landscape of US AI Legislation - Hearings, Frameworks, Bills, and Laws

Dan H19 Sep 2023 14:44 UTC

20 points

0 comments5 min readLW link

(newsletter.safe.ai)

AISN #45: Center for AI Safety 2024 Year in Review

Corin Katzke and Dan H

19 Dec 2024 18:15 UTC

13 points

0 comments4 min readLW link

(newsletter.safe.ai)

AISN #60: The AI Action Plan

Corin Katzke and Dan H

31 Jul 2025 18:20 UTC

6 points

0 comments4 min readLW link

(newsletter.safe.ai)

Alignment Newsletter #20

Rohin Shah20 Aug 2018 16:00 UTC

12 points

2 comments6 min readLW link

(mailchi.mp)

[AN #97]: Are there historical examples of large, robust discontinuities?

Rohin Shah29 Apr 2020 17:30 UTC

15 points

0 comments10 min readLW link

(mailchi.mp)

Bi-Weekly Rational Feed

sapphire24 Jun 2017 0:07 UTC

35 points

3 comments12 min readLW link

[AN #125]: Neural network scaling laws across multiple modalities

Rohin Shah11 Nov 2020 18:20 UTC

25 points

7 comments9 min readLW link

(mailchi.mp)

[AN #67]: Creating environments in which to study inner alignment failures

Rohin Shah7 Oct 2019 17:10 UTC

17 points

0 comments8 min readLW link

(mailchi.mp)

[AN #109]: Teaching neural nets to generalize the way humans would

Rohin Shah22 Jul 2020 17:10 UTC

17 points

3 comments9 min readLW link

(mailchi.mp)

Alignment Newsletter #38

Rohin Shah25 Dec 2018 16:10 UTC

9 points

0 comments8 min readLW link

(mailchi.mp)

[AN #85]: The normative questions we should be asking for AI alignment, and a surprisingly good chatbot

Rohin Shah5 Feb 2020 18:20 UTC

14 points

2 comments7 min readLW link

(mailchi.mp)

AISN #53: An Open Letter Attempts to Block OpenAI Restructuring

Corin Katzke and Dan H

29 Apr 2025 16:13 UTC

7 points

0 comments4 min readLW link

June 2019 gwern.net newsletter

gwern1 Jul 2019 14:35 UTC

29 points

0 comments1 min readLW link

(www.gwern.net)

AISN #47: Reasoning Models

Corin Katzke and Dan H

6 Feb 2025 18:52 UTC

3 points

0 comments4 min readLW link

(newsletter.safe.ai)

Alignment Newsletter #32

Rohin Shah12 Nov 2018 17:20 UTC

18 points

0 comments12 min readLW link

(mailchi.mp)

AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary

Corin Katzke, Julius, Alexa Pan, andrewz and Dan H

1 Oct 2024 20:35 UTC

8 points

0 comments6 min readLW link

(newsletter.safe.ai)

[AN #96]: Buck and I discuss/argue about AI Alignment

Rohin Shah22 Apr 2020 17:20 UTC

17 points

4 comments10 min readLW link

(mailchi.mp)

Alignment Newsletter #19

Rohin Shah14 Aug 2018 2:10 UTC

18 points

0 comments13 min readLW link

(mailchi.mp)

Forecasting Newsletter: October 2020.

NunoSempere1 Nov 2020 13:09 UTC

11 points

0 comments4 min readLW link

[AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning

Rohin Shah16 Sep 2019 17:10 UTC

11 points

8 comments7 min readLW link

(mailchi.mp)

AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities

Dan H29 Aug 2023 15:07 UTC

12 points

0 comments8 min readLW link

(newsletter.safe.ai)

[AN #83]: Sample-efficient deep learning with ReMixMatch

Rohin Shah22 Jan 2020 18:10 UTC

15 points

4 comments11 min readLW link

(mailchi.mp)

Recent updates to gwern.net (2011)

gwern26 Nov 2011 1:58 UTC

45 points

18 comments1 min readLW link

[AN #99]: Doubling times for the efficiency of AI algorithms

Rohin Shah13 May 2020 17:20 UTC

29 points

0 comments10 min readLW link

(mailchi.mp)

AISN #52: An Expert Virology Benchmark

Corin Katzke and Dan H

22 Apr 2025 17:08 UTC

6 points

0 comments4 min readLW link

(newsletter.safe.ai)

[AN #69] Stuart Russell’s new book on why we need to replace the standard model of AI

Rohin Shah19 Oct 2019 0:30 UTC

60 points

12 comments15 min readLW link

(mailchi.mp)

AISN #58: Senate Removes State AI Regulation Moratorium

Corin Katzke and Dan H

3 Jul 2025 17:26 UTC

6 points

0 comments4 min readLW link

(newsletter.safe.ai)

[AN #101]: Why we should rigorously measure and forecast AI progress

Rohin Shah27 May 2020 17:20 UTC

15 points

0 comments10 min readLW link

(mailchi.mp)

[AN #172] Sorry for the long hiatus!

Rohin Shah5 Jul 2022 6:20 UTC

54 points

0 comments3 min readLW link

(mailchi.mp)

[MLSN #7]: an example of an emergent internal optimizer

joshc and Dan H

9 Jan 2023 19:39 UTC

28 points

0 comments6 min readLW link

Alignment Newsletter #41

Rohin Shah17 Jan 2019 8:10 UTC

22 points

6 comments10 min readLW link

(mailchi.mp)

Rationality Feed: Last Month’s Best Posts

sapphire12 Feb 2018 13:18 UTC

23 points

1 comment3 min readLW link

March gwern.net link roundup

gwern20 Apr 2018 19:09 UTC

10 points

1 comment1 min readLW link

(www.gwern.net)

[AN #77]: Double descent: a unification of statistical theory and modern ML practice

Rohin Shah18 Dec 2019 18:30 UTC

21 points

4 comments14 min readLW link

(mailchi.mp)

Will Machines Ever Rule the World? MLAISU W50

Esben Kran16 Dec 2022 11:03 UTC

12 points

7 comments4 min readLW link

(newsletter.apartresearch.com)

[AN #93]: The Precipice we’re standing at, and how we can back away from it

Rohin Shah1 Apr 2020 17:10 UTC

24 points

0 comments7 min readLW link

(mailchi.mp)

[AN #108]: Why we should scrutinize arguments for AI risk

Rohin Shah16 Jul 2020 6:47 UTC

19 points

6 comments12 min readLW link

(mailchi.mp)

[AN #89]: A unifying formalism for preference learning algorithms

Rohin Shah4 Mar 2020 18:20 UTC

16 points

0 comments9 min readLW link

(mailchi.mp)

[AN #82]: How OpenAI Five distributed their training computation

Rohin Shah15 Jan 2020 18:20 UTC

19 points

0 comments8 min readLW link

(mailchi.mp)

AI Safety Newsletter #2: ChaosGPT, Natural Selection, and AI Safety in the Media

ozhang, Dan H and Orpheus16

18 Apr 2023 18:44 UTC

30 points

0 comments4 min readLW link

(newsletter.safe.ai)

[AN #127]: Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment

Rohin Shah2 Dec 2020 18:20 UTC

53 points

0 comments13 min readLW link

(mailchi.mp)

[AN #107]: The convergent instrumental subgoals of goal-directed agents

Rohin Shah16 Jul 2020 6:47 UTC

13 points

1 comment8 min readLW link

(mailchi.mp)

Progress Studies Fellowship looking for members

jay ram6 Jul 2023 17:41 UTC

3 points

0 comments1 min readLW link

Alignment Newsletter #53

Rohin Shah18 Apr 2019 17:20 UTC

20 points

0 comments8 min readLW link

(mailchi.mp)

May gwern.net newsletter

gwern1 Jun 2019 17:25 UTC

17 points

0 comments1 min readLW link

(www.gwern.net)

EA & LW Forums Weekly Summary (28 Aug − 3 Sep 22’)

Zoe Williams6 Sep 2022 11:06 UTC

51 points

2 comments14 min readLW link

The Alignment Newsletter #10: 06/11/18

Rohin Shah11 Jun 2018 16:00 UTC

16 points

0 comments9 min readLW link

[AN #110]: Learning features from human feedback to enable reward learning

Rohin Shah29 Jul 2020 17:20 UTC

13 points

2 comments10 min readLW link

(mailchi.mp)

NeurIPS Safety & ChatGPT. MLAISU W48

Esben Kran and Steinthal

2 Dec 2022 15:50 UTC

3 points

0 comments4 min readLW link

(newsletter.apartresearch.com)

[AN #68]: The attainable utility theory of impact

Rohin Shah14 Oct 2019 17:00 UTC

17 points

0 comments8 min readLW link

(mailchi.mp)

September 2020 gwern.net newsletter

gwern26 Oct 2020 13:38 UTC

17 points

1 comment1 min readLW link

(www.gwern.net)

Alignment Newsletter #28

Rohin Shah15 Oct 2018 21:20 UTC

11 points

0 comments8 min readLW link

(mailchi.mp)

November 2018 gwern.net newsletter

gwern1 Dec 2018 13:57 UTC

35 points

0 comments1 min readLW link

(www.gwern.net)

Regulate or Compete? The China Factor in U.S. AI Policy (NAIR #2)

charles_m5 May 2023 17:43 UTC

2 points

1 comment7 min readLW link

(navigatingairisks.substack.com)

[AN #92]: Learning good representations with contrastive predictive coding

Rohin Shah25 Mar 2020 17:20 UTC

18 points

1 comment10 min readLW link

(mailchi.mp)

Alignment Newsletter #26

Rohin Shah2 Oct 2018 16:10 UTC

13 points

0 comments7 min readLW link

(mailchi.mp)

Alignment Newsletter #43

Rohin Shah29 Jan 2019 21:10 UTC

14 points

2 comments13 min readLW link

(mailchi.mp)

May Gwern.net newsletter (w/GPT-3 commentary)

gwern2 Jun 2020 15:40 UTC

32 points

7 comments1 min readLW link

(www.gwern.net)

Alignment Newsletter #33

Rohin Shah19 Nov 2018 17:20 UTC

23 points

0 comments9 min readLW link

(mailchi.mp)

[AN #65]: Learning useful skills by watching humans “play”

Rohin Shah23 Sep 2019 17:30 UTC

11 points

0 comments9 min readLW link

(mailchi.mp)

July gwern.net newsletter

gwern2 Aug 2018 13:42 UTC

24 points

0 comments1 min readLW link

(www.gwern.net)

Alignment Newsletter #40

Rohin Shah8 Jan 2019 20:10 UTC

21 points

2 comments5 min readLW link

(mailchi.mp)

[AN #95]: A framework for thinking about how to make AI go well

Rohin Shah15 Apr 2020 17:10 UTC

20 points

2 comments10 min readLW link

(mailchi.mp)

Alignment Newsletter #18

Rohin Shah6 Aug 2018 16:00 UTC

17 points

0 comments10 min readLW link

(mailchi.mp)

Alignment Newsletter #50

Rohin Shah28 Mar 2019 18:10 UTC

15 points

2 comments10 min readLW link

(mailchi.mp)

AISN #56: Google Releases Veo 3

Corin Katzke and Dan H

28 May 2025 16:00 UTC

7 points

0 comments4 min readLW link

(newsletter.safe.ai)

[AN #80]: Why AI risk might be solved without additional intervention from longtermists

Rohin Shah2 Jan 2020 18:20 UTC

36 points

95 comments10 min readLW link

(mailchi.mp)

Forecasting Newsletter: July 2020.

NunoSempere1 Aug 2020 17:08 UTC

21 points

4 comments22 min readLW link

[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee

Rohin Shah27 Nov 2019 18:10 UTC

38 points

1 comment10 min readLW link

(mailchi.mp)

AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks

Corin Katzke, Julius and Dan H

5 Jun 2024 17:45 UTC

9 points

0 comments5 min readLW link

(newsletter.safe.ai)

September 2019 gwern.net newsletter

gwern4 Oct 2019 16:44 UTC

21 points

0 comments1 min readLW link

(www.gwern.net)

Alignment Newsletter #16: 07/23/18

Rohin Shah23 Jul 2018 16:20 UTC

42 points

0 comments12 min readLW link

(mailchi.mp)

[AN #106]: Evaluating generalization ability of learned reward models

Rohin Shah1 Jul 2020 17:20 UTC

14 points

2 comments11 min readLW link

(mailchi.mp)

[AN #111]: The Circuits hypotheses for deep learning

Rohin Shah5 Aug 2020 17:40 UTC

23 points

0 comments9 min readLW link

(mailchi.mp)

The Alignment Newsletter #6: 05/14/18

Rohin Shah14 May 2018 16:00 UTC

8 points

0 comments2 min readLW link

Alignment Newsletter #39

Rohin Shah1 Jan 2019 8:10 UTC

32 points

2 comments5 min readLW link

(mailchi.mp)

August 2020 gwern.net newsletter

gwern1 Sep 2020 21:04 UTC

25 points

4 comments1 min readLW link

(www.gwern.net)

Manifund: What we’re funding (weeks 2-4)

Austin Chen4 Aug 2023 16:00 UTC

44 points

2 comments5 min readLW link

(manifund.substack.com)

Alignment Newsletter One Year Retrospective

Rohin Shah10 Apr 2019 6:58 UTC

94 points

31 comments21 min readLW link

MIRI’s June 2024 Newsletter

Harlan14 Jun 2024 23:02 UTC

74 points

20 comments2 min readLW link

(intelligence.org)

[AN #100]: What might go wrong if you learn a reward function while acting

Rohin Shah20 May 2020 17:30 UTC

33 points

2 comments12 min readLW link

(mailchi.mp)

AISN #46: The Transition

Corin Katzke and Dan H

23 Jan 2025 18:09 UTC

8 points

0 comments5 min readLW link

(newsletter.safe.ai)

[AN #56] Should ML researchers stop running experiments before making hypotheses?

Rohin Shah21 May 2019 2:20 UTC

21 points

8 comments9 min readLW link

(mailchi.mp)

[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI

Rohin Shah5 May 2019 2:20 UTC

17 points

2 comments8 min readLW link

(mailchi.mp)

[AN #79]: Recursive reward modeling as an alignment technique integrated with deep RL

Rohin Shah1 Jan 2020 18:00 UTC

13 points

0 comments12 min readLW link

(mailchi.mp)

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?

Corin Katzke, Julius, Alexa Pan and Dan H

21 Aug 2024 18:09 UTC

11 points

0 comments6 min readLW link

(newsletter.safe.ai)

Forecasting Newsletter: August 2020.

NunoSempere1 Sep 2020 11:38 UTC

16 points

1 comment6 min readLW link

Generalizability & Hope for AI [MLAISU W03]

Esben Kran20 Jan 2023 10:06 UTC

5 points

2 comments2 min readLW link

(newsletter.apartresearch.com)

[AN #123]: Inferring what is valuable in order to align recommender systems

Rohin Shah28 Oct 2020 17:00 UTC

20 points

1 comment8 min readLW link

(mailchi.mp)

Recent updates to gwern.net (2015-2016)

gwern26 Aug 2016 19:22 UTC

42 points

6 comments1 min readLW link

MIRI’s April 2024 Newsletter

Harlan12 Apr 2024 23:38 UTC

95 points

0 comments3 min readLW link

(intelligence.org)

Alignment Newsletter #21

Rohin Shah27 Aug 2018 16:20 UTC

25 points

0 comments7 min readLW link

(mailchi.mp)

OpenAI: Facts from a Weekend

Zvi20 Nov 2023 15:30 UTC

272 points

166 comments9 min readLW link

(thezvi.wordpress.com)

Alignment Newsletter #37

Rohin Shah17 Dec 2018 19:10 UTC

25 points

4 comments10 min readLW link

(mailchi.mp)

The Alignment Newsletter #2: 04/16/18

Rohin Shah16 Apr 2018 16:00 UTC

8 points

0 comments5 min readLW link

[AN #136]: How well will GPT-N perform on downstream tasks?

Rohin Shah3 Feb 2021 18:10 UTC

21 points

2 comments9 min readLW link

(mailchi.mp)

June 2020 gwern.net newsletter

gwern2 Jul 2020 14:19 UTC

16 points

0 comments1 min readLW link

(www.gwern.net)

[AN #62] Are adversarial examples caused by real but imperceptible features?

Rohin Shah22 Aug 2019 17:10 UTC

28 points

10 comments9 min readLW link

(mailchi.mp)

February 2020 gwern.net newsletter

gwern4 Mar 2020 19:05 UTC

15 points

0 comments1 min readLW link

(www.gwern.net)

[AN #60] A new AI challenge: Minecraft agents that assist human players in creative mode

Rohin Shah22 Jul 2019 17:00 UTC

23 points

6 comments9 min readLW link

(mailchi.mp)

MIRI’s July 2024 newsletter

Harlan15 Jul 2024 21:28 UTC

25 points

2 comments1 min readLW link

(intelligence.org)

Alignment Newsletter #27

Rohin Shah9 Oct 2018 1:10 UTC

16 points

0 comments9 min readLW link

(mailchi.mp)

Alignment Newsletter #45

Rohin Shah14 Feb 2019 2:10 UTC

25 points

2 comments8 min readLW link

(mailchi.mp)

[AN #87]: What might happen as deep learning scales even further?

Rohin Shah19 Feb 2020 18:20 UTC

28 points

0 comments4 min readLW link

(mailchi.mp)

The Alignment Newsletter #11: 06/18/18

Rohin Shah18 Jun 2018 16:00 UTC

8 points

0 comments10 min readLW link

[AN #90]: How search landscapes can contain self-reinforcing feedback loops

Rohin Shah11 Mar 2020 17:30 UTC

11 points

6 comments8 min readLW link

(mailchi.mp)

Alignment Newsletter #14

Rohin Shah9 Jul 2018 16:20 UTC

14 points

0 comments9 min readLW link

(mailchi.mp)

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering

Corin Katzke, Alexa Pan, Julius and Dan H

29 Jul 2024 17:50 UTC

17 points

1 comment6 min readLW link

(newsletter.safe.ai)

June gwern.net newsletter

gwern4 Jul 2018 22:59 UTC

34 points

0 comments1 min readLW link

(www.gwern.net)

October gwern.net links

gwern1 Nov 2018 1:11 UTC

29 points

8 comments1 min readLW link

(www.gwern.net)

The Alignment Newsletter #9: 06/04/18

Rohin Shah4 Jun 2018 16:00 UTC

8 points

0 comments2 min readLW link

Alignment Newsletter #46

Rohin Shah22 Feb 2019 0:10 UTC

12 points

0 comments9 min readLW link

(mailchi.mp)

Alignment Newsletter #22

Rohin Shah3 Sep 2018 16:10 UTC

18 points

0 comments6 min readLW link

(mailchi.mp)

[AN #70]: Agents that help humans who are still learning about their own preferences

Rohin Shah23 Oct 2019 17:10 UTC

16 points

0 comments9 min readLW link

(mailchi.mp)

[AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety

Rohin Shah6 Nov 2019 18:10 UTC

26 points

4 comments10 min readLW link

(mailchi.mp)

[AN #103]: ARCHES: an agenda for existential safety, and combining natural language with deep RL

Rohin Shah10 Jun 2020 17:20 UTC

29 points

0 comments10 min readLW link

(mailchi.mp)

The Alignment Newsletter #8: 05/28/18

Rohin Shah28 May 2018 16:00 UTC

8 points

0 comments6 min readLW link

March 2020 gwern.net newsletter

gwern3 Apr 2020 2:16 UTC

13 points

1 comment1 min readLW link

(www.gwern.net)

Alignment Newsletter #25

Rohin Shah24 Sep 2018 16:10 UTC

18 points

3 comments9 min readLW link

(mailchi.mp)

[AN #73]: Detecting catastrophic failures by learning how agents tend to break

Rohin Shah13 Nov 2019 18:10 UTC

11 points

0 comments7 min readLW link

(mailchi.mp)

The Alignment Newsletter #7: 05/21/18

Rohin Shah21 May 2018 16:00 UTC

8 points

0 comments5 min readLW link

April 2020 gwern.net newsletter

gwern1 May 2020 20:47 UTC

11 points

0 comments1 min readLW link

(www.gwern.net)

Alignment Newsletter #15: 07/16/18

Rohin Shah16 Jul 2018 16:10 UTC

42 points

0 comments15 min readLW link

(mailchi.mp)

AISN #61: OpenAI Releases GPT-5

Corin Katzke and Dan H

12 Aug 2025 18:02 UTC

5 points

0 comments4 min readLW link

(newsletter.safe.ai)

January 2020 gwern.net newsletter

gwern31 Jan 2020 18:04 UTC

19 points

0 comments1 min readLW link

(www.gwern.net)

[AN #88]: How the principal-agent literature relates to AI risk

Rohin Shah27 Feb 2020 9:10 UTC

18 points

0 comments9 min readLW link

(mailchi.mp)

[AN #54] Boxing a finite-horizon AI system to keep it unambitious

Rohin Shah28 Apr 2019 5:20 UTC

20 points

0 comments8 min readLW link

(mailchi.mp)

AI Impacts Quarterly Newsletter, Apr-Jun 2023

Harlan and Richard Korzekwa

18 Jul 2023 17:14 UTC

6 points

0 comments3 min readLW link

(blog.aiimpacts.org)

Alignment Newsletter #31

Rohin Shah5 Nov 2018 23:50 UTC

17 points

0 comments12 min readLW link

(mailchi.mp)

Null-boxing Newcomb’s Problem

Yitz13 Jul 2020 16:32 UTC

33 points

9 comments4 min readLW link

Russian x-risks newsletter, summer 2019

avturchin7 Sep 2019 9:50 UTC

39 points

5 comments4 min readLW link

[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations

Rohin Shah4 Dec 2019 18:10 UTC

14 points

6 comments9 min readLW link

(mailchi.mp)

AISN #44: The Trump Circle on AI Safety Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems

Corin Katzke, Julius, andrewz and Dan H

19 Nov 2024 16:36 UTC

9 points

0 comments5 min readLW link

(newsletter.safe.ai)

Russian x-risks newsletter #2, fall 2019

avturchin3 Dec 2019 16:54 UTC

22 points

0 comments3 min readLW link

Russian x-risks newsletter Summer 2020

avturchin1 Sep 2020 14:06 UTC

22 points

6 comments1 min readLW link

December gwern.net newsletter

gwern2 Jan 2019 15:13 UTC

20 points

0 comments1 min readLW link

(www.gwern.net)

[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID

Rohin Shah18 Jun 2020 17:10 UTC

19 points

5 comments8 min readLW link

(mailchi.mp)

AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge

Dan H15 Aug 2023 16:10 UTC

21 points

0 comments5 min readLW link

(newsletter.safe.ai)

Alignment Newsletter #48

Rohin Shah11 Mar 2019 21:10 UTC

29 points

14 comments9 min readLW link

(mailchi.mp)

[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts

Rohin Shah20 Nov 2019 18:20 UTC

19 points

0 comments7 min readLW link

(mailchi.mp)

Alignment Newsletter #44

Rohin Shah6 Feb 2019 8:30 UTC

18 points

0 comments9 min readLW link

(mailchi.mp)

[AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents

Rohin Shah26 Aug 2020 17:20 UTC

21 points

3 comments8 min readLW link

(mailchi.mp)

[AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement

Rohin Shah18 Mar 2020 17:10 UTC

15 points

10 comments13 min readLW link

(mailchi.mp)
