AI Safety Public Materials

Last edit: 27 Aug 2022 18:39 UTC by Multicore

AI Safety Public Materials are posts optimized for conveying information on AI Risk to audiences outside the AI Alignment community — be they ML specialists, policy-makers, or the general public.

a casual intro to AI doom and alignment

carado, 1 Nov 2022 16:38 UTC
14 points
0 comments, 4 min read, LW link
(carado.moe)

AGI safety from first principles: Introduction

Richard_Ngo, 28 Sep 2020 19:53 UTC
114 points
18 comments, 2 min read, LW link, 1 review

Slow motion videos as AI risk intuition pumps

Andrew_Critch, 14 Jun 2022 19:31 UTC
222 points
38 comments, 2 min read, LW link

AI Timelines via Cumulative Optimization Power: Less Long, More Short

jacob_cannell, 6 Oct 2022 0:21 UTC
134 points
33 comments, 6 min read, LW link

The Importance of AI Alignment, explained in 5 points

Daniel_Eth, 11 Feb 2023 2:56 UTC
26 points
2 comments, 1 min read, LW link

Distribution Shifts and The Importance of AI Safety

Leon Lang, 29 Sep 2022 22:38 UTC
17 points
2 comments, 12 min read, LW link

Uncontrollable AI as an Existential Risk

Karl von Wendt, 9 Oct 2022 10:36 UTC
20 points
0 comments, 20 min read, LW link

AI Safety Arguments: An Interactive Guide

Lukas Trötzmüller, 1 Feb 2023 19:26 UTC
20 points
0 comments, 3 min read, LW link

An AI risk argument that resonates with NYTimes readers

Julian Bradshaw, 12 Mar 2023 23:09 UTC
197 points
13 comments, 1 min read, LW link

It’s (not) how you use it

Eleni Angelou, 7 Sep 2022 17:15 UTC
8 points
1 comment, 2 min read, LW link

Let’s talk about uncontrollable AI

Karl von Wendt, 9 Oct 2022 10:34 UTC
13 points
6 comments, 3 min read, LW link

[Question] Best resource to go from “typical smart tech-savvy person” to “person who gets AGI risk urgency”?

Liron, 15 Oct 2022 22:26 UTC
14 points
8 comments, 1 min read, LW link

Me (Steve Byrnes) on the “Brain Inspired” podcast

Steven Byrnes, 30 Oct 2022 19:15 UTC
26 points
1 comment, 1 min read, LW link
(braininspired.co)

Poster Session on AI Safety

Neil Crawford, 12 Nov 2022 3:50 UTC
7 points
6 comments, 1 min read, LW link

I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing?

Tapatakt, 14 Nov 2022 16:12 UTC
3 points
5 comments, 1 min read, LW link

Everything’s normal until it’s not

Eleni Angelou, 10 Mar 2023 2:02 UTC
7 points
0 comments, 3 min read, LW link

The Overton Window widens: Examples of AI risk in the media

Akash, 23 Mar 2023 17:10 UTC
104 points
25 comments, 6 min read, LW link

AI risk, new executive summary

Stuart_Armstrong, 18 Apr 2014 10:45 UTC
22 points
76 comments, 4 min read, LW link

$20K In Bounties for AI Safety Public Materials

5 Aug 2022 2:52 UTC
68 points
8 comments, 6 min read, LW link

[$20K in Prizes] AI Safety Arguments Competition

26 Apr 2022 16:13 UTC
75 points
542 comments, 3 min read, LW link

AI Risk in Terms of Unstable Nuclear Software

Thane Ruthenis, 26 Aug 2022 18:49 UTC
30 points
1 comment, 6 min read, LW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

11 Sep 2022 10:57 UTC
44 points
13 comments, 30 min read, LW link

Capability and Agency as Cornerstones of AI risk — My current model

wilm, 15 Sep 2022 8:25 UTC
10 points
4 comments, 12 min read, LW link

AI Risk Intro 2: Solving The Problem

22 Sep 2022 13:55 UTC
18 points
0 comments, 27 min read, LW link

[Question] Papers to start getting into NLP-focused alignment research

Feraidoon, 24 Sep 2022 23:53 UTC
6 points
0 comments, 1 min read, LW link

[Question] Best introductory overviews of AGI safety?

Jakub Kraus, 13 Dec 2022 19:01 UTC
17 points
7 comments, 2 min read, LW link
(forum.effectivealtruism.org)

New AI risk intro from Vox [link post]

Jakub Kraus, 21 Dec 2022 6:00 UTC
5 points
1 comment, 2 min read, LW link
(www.vox.com)

Summary of 80k’s AI problem profile

Jakub Kraus, 1 Jan 2023 7:30 UTC
7 points
0 comments, 5 min read, LW link
(forum.effectivealtruism.org)

6-paragraph AI risk intro for MAISI

Jakub Kraus, 19 Jan 2023 9:22 UTC
11 points
0 comments, 2 min read, LW link
(www.maisi.club)

AI Safety “Textbook”. Test chapter. Orthogonality Thesis, Goodhart Law and Instrumental Convergency

21 Jan 2023 18:13 UTC
4 points
0 comments, 12 min read, LW link

“AI Risk Discussions” website: Exploring interviews from 97 AI Researchers

2 Feb 2023 1:00 UTC
43 points
1 comment, 1 min read, LW link

Problems of people new to AI safety and my project ideas to mitigate them

Igor Ivanov, 1 Mar 2023 9:09 UTC
40 points
4 comments, 7 min read, LW link

Introducing AI Alignment Inc., a California public benefit corporation...

TherapistAI, 7 Mar 2023 18:47 UTC
1 point
4 comments, 1 min read, LW link

Anthropic: Core Views on AI Safety: When, Why, What, and How

jonmenaster, 9 Mar 2023 17:34 UTC
17 points
1 comment, 22 min read, LW link
(www.anthropic.com)

On taking AI risk seriously

Eleni Angelou, 13 Mar 2023 5:50 UTC
5 points
0 comments, 1 min read, LW link
(www.nytimes.com)

A better analogy and example for teaching AI takeover: the ML Inferno

Christopher King, 14 Mar 2023 19:14 UTC
15 points
0 comments, 5 min read, LW link

Capabilities Denial: The Danger of Underestimating AI

Christopher King, 21 Mar 2023 1:24 UTC
5 points
5 comments, 3 min read, LW link

Exploring the Precautionary Principle in AI Development: Historical Analogies and Lessons Learned

Christopher King, 21 Mar 2023 3:53 UTC
−1 points
1 comment, 9 min read, LW link