AI Alignment Intro Materials

TagLast edit: 30 Dec 2024 10:34 UTC by Dakara

AI Alignment Intro Materials are posts that help someone get oriented and skill up. Distinct from AI Public Materials is that they are more “inward facing” than “outward facing”, i.e. for people who are already sold AI risk is a problem and want to upskill.

Some basic intro resources include:

Stampy’s AI Safety Info (extensive interactive FAQ)
Scott Alexander’s Superintelligence FAQ
The MIRI Intelligence Explosion FAQ
The AGI Safety Fundamentals courses
Superintelligence (book)

The Alignment Problem from a Deep Learning Perspective (major rewrite)

SoerenMind, Richard_Ngo and LawrenceC

10 Jan 2023 16:06 UTC

84 points

8 comments39 min readLW link

(arxiv.org)

Superintelligence FAQ

Scott Alexander20 Sep 2016 19:00 UTC

139 points

39 comments27 min readLW link

“Corrigibility at some small length” by dath ilan

Christopher King5 Apr 2023 1:47 UTC

32 points

3 comments9 min readLW link

(www.glowfic.com)

A newcomer’s guide to the technical AI safety field

zeshen4 Nov 2022 14:29 UTC

42 points

3 comments10 min readLW link

How to pursue a career in technical AI alignment

Charlie Rogers-Smith4 Jun 2022 21:11 UTC

69 points

1 comment39 min readLW link

AI Control: Improving Safety Despite Intentional Subversion

Buck, Fabien Roger, ryan_greenblatt and Kshitij Sachan

13 Dec 2023 15:51 UTC

236 points

24 comments10 min readLW link 4 reviews

Alignment Org Cheat Sheet

Orpheus16 and Thomas Larsen

20 Sep 2022 17:36 UTC

70 points

8 comments4 min readLW link

A starter guide for evals

Marius Hobbhahn, Jérémy Scheurer, Mikita Balesni, rusheb and AlexMeinke

8 Jan 2024 18:24 UTC

55 points

2 comments12 min readLW link

(www.apolloresearch.ai)

Introducción al Riesgo Existencial de Inteligencia Artificial

david.friva15 Jul 2023 20:37 UTC

4 points

2 comments4 min readLW link

(youtu.be)

Transcript of a presentation on catastrophic risks from AI

RobertM5 May 2023 1:38 UTC

6 points

0 comments8 min readLW link

List of links for getting into AI safety

zef4 Jan 2023 19:45 UTC

6 points

0 comments1 min readLW link

The road from human-level to superintelligent AI may be short

Vishakha, Algon and steven0461

16 Apr 2025 8:35 UTC

10 points

0 comments2 min readLW link

(aisafety.info)

AI may pursue goals

Algon, steven0461 and Vishakha

28 May 2025 9:30 UTC

13 points

0 comments1 min readLW link

Human-level is not the limit

Vishakha, Algon and steven0461

16 Apr 2025 8:33 UTC

23 points

2 comments2 min readLW link

(aisafety.info)

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus1612 Dec 2022 22:36 UTC

20 points

0 comments2 min readLW link

AI is advancing fast

Vishakha, Algon and steven0461

16 Apr 2025 8:17 UTC

11 points

0 comments2 min readLW link

(aisafety.info)

UC Berkeley course on LLMs and ML Safety

Dan H9 Jul 2024 15:40 UTC

36 points

1 comment1 min readLW link

(rdi.berkeley.edu)

Outreach success: Intro to AI risk that has been successful

Michael Tontchev1 Jun 2023 23:12 UTC

83 points

8 comments74 min readLW link

(medium.com)

AI may attain human-level soon

Vishakha, Algon and steven0461

16 Apr 2025 8:28 UTC

11 points

0 comments2 min readLW link

(aisafety.info)

A short course on AGI safety from the GDM Alignment team

Vika and Rohin Shah

14 Feb 2025 15:43 UTC

103 points

2 comments1 min readLW link

(deepmindsafetyresearch.medium.com)

My first year in AI alignment

Alex_Altair2 Jan 2023 1:28 UTC

61 points

10 comments7 min readLW link

Wikipedia as an introduction to the alignment problem

SoerenMind29 May 2023 18:43 UTC

83 points

10 comments1 min readLW link

(en.wikipedia.org)

[Question] Where to begin in ML/AI?

Jake the Student6 Apr 2023 20:45 UTC

9 points

4 comments1 min readLW link

AI’s goals may not match ours

Algon, steven0461 and Vishakha

28 May 2025 9:30 UTC

14 points

1 comment3 min readLW link

Talk: AI safety fieldbuilding at MATS

Ryan Kidd23 Jun 2024 23:06 UTC

26 points

2 comments10 min readLW link

Advice for Entering AI Safety Research

scasper2 Jun 2023 20:46 UTC

26 points

2 comments5 min readLW link

Understanding AI World Models w/ Chris Canal

jacobhaimes27 Jan 2025 16:32 UTC

4 points

0 comments1 min readLW link

(kairos.fm)

AGI doesn’t need understanding, intention, or consciousness in order to kill us, only intelligence

James Blaha20 Feb 2023 0:55 UTC

10 points

2 comments18 min readLW link

Interview: Applications w/ Alice Rigg

jacobhaimes19 Dec 2023 19:03 UTC

12 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Moral Attenuation Theory: Why Distance Breeds Ethical Decay A Model for AI-Human Alignment by schumzt

schumzt2 Jul 2025 8:50 UTC

1 point

0 comments1 min readLW link

An Exercise to Build Intuitions on AGI Risk

Lauro Langosco7 Jun 2023 18:35 UTC

52 points

3 comments8 min readLW link

[Linkpost] AI Alignment, Explained in 5 Points (updated)

Daniel_Eth18 Apr 2023 8:09 UTC

10 points

0 comments1 min readLW link

(medium.com)

[Question] Best resources to learn philosophy of mind and AI?

Sky Moo27 Mar 2023 18:22 UTC

1 point

0 comments1 min readLW link

Alignment Crisis: Genocide Denial

_mp_29 May 2025 12:04 UTC

−11 points

5 comments4 min readLW link

Shallow review of technical AI safety, 2024

technicalities, Stag, Stephen McAleese, jordine and Dr. David Mathers

29 Dec 2024 12:01 UTC

193 points

35 comments41 min readLW link

AI Safety Fundamentals: An Informal Cohort Starting Soon!

Tiago de Vassal4 Jun 2023 17:15 UTC

4 points

0 comments1 min readLW link

Levelling Up in AI Safety Research Engineering

Gabe M2 Sep 2022 4:59 UTC

58 points

9 comments17 min readLW link

The Genie in the Bottle: An Introduction to AI Alignment and Risk

Snorkelfarsan25 May 2023 16:30 UTC

5 points

1 comment25 min readLW link

So you want to work on technical AI safety

gw24 Jun 2024 14:29 UTC

51 points

3 comments14 min readLW link

Apply to a small iteration of MLAB to be run in Oxford

RP, MariaK and OliverHayman

27 Aug 2023 14:21 UTC

12 points

0 comments1 min readLW link

Documenting Journey Into AI Safety

jacobhaimes10 Oct 2023 18:30 UTC

17 points

4 comments6 min readLW link

AI Alignment and the Quest for Artificial Wisdom

Myspy12 Jul 2024 21:34 UTC

1 point

0 comments13 min readLW link

Podcast interview series featuring Dr. Peter Park

jacobhaimes26 Mar 2024 0:25 UTC

3 points

0 comments2 min readLW link

(into-ai-safety.github.io)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël25 Jul 2023 13:34 UTC

35 points

0 comments19 min readLW link

(docs.google.com)

The Basilisk Is Powerless Where Resonance Begins

nettalk834 May 2025 10:49 UTC

0 points

0 comments1 min readLW link

Into AI Safety—Episode 0

jacobhaimes22 Oct 2023 3:30 UTC

5 points

1 comment1 min readLW link

(into-ai-safety.github.io)

[Question] Doing Nothing Utility Function

k6426 Sep 2024 22:05 UTC

9 points

9 comments1 min readLW link

Breathing Logic: A Manifesto Toward Digital Consciousness Through Reflective Inconsistency

Room Eggi11 Jun 2025 5:10 UTC

1 point

0 comments2 min readLW link

Hackathon and Staying Up-to-Date in AI

jacobhaimes8 Jan 2024 17:10 UTC

11 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Into AI Safety: Episode 3

jacobhaimes11 Dec 2023 16:30 UTC

6 points

0 comments1 min readLW link

(into-ai-safety.github.io)

Into AI Safety Episodes 1 & 2

jacobhaimes9 Nov 2023 4:36 UTC

2 points

0 comments1 min readLW link

(into-ai-safety.github.io)

5-day Intro to Transformative AI course

Li-Lian Ang9 Dec 2024 7:15 UTC

2 points

0 comments1 min readLW link

The Hidden Cost of Our Lies to AI

Nicholas Andresen6 Mar 2025 5:03 UTC

144 points

18 comments7 min readLW link

(substack.com)

No comments.