
Why we are excited about confession!

Boaz Barak, Gabriel Wu and Manas Joglekar
14 Jan 2026 20:37 UTC
119 points
27 comments
9 min read
LW link
(alignment.openai.com)

In My Misanthropy Era

jenn
4 Jan 2026 18:34 UTC
314 points
142 comments
8 min read
LW link
(jenn.site)

2025 in AI predictions

jessicata
2 Jan 2026 4:29 UTC
233 points
19 comments
11 min read
LW link

Turning 20 in the probable pre-apocalypse

Parv Mahajan
21 Dec 2025 10:14 UTC
386 points
61 comments
3 min read
LW link

Eliezer’s Unteachable Methods of Sanity

Eliezer Yudkowsky
7 Dec 2025 2:46 UTC
479 points
147 comments
10 min read
LW link

Measuring no CoT math time horizon (single forward pass)

ryan_greenblatt
26 Dec 2025 16:37 UTC
212 points
18 comments
3 min read
LW link

Opinionated Takes on Meetups Organizing

jenn
20 Dec 2025 0:17 UTC
246 points
34 comments
9 min read
LW link

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes
3 Dec 2025 18:37 UTC
352 points
87 comments
17 min read
LW link

Alignment remains a hard, unsolved problem

evhub
27 Nov 2025 8:45 UTC
361 points
96 comments
14 min read
LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala
13 Dec 2025 12:38 UTC
198 points
66 comments
29 min read
LW link

The Company Man

Tomás B.
17 Sep 2025 17:47 UTC
784 points
79 comments
18 min read
LW link

Insights into Claude Opus 4.5 from Pokémon

Julian Bradshaw
9 Dec 2025 16:57 UTC
204 points
24 comments
10 min read
LW link

Paranoia: A Beginner’s Guide

habryka
13 Nov 2025 7:56 UTC
343 points
70 comments
13 min read
LW link

The Rise of Parasitic AI

Adele Lopez
11 Sep 2025 4:38 UTC
716 points
179 comments
20 min read
LW link

Legible vs. Illegible AI Safety Problems

Wei Dai
4 Nov 2025 21:39 UTC
365 points
93 comments
2 min read
LW link

Why people like your quick bullshit takes better than your high-effort posts

eukaryote
28 Nov 2025 20:12 UTC
227 points
27 comments
5 min read
LW link
(eukaryotewritesblog.com)

Natural emergent misalignment from reward hacking in production RL

evhub, Monte M, Benjamin Wright and Jonathan Uesato
21 Nov 2025 20:00 UTC
261 points
32 comments
9 min read
LW link

The behavioral selection model for predicting AI motivations

Alex Mallen and Buck
4 Dec 2025 18:46 UTC
181 points
27 comments
16 min read
LW link

Good if make prior after data instead of before

dynomight
18 Dec 2025 17:53 UTC
114 points
15 comments
9 min read
LW link
(dynomight.net)

How Colds Spread

RobertM
18 Nov 2025 5:25 UTC
239 points
31 comments
10 min read
LW link