Claude Sonnet 4.5: System Card and Alignment

Zvi · 30 Sep 2025 20:50 UTC
72 points
4 comments · 27 min read · LW link
(thezvi.wordpress.com)

Halfhaven virtual blogger camp

Viliam · 30 Sep 2025 20:22 UTC
87 points
6 comments · 2 min read · LW link

Masks: On the benefits and drawbacks of a society where everyone covering their face is the norm

3Nora · 30 Sep 2025 18:43 UTC
−3 points
1 comment · 3 min read · LW link

How reimagining the nature of consciousness entirely changes the AI game

Jáchym Fibír · 30 Sep 2025 18:30 UTC
−9 points
0 comments · 14 min read · LW link
(www.phiand.ai)

The Basic Case For Doom

Bentham's Bulldog · 30 Sep 2025 16:04 UTC
26 points
4 comments · 5 min read · LW link

AI Safety Research Futarchy: Using Prediction Markets to Choose Research Projects for MARS

JasonBrown · 30 Sep 2025 15:37 UTC
32 points
8 comments · 4 min read · LW link

ARENA 7.0 - Call for Applicants

30 Sep 2025 14:54 UTC
22 points
0 comments · 6 min read · LW link

The famous survivorship bias image is a “loose reconstruction” of methods used on a hypothetical dataset

Lao Mein · 30 Sep 2025 13:13 UTC
47 points
0 comments · 1 min read · LW link

[GDPval] Models Could Automate the U.S. Economy by 2027

bira · 30 Sep 2025 11:53 UTC
14 points
0 comments · 1 min read · LW link

Ethical Design Patterns

AnnaSalamon · 30 Sep 2025 11:52 UTC
210 points
39 comments · 20 min read · LW link

What is the Base Model Simulation of Human AI-Assistant Conversation?

bodry · 30 Sep 2025 7:08 UTC
5 points
0 comments · 21 min read · LW link

Firstpost: First impressions

Shell · 30 Sep 2025 2:23 UTC
14 points
1 comment · 1 min read · LW link

Exploration of Counterfactual Importance and Attention Heads

Realmbird · 30 Sep 2025 1:17 UTC
12 points
0 comments · 6 min read · LW link

Why Corrigibility is Hard and Important (i.e. “Whence the high MIRI confidence in alignment difficulty?”)

30 Sep 2025 0:12 UTC
80 points
52 comments · 17 min read · LW link

What SB 53, California’s new AI law, does

tlevin · 29 Sep 2025 23:29 UTC
104 points
12 comments · 4 min read · LW link

Why Most Efforts Towards “Democratic AI” Fall Short

jacobhaimes · 29 Sep 2025 20:52 UTC
2 points
0 comments · 6 min read · LW link
(www.odysseaninstitute.org)

You’re probably overestimating how well you understand Dunning-Kruger

abstractapplic · 29 Sep 2025 19:27 UTC
216 points
24 comments · 4 min read · LW link

On Dwarkesh Patel’s Podcast With Richard Sutton

Zvi · 29 Sep 2025 19:20 UTC
54 points
10 comments · 23 min read · LW link
(thezvi.wordpress.com)

Controlling the options AIs can pursue

Joe Carlsmith · 29 Sep 2025 17:23 UTC
15 points
0 comments · 35 min read · LW link

Exponential increase is the default (assuming it increases at all) [Linkpost]

Noosphere89 · 29 Sep 2025 16:13 UTC
13 points
0 comments · 2 min read · LW link
(x.com)

[Question] How does the current AI paradigm give rise to the “superagency” that IABIED is concerned with?

jchan · 29 Sep 2025 15:23 UTC
3 points
4 comments · 1 min read · LW link

AI companies’ policy advocacy (Sep 2025)

Zach Stein-Perlman · 29 Sep 2025 15:00 UTC
43 points
0 comments · 3 min read · LW link

KYC for ChatGPT? Preventing AI Harms for Youth Should Not Mean Violating Everyone Else’s Privacy Rights

Noah Weinberger · 29 Sep 2025 14:18 UTC
7 points
0 comments · 7 min read · LW link

System Level Safety Evaluations

29 Sep 2025 13:57 UTC
14 points
0 comments · 9 min read · LW link
(equilibria1.substack.com)

I have decided to stop lying to Americans about 9/11

Lao Mein · 29 Sep 2025 13:55 UTC
86 points
24 comments · 1 min read · LW link

[Retracted] Guess I Was Wrong About AIxBio Risks

J Bostock · 29 Sep 2025 11:44 UTC
62 points
7 comments · 5 min read · LW link

If Drexler Is Wrong, He May as Well Be Right

Tomás B. · 29 Sep 2025 7:00 UTC
51 points
8 comments · 2 min read · LW link

Applied Murphyjitsu Meditation

Alice Blair · 29 Sep 2025 6:31 UTC
20 points
0 comments · 3 min read · LW link

The personal intelligence I want

Rebecca Dai · 29 Sep 2025 4:09 UTC
20 points
9 comments · 8 min read · LW link
(rebeccadai.substack.com)

Why ASI Alignment Is Hard (an overview)

Yotam · 29 Sep 2025 4:05 UTC
16 points
1 comment · 25 min read · LW link

When the AI Dam Breaks: From Surveillance to Game Theory in AI Alignment

pataphor · 29 Sep 2025 4:01 UTC
5 points
7 comments · 5 min read · LW link

Yet Another IABIED Review

PeterMcCluskey · 28 Sep 2025 21:36 UTC
15 points
0 comments · 7 min read · LW link
(bayesianinvestor.com)

A non-review of “If Anyone Builds It, Everyone Dies”

boazbarak · 28 Sep 2025 17:34 UTC
125 points
50 comments · 4 min read · LW link

Transgender Sticker Fallacy

ymeskhout · 28 Sep 2025 16:54 UTC
110 points
25 comments · 7 min read · LW link
(www.ymeskhout.com)

Solving the problem of needing to give a talk

Kaj_Sotala · 28 Sep 2025 15:34 UTC
60 points
3 comments · 8 min read · LW link

Lessons from organizing a technical AI safety bootcamp

28 Sep 2025 13:48 UTC
16 points
3 comments · 16 min read · LW link

The Risk of Human Disconnection

Priyanka Bharadwaj · 28 Sep 2025 2:14 UTC
5 points
0 comments · 3 min read · LW link

A Reply to MacAskill on “If Anyone Builds It, Everyone Dies”

Rob Bensinger · 27 Sep 2025 23:03 UTC
55 points
21 comments · 17 min read · LW link

The Sensible Way Forward for AI Alignment

Davey Morse · 27 Sep 2025 21:00 UTC
−9 points
0 comments · 3 min read · LW link

Book Review: The System

Julius · 27 Sep 2025 20:49 UTC
14 points
2 comments · 16 min read · LW link
(thegreymatter.substack.com)

Learnings from AI safety course so far

boazbarak · 27 Sep 2025 18:17 UTC
103 points
5 comments · 3 min read · LW link

My Weirdest Experience Wasn’t

Bridgett Kay · 27 Sep 2025 18:01 UTC
24 points
3 comments · 3 min read · LW link
(dxmrevealed.wordpress.com)

Making sense of parameter-space decomposition

Malmesbury · 27 Sep 2025 17:37 UTC
45 points
0 comments · 19 min read · LW link

AI Safety Field Growth Analysis 2025

Stephen McAleese · 27 Sep 2025 17:03 UTC
29 points
13 comments · 3 min read · LW link

2025 Petrov day speech

nick lacombe · 27 Sep 2025 15:07 UTC
9 points
0 comments · 1 min read · LW link
(nikthink.net)

LLMs Suck at Deep Thinking Part 3 - Trying to Prove It (fixed)

Taylor G. Lunt · 27 Sep 2025 14:54 UTC
17 points
6 comments · 15 min read · LW link

Our Beloved Monsters

Tomás B. · 27 Sep 2025 13:25 UTC
71 points
4 comments · 11 min read · LW link

Ranking the endgames of AI development

Sean Herrington · 27 Sep 2025 11:47 UTC
17 points
4 comments · 5 min read · LW link

An N=1 observational study on interpretability of Natural General Intelligence (NGI)

dr_s · 27 Sep 2025 9:28 UTC
12 points
3 comments · 6 min read · LW link

Day #14 Hunger Strike, on livestream, in protest of Superintelligent AI

samuelshadrach · 27 Sep 2025 9:16 UTC
2 points
0 comments · 2 min read · LW link