Omelas Is Perfectly Misread

Tobias H · 2 Oct 2025 23:11 UTC
197 points
49 comments · 5 min read · LW link

Journalism about game theory could advance AI safety quickly

Chris Santos-Lang · 2 Oct 2025 23:05 UTC
4 points
0 comments · 3 min read · LW link
(arxiv.org)

In which the author is struck by an electric couplet

Algon · 2 Oct 2025 21:46 UTC
10 points
5 comments · 2 min read · LW link

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most “classic humans” in a few decades.

Raemon · 2 Oct 2025 21:03 UTC
143 points
19 comments · 12 min read · LW link

Eliciting secret knowledge from language models

2 Oct 2025 20:57 UTC
67 points
3 comments · 2 min read · LW link
(arxiv.org)

The Four Pillars: A Hypothesis for Countering Catastrophic Biological Risk

ASB · 2 Oct 2025 20:20 UTC
8 points
0 comments · 14 min read · LW link
(defensesindepth.bio)

AI Risk: Can We Thread the Needle? [Recorded Talk from EA Summit Vancouver ’25]

Evan R. Murphy · 2 Oct 2025 19:08 UTC
6 points
0 comments · 2 min read · LW link

Checking in on AI-2027

Baybar · 2 Oct 2025 18:46 UTC
119 points
21 comments · 4 min read · LW link

Prompt Framing Changes LLM Performance (and Safety)

Kilian Merkelbach · 2 Oct 2025 18:29 UTC
4 points
0 comments · 7 min read · LW link

No, That’s Not What the Flight Costs

Max Niederman · 2 Oct 2025 17:55 UTC
45 points
15 comments · 1 min read · LW link
(maxniederman.com)

Why the Struggle for Safe AI Must Be Political

2 Oct 2025 16:38 UTC
−6 points
0 comments · 8 min read · LW link

Why AI Caste bias is more Dangerous than you think

shanzson · 2 Oct 2025 16:36 UTC
0 points
1 comment · 6 min read · LW link

Homo sapiens and homo silicus

2 Oct 2025 16:33 UTC
6 points
0 comments · 3 min read · LW link

How to Feel More Alive

Logan Riggs · 2 Oct 2025 15:45 UTC
47 points
2 comments · 4 min read · LW link

AI and Biological Risk: Forecasting Key Capability Thresholds

Alvin Ånestrand · 2 Oct 2025 14:06 UTC
7 points
4 comments · 11 min read · LW link
(forecastingaifutures.substack.com)

AI #136: A Song and Dance

Zvi · 2 Oct 2025 13:10 UTC
34 points
3 comments · 47 min read · LW link
(thezvi.wordpress.com)

Some Biology Related Things I Found Interesting

Morpheus · 2 Oct 2025 12:18 UTC
37 points
9 comments · 2 min read · LW link

Random safe AGI idea dump

sig · 2 Oct 2025 10:16 UTC
−3 points
0 comments · 3 min read · LW link

How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?

CanYouFeelTheBenefits · 2 Oct 2025 10:02 UTC
5 points
0 comments · 1 min read · LW link

Are we an ASI thought experiment?

Amy Rose Vossberg · 2 Oct 2025 1:43 UTC
−6 points
8 comments · 1 min read · LW link

Why’s equality in logic less flexible than in category theory?

Algon · 1 Oct 2025 22:03 UTC
17 points
24 comments · 3 min read · LW link

[Linkpost] A Field Guide to Writing Styles

Linch · 1 Oct 2025 21:49 UTC
17 points
0 comments · 17 min read · LW link
(linch.substack.com)

</rant> </uncharitable> </psychologizing>

Raemon · 1 Oct 2025 21:20 UTC
53 points
11 comments · 2 min read · LW link

How I think about alignment and ethics as a cooperation protocol software

Burny · 1 Oct 2025 21:09 UTC
3 points
0 comments · 1 min read · LW link

Introducing the Mox Guest Program

1 Oct 2025 18:35 UTC
11 points
0 comments · 2 min read · LW link
(moxsf.com)

The Problem of the Concentration of Power

hazem · 1 Oct 2025 18:13 UTC
−5 points
2 comments · 2 min read · LW link

Claude Sonnet 4.5 Is A Very Good Model

Zvi · 1 Oct 2025 18:00 UTC
40 points
2 comments · 24 min read · LW link
(thezvi.wordpress.com)

My Brush with Superhuman Persuasion

Ben S. · 1 Oct 2025 17:50 UTC
18 points
13 comments · 9 min read · LW link
(thebsdetector.substack.com)

AI and Cheap Weapons

Felix C. · 1 Oct 2025 17:31 UTC
31 points
3 comments · 23 min read · LW link

But what kind of stuff can you just do?

Bastiaan · 1 Oct 2025 16:58 UTC
25 points
5 comments · 1 min read · LW link

AI Safety at the Frontier: Paper Highlights, September ’25

gasteigerjo · 1 Oct 2025 16:24 UTC
5 points
0 comments · 6 min read · LW link
(aisafetyfrontier.substack.com)

Uncertain Updates: September 2025

Gordon Seidoh Worley · 1 Oct 2025 14:50 UTC
11 points
0 comments · 1 min read · LW link
(uncertainupdates.substack.com)

[CS2881r] Optimizing Prompts with Reinforcement Learning

1 Oct 2025 14:02 UTC
2 points
0 comments · 5 min read · LW link

“Pessimization” is Just Ordinary Failure

J Bostock · 1 Oct 2025 13:48 UTC
56 points
2 comments · 6 min read · LW link

Beyond the Zombie Argument

James Diacoumis · 1 Oct 2025 13:16 UTC
7 points
23 comments · 2 min read · LW link
(jamesdiacoumis.substack.com)

Against the Inevitability of Habituation to Continuous Bliss

CanYouFeelTheBenefits · 1 Oct 2025 12:12 UTC
8 points
0 comments · 1 min read · LW link

Lectures on statistical learning theory for alignment researchers

Vanessa Kosoy · 1 Oct 2025 8:36 UTC
41 points
1 comment · 1 min read · LW link
(www.youtube.com)

Claude Sonnet 4.5: System Card and Alignment

Zvi · 30 Sep 2025 20:50 UTC
72 points
4 comments · 27 min read · LW link
(thezvi.wordpress.com)

Halfhaven virtual blogger camp

Viliam · 30 Sep 2025 20:22 UTC
87 points
6 comments · 2 min read · LW link

Masks: On the benefits and drawbacks of a society where everyone covering their face is the norm

3Nora · 30 Sep 2025 18:43 UTC
−3 points
1 comment · 3 min read · LW link

How reimagining the nature of consciousness entirely changes the AI game

Jáchym Fibír · 30 Sep 2025 18:30 UTC
−9 points
0 comments · 14 min read · LW link
(www.phiand.ai)

The Basic Case For Doom

Bentham's Bulldog · 30 Sep 2025 16:04 UTC
26 points
4 comments · 5 min read · LW link

AI Safety Research Futarchy: Using Prediction Markets to Choose Research Projects for MARS

JasonBrown · 30 Sep 2025 15:37 UTC
32 points
8 comments · 4 min read · LW link

ARENA 7.0 - Call for Applicants

30 Sep 2025 14:54 UTC
22 points
0 comments · 6 min read · LW link

The famous survivorship bias image is a “loose reconstruction” of methods used on a hypothetical dataset

Lao Mein · 30 Sep 2025 13:13 UTC
47 points
0 comments · 1 min read · LW link

[GDPval] Models Could Automate the U.S. Economy by 2027

bira · 30 Sep 2025 11:53 UTC
14 points
0 comments · 1 min read · LW link

Ethical Design Patterns

AnnaSalamon · 30 Sep 2025 11:52 UTC
210 points
39 comments · 20 min read · LW link

What is the Base Model Simulation of Human AI-Assistant Conversation?

bodry · 30 Sep 2025 7:08 UTC
5 points
0 comments · 21 min read · LW link

Firstpost: First impressions

Shell · 30 Sep 2025 2:23 UTC
14 points
1 comment · 1 min read · LW link

Exploration of Counterfactual Importance and Attention Heads

Realmbird · 30 Sep 2025 1:17 UTC
12 points
0 comments · 6 min read · LW link