A Workflow for System Prompted Model Organisms

michaelwaves · 3 Oct 2025 21:39 UTC
1 point
0 comments · 3 min read

Goodness is harder to achieve than competence

Joe Rogero · 3 Oct 2025 21:32 UTC
22 points
0 comments · 3 min read

Memory Decoding Journal Club: Connectomic traces of Hebbian plasticity in the entorhinal-hippocampal system

Devin Ward · 3 Oct 2025 21:24 UTC
1 point
0 comments · 1 min read

Good is a smaller target than smart

Joe Rogero · 3 Oct 2025 21:04 UTC
21 points
0 comments · 2 min read

Making Sense of Consciousness Part 6: Perceptions of Disembodiment

sarahconstantin · 3 Oct 2025 20:40 UTC
27 points
0 comments · 8 min read
(sarahconstantin.substack.com)

Recent AI Experiences

abramdemski · 3 Oct 2025 19:32 UTC
54 points
1 comment · 6 min read

Our Experience Running Independent Evaluations on LLMs: What Have We Learned?

MAlvarado · 3 Oct 2025 18:26 UTC
7 points
1 comment · 5 min read

Do One New Thing A Day To Solve Your Problems

Algon · 3 Oct 2025 17:08 UTC
102 points
5 comments · 2 min read

ENAIS is looking for an Executive Director (apply by 20th October)

3 Oct 2025 15:29 UTC
11 points
0 comments · 2 min read

Anthropic’s JumpReLU training method is really good

3 Oct 2025 15:23 UTC
22 points
0 comments · 2 min read

Sora and The Big Bright Screen Slop Machine

Zvi · 3 Oct 2025 11:40 UTC
38 points
1 comment · 35 min read
(thezvi.wordpress.com)

We’ve automated x-risk-pilling people

Mikhail Samin · 3 Oct 2025 10:26 UTC
51 points
27 comments · 1 min read
(whycare.aisgf.us)

Open Thread Autumn 2025

kave · 3 Oct 2025 5:32 UTC
16 points
29 comments · 1 min read

Memory Decoding Journal Club: Connectomic traces of Hebbian plasticity in the entorhinal-hippocampal system

Devin Ward · 3 Oct 2025 5:13 UTC
1 point
0 comments · 1 min read

Prompting Myself: Maybe it’s not a damn platitude?

CstineSublime · 3 Oct 2025 2:28 UTC
9 points
1 comment · 1 min read

IABIED and Memetic Engineering

Error · 3 Oct 2025 1:01 UTC
47 points
5 comments · 4 min read

Antisocial media: AI’s killer app?

David Scott Krueger (formerly: capybaralet) · 3 Oct 2025 0:00 UTC
35 points
8 comments · 5 min read
(therealartificialintelligence.substack.com)

Omelas Is Perfectly Misread

Tobias H · 2 Oct 2025 23:11 UTC
197 points
49 comments · 5 min read

Journalism about game theory could advance AI safety quickly

Chris Santos-Lang · 2 Oct 2025 23:05 UTC
4 points
0 comments · 3 min read
(arxiv.org)

In which the author is struck by an electric couplet

Algon · 2 Oct 2025 21:46 UTC
10 points
5 comments · 2 min read

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most “classic humans” in a few decades.

Raemon · 2 Oct 2025 21:03 UTC
143 points
19 comments · 12 min read

Eliciting secret knowledge from language models

2 Oct 2025 20:57 UTC
67 points
3 comments · 2 min read
(arxiv.org)

The Four Pillars: A Hypothesis for Countering Catastrophic Biological Risk

ASB · 2 Oct 2025 20:20 UTC
8 points
0 comments · 14 min read
(defensesindepth.bio)

AI Risk: Can We Thread the Needle? [Recorded Talk from EA Summit Vancouver ’25]

Evan R. Murphy · 2 Oct 2025 19:08 UTC
6 points
0 comments · 2 min read

Checking in on AI-2027

Baybar · 2 Oct 2025 18:46 UTC
119 points
21 comments · 4 min read

Prompt Framing Changes LLM Performance (and Safety)

Kilian Merkelbach · 2 Oct 2025 18:29 UTC
4 points
0 comments · 7 min read

No, That’s Not What the Flight Costs

Max Niederman · 2 Oct 2025 17:55 UTC
45 points
15 comments · 1 min read
(maxniederman.com)

Why the Struggle for Safe AI Must Be Political

2 Oct 2025 16:38 UTC
−6 points
0 comments · 8 min read

Why AI Caste bias is more Dangerous than you think

shanzson · 2 Oct 2025 16:36 UTC
0 points
1 comment · 6 min read

Homo sapiens and homo silicus

2 Oct 2025 16:33 UTC
6 points
0 comments · 3 min read

How to Feel More Alive

Logan Riggs · 2 Oct 2025 15:45 UTC
47 points
2 comments · 4 min read

AI and Biological Risk: Forecasting Key Capability Thresholds

Alvin Ånestrand · 2 Oct 2025 14:06 UTC
7 points
4 comments · 11 min read
(forecastingaifutures.substack.com)

AI #136: A Song and Dance

Zvi · 2 Oct 2025 13:10 UTC
34 points
3 comments · 47 min read
(thezvi.wordpress.com)

Some Biology Related Things I Found Interesting

Morpheus · 2 Oct 2025 12:18 UTC
37 points
9 comments · 2 min read

Random safe AGI idea dump

sig · 2 Oct 2025 10:16 UTC
−3 points
0 comments · 3 min read

How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?

CanYouFeelTheBenefits · 2 Oct 2025 10:02 UTC
5 points
0 comments · 1 min read

Are we an ASI thought experiment?

Amy Rose Vossberg · 2 Oct 2025 1:43 UTC
−6 points
8 comments · 1 min read

Why’s equality in logic less flexible than in category theory?

Algon · 1 Oct 2025 22:03 UTC
17 points
24 comments · 3 min read

[Linkpost] A Field Guide to Writing Styles

Linch · 1 Oct 2025 21:49 UTC
17 points
0 comments · 17 min read
(linch.substack.com)

</rant> </uncharitable> </psychologizing>

Raemon · 1 Oct 2025 21:20 UTC
53 points
11 comments · 2 min read

How I think about alignment and ethics as a cooperation protocol software

Burny · 1 Oct 2025 21:09 UTC
3 points
0 comments · 1 min read

Introducing the Mox Guest Program

1 Oct 2025 18:35 UTC
11 points
0 comments · 2 min read
(moxsf.com)

The Problem of the Concentration of Power

hazem · 1 Oct 2025 18:13 UTC
−5 points
2 comments · 2 min read

Claude Sonnet 4.5 Is A Very Good Model

Zvi · 1 Oct 2025 18:00 UTC
40 points
2 comments · 24 min read
(thezvi.wordpress.com)

My Brush with Superhuman Persuasion

Ben S. · 1 Oct 2025 17:50 UTC
18 points
13 comments · 9 min read
(thebsdetector.substack.com)

AI and Cheap Weapons

Felix C. · 1 Oct 2025 17:31 UTC
31 points
3 comments · 23 min read

But what kind of stuff can you just do?

Bastiaan · 1 Oct 2025 16:58 UTC
25 points
5 comments · 1 min read

AI Safety at the Frontier: Paper Highlights, September ’25

gasteigerjo · 1 Oct 2025 16:24 UTC
5 points
0 comments · 6 min read
(aisafetyfrontier.substack.com)

Uncertain Updates: September 2025

Gordon Seidoh Worley · 1 Oct 2025 14:50 UTC
11 points
0 comments · 1 min read
(uncertainupdates.substack.com)

[CS2881r] Optimizing Prompts with Reinforcement Learning

1 Oct 2025 14:02 UTC
2 points
0 comments · 5 min read