The quotation mark

Maxwell Peterson · 5 Oct 2025 23:23 UTC
21 points
8 comments · 13 min read · LW link

The Sadism Spectrum and How to Access It

Dawn Drescher · 5 Oct 2025 23:09 UTC
14 points
2 comments · 20 min read · LW link
(impartial-priorities.org)

Maybe social media algorithms don’t suck

Algon · 5 Oct 2025 18:47 UTC
70 points
25 comments · 3 min read · LW link

Base64Bench: How good are LLMs at base64, and why care about it?

richbc · 5 Oct 2025 18:07 UTC
39 points
10 comments · 11 min read · LW link

[Question] What can Canadians do to help end the AI arms race?

Tom938 · 5 Oct 2025 18:03 UTC
8 points
7 comments · 2 min read · LW link

17 years old, self-taught state control—looking for people who actually get this

Cornelius Caspian · 5 Oct 2025 18:02 UTC
−3 points
3 comments · 1 min read · LW link

Behavior Best-of-N achieves Near Human Performance on Computer Tasks

Baybar · 5 Oct 2025 16:53 UTC
6 points
0 comments · 3 min read · LW link

Accelerating AI Safety Progress via Technical Methods - Calling Researchers, Founders, and Funders

Martin Leitgab · 5 Oct 2025 16:40 UTC
1 point
0 comments · 1 min read · LW link

Mini-Symposium on Accelerating AI Safety Progress via Technical Methods—Hybrid In-Person and Virtual

Martin Leitgab · 5 Oct 2025 16:05 UTC
1 point
0 comments · 1 min read · LW link

[Question] How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?

CanYouFeelTheBenefits · 5 Oct 2025 14:38 UTC
15 points
2 comments · 1 min read · LW link

LLMs are badly misaligned

Joe Rogero · 5 Oct 2025 14:00 UTC
27 points
25 comments · 3 min read · LW link

The Counterfactual Quiet AGI Timeline

Davidmanheim · 5 Oct 2025 9:09 UTC
71 points
5 comments · 9 min read · LW link

AISafety.com Reading Group session 328

Søren Elverlin · 5 Oct 2025 7:51 UTC
5 points
0 comments · 1 min read · LW link

How the NanoGPT Speedrun WR dropped by 20% in 3 months

larry-dial · 5 Oct 2025 1:05 UTC
54 points
9 comments · 9 min read · LW link

a quick thought about AI alignment

foodforthought · 5 Oct 2025 0:51 UTC
10 points
4 comments · 1 min read · LW link

Making Your Pain Worse can Get You What You Want

Logan Riggs · 5 Oct 2025 0:19 UTC
87 points
5 comments · 3 min read · LW link

Markets in Democracy: What happens when you can sell your vote?

Mike Evron · 4 Oct 2025 23:59 UTC
4 points
21 comments · 3 min read · LW link

$250 bounties for the best short stories set in our near future world & Brooklyn event to select them

Ramon Gonzalez · 4 Oct 2025 22:49 UTC
10 points
0 comments · 2 min read · LW link

What I’ve Learnt About How to Sleep

Algon · 4 Oct 2025 20:52 UTC
29 points
8 comments · 2 min read · LW link

The ‘Magic’ of LLMs: The Function of Language

Joseph Banks · 4 Oct 2025 17:45 UTC
13 points
0 comments · 7 min read · LW link

Open Philanthropy’s Biosecurity and Pandemic Preparedness Team Is Hiring and Seeking New Grantees

miriam.hinthorn · 4 Oct 2025 17:42 UTC
3 points
0 comments · 1 min read · LW link

Consider Small Walks at Work

Morpheus · 4 Oct 2025 11:53 UTC
10 points
0 comments · 3 min read · LW link

Where does Sonnet 4.5’s desire to “not get too comfortable” come from?

Kaj_Sotala · 4 Oct 2025 10:19 UTC
103 points
23 comments · 64 min read · LW link

Munk Debate on AI: a few observations and opinions

[deactivated] · 4 Oct 2025 0:24 UTC
2 points
0 comments · 1 min read · LW link

A Workflow for System Prompted Model Organisms

michaelwaves · 3 Oct 2025 21:39 UTC
1 point
0 comments · 3 min read · LW link

Goodness is harder to achieve than competence

Joe Rogero · 3 Oct 2025 21:32 UTC
22 points
0 comments · 3 min read · LW link

Memory Decoding Journal Club: Connectomic traces of Hebbian plasticity in the entorhinal-hippocampal system

Devin Ward · 3 Oct 2025 21:24 UTC
1 point
0 comments · 1 min read · LW link

Good is a smaller target than smart

Joe Rogero · 3 Oct 2025 21:04 UTC
21 points
0 comments · 2 min read · LW link

Making Sense of Consciousness Part 6: Perceptions of Disembodiment

sarahconstantin · 3 Oct 2025 20:40 UTC
27 points
0 comments · 8 min read · LW link
(sarahconstantin.substack.com)

Recent AI Experiences

abramdemski · 3 Oct 2025 19:32 UTC
58 points
5 comments · 6 min read · LW link

Our Experience Running Independent Evaluations on LLMs: What Have We Learned?

MAlvarado · 3 Oct 2025 18:26 UTC
7 points
1 comment · 5 min read · LW link

Do One New Thing A Day To Solve Your Problems

Algon · 3 Oct 2025 17:08 UTC
208 points
28 comments · 2 min read · LW link

ENAIS is looking for an Executive Director (apply by 20th October)

3 Oct 2025 15:29 UTC
16 points
0 comments · 2 min read · LW link

Anthropic’s JumpReLU training method is really good

3 Oct 2025 15:23 UTC
39 points
2 comments · 2 min read · LW link

Sora and The Big Bright Screen Slop Machine

Zvi · 3 Oct 2025 11:40 UTC
42 points
1 comment · 35 min read · LW link
(thezvi.wordpress.com)

We’ve automated x-risk-pilling people

Mikhail Samin · 3 Oct 2025 10:26 UTC
51 points
34 comments · 1 min read · LW link
(whycare.aisgf.us)

Open Thread Autumn 2025

kave · 3 Oct 2025 5:32 UTC
20 points
97 comments · 1 min read · LW link

Memory Decoding Journal Club: Connectomic traces of Hebbian plasticity in the entorhinal-hippocampal system

Devin Ward · 3 Oct 2025 5:13 UTC
1 point
0 comments · 1 min read · LW link

Prompting Myself: Maybe it’s not a damn platitude?

CstineSublime · 3 Oct 2025 2:28 UTC
9 points
2 comments · 1 min read · LW link

IABIED and Memetic Engineering

Error · 3 Oct 2025 1:01 UTC
49 points
5 comments · 4 min read · LW link

Antisocial media: AI’s killer app?

David Scott Krueger (formerly: capybaralet) · 3 Oct 2025 0:00 UTC
35 points
8 comments · 5 min read · LW link
(therealartificialintelligence.substack.com)

Omelas Is Perfectly Misread

Tobias H · 2 Oct 2025 23:11 UTC
221 points
59 comments · 5 min read · LW link

Journalism about game theory could advance AI safety quickly

Chris Santos-Lang · 2 Oct 2025 23:05 UTC
8 points
0 comments · 3 min read · LW link
(arxiv.org)

In which the author is struck by an electric couplet

Algon · 2 Oct 2025 21:46 UTC
10 points
5 comments · 2 min read · LW link

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most “classic humans” in a few decades.

Raemon · 2 Oct 2025 21:03 UTC
155 points
19 comments · 12 min read · LW link

Eliciting secret knowledge from language models

2 Oct 2025 20:57 UTC
68 points
3 comments · 2 min read · LW link
(arxiv.org)

The Four Pillars: A Hypothesis for Countering Catastrophic Biological Risk

ASB · 2 Oct 2025 20:20 UTC
9 points
0 comments · 14 min read · LW link
(defensesindepth.bio)

AI Risk: Can We Thread the Needle? [Recorded Talk from EA Summit Vancouver ’25]

Evan R. Murphy · 2 Oct 2025 19:08 UTC
6 points
0 comments · 2 min read · LW link

Checking in on AI-2027

Baybar · 2 Oct 2025 18:46 UTC
128 points
22 comments · 4 min read · LW link

Prompt Framing Changes LLM Performance (and Safety)

Kilian Merkelbach · 2 Oct 2025 18:29 UTC
5 points
0 comments · 7 min read · LW link