Fictional Thinking and Real Thinking

johnswentworth · 17 Jun 2025 19:13 UTC
57 points
11 comments · 4 min read

The Curious Case of the bos_token

larry-dial · 17 Jun 2025 19:00 UTC
26 points
4 comments · 10 min read

AISN #57: The RAISE Act

17 Jun 2025 18:02 UTC
6 points
0 comments · 3 min read
(newsletter.safe.ai)

AI Safety at the Frontier: Paper Highlights, May ’25

gasteigerjo · 17 Jun 2025 17:16 UTC
6 points
0 comments · 8 min read
(aisafetyfrontier.substack.com)

[Linkpost] The lethal trifecta for AI agents: private data, untrusted content, and external communication

Gunnar_Zarncke · 17 Jun 2025 16:09 UTC
13 points
3 comments · 1 min read
(simonwillison.net)

Agentic Interpretability: A Strategy Against Gradual Disempowerment

17 Jun 2025 14:52 UTC
17 points
6 comments · 2 min read

Prover-Estimator Debate: A New Scalable Oversight Protocol

17 Jun 2025 13:53 UTC
89 points
19 comments · 5 min read

o3 Turns Pro

Zvi · 17 Jun 2025 13:50 UTC
30 points
1 comment · 14 min read
(thezvi.wordpress.com)

Watch R1 “think” with animated chains of thought

future_detective · 17 Jun 2025 10:38 UTC
4 points
0 comments · 1 min read
(github.com)

Serving LLM on Huawei CloudMatrix

sanxiyn · 17 Jun 2025 5:59 UTC
24 points
7 comments · 1 min read
(arxiv.org)

Personal agents

Roman Leventov · 17 Jun 2025 2:05 UTC
9 points
1 comment · 7 min read

I made a card game to reduce cognitive biases and logical fallacies but I’m not sure what DV to test in a study on its effectiveness.

Brad Dunn · 17 Jun 2025 1:02 UTC
50 points
15 comments · 5 min read

Notes on Meetup Ideas

Commander Zander · 17 Jun 2025 0:11 UTC
12 points
4 comments · 2 min read

Darkness Meditation—for NZ Winter Solstice 2025

joshuamerriam · 16 Jun 2025 23:58 UTC
2 points
0 comments · 4 min read

[Question] Are superhuman savants real?

Bunthut · 16 Jun 2025 22:02 UTC
15 points
4 comments · 1 min read

Ok, AI Can Write Pretty Good Fiction Now

JustisMills · 16 Jun 2025 21:13 UTC
59 points
34 comments · 6 min read
(justismills.substack.com)

Subjective experience is most likely physical

martinkunev · 16 Jun 2025 20:54 UTC
5 points
3 comments · 4 min read

VLMs can Aggregate Scattered Training Patches

LINGJIE CHEN · 16 Jun 2025 18:25 UTC
2 points
0 comments · 4 min read

Setpoint = The experience we attend to

jimmy · 16 Jun 2025 17:34 UTC
22 points
0 comments · 7 min read

Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models

16 Jun 2025 16:43 UTC
69 points
2 comments · 8 min read

How LLM Beliefs Change During Chain-of-Thought Reasoning

16 Jun 2025 16:18 UTC
32 points
3 comments · 5 min read

Convergent Linear Representations of Emergent Misalignment

16 Jun 2025 15:47 UTC
76 points
1 comment · 8 min read

Model Organisms for Emergent Misalignment

16 Jun 2025 15:46 UTC
118 points
19 comments · 5 min read

Coaching AI: A Relational Approach to AI Safety

Priyanka Bharadwaj · 16 Jun 2025 15:33 UTC
11 points
0 comments · 5 min read

Memories of the Neutral Zone

Jordan Rubin · 16 Jun 2025 15:33 UTC
7 points
0 comments · 3 min read
(jordanmrubin.substack.com)

Do LLMs Comply Differently During Tests? Is This a Hidden Variable in Safety Evaluation? And Can We Steer That?

Sahar Abdelnabi · 16 Jun 2025 13:52 UTC
17 points
0 comments · 6 min read

RTFB: The RAISE Act

Zvi · 16 Jun 2025 12:50 UTC
97 points
8 comments · 8 min read
(thezvi.wordpress.com)

[Question] Galaxy-Brain Hobo Antibiotics?

Lorec · 16 Jun 2025 12:43 UTC
3 points
9 comments · 4 min read

The EU commission seeks expert advisers on AI

PabloAMC · 16 Jun 2025 12:28 UTC
7 points
0 comments · 1 min read

Double Crux: Master the art of productive disagreement

marta_k · 16 Jun 2025 11:15 UTC
2 points
0 comments · 1 min read

From Paperclips to Bombs: The Evolution of AI Risk Discourse on LessWrong

David Harket · 16 Jun 2025 5:16 UTC
3 points
0 comments · 24 min read

Donutting is bad

Jarrah · 16 Jun 2025 4:12 UTC
20 points
4 comments · 1 min read

Futarchy using a sealed-bid auction to avoid liquidity problems

Christopher King · 16 Jun 2025 1:34 UTC
21 points
6 comments · 8 min read

Memory Decoding Journal Club: Neocortical synaptic engrams for remote contextual memories

Devin Ward · 15 Jun 2025 23:22 UTC
1 point
0 comments · 1 min read

Every Major LLM Endorses Newcomb One-Boxing

jackmastermind · 15 Jun 2025 20:44 UTC
19 points
13 comments · 1 min read
(jacktlab.substack.com)

FDT Does Not Endorse Itself in Asymmetric Games

jackmastermind · 15 Jun 2025 20:44 UTC
23 points
3 comments · 5 min read

Can We Change the Goals of a Toy RL Agent?

15 Jun 2025 20:34 UTC
20 points
0 comments · 9 min read

Some reprogenetics-related projects you could help with

TsviBT · 15 Jun 2025 20:25 UTC
80 points
1 comment · 4 min read

Risk Tokens: Economic Security in AI Safety

mhdempsey · 15 Jun 2025 19:25 UTC
1 point
0 comments · 6 min read
(www.michaeldempsey.me)

Aligned monetization of modern dating

kwang · 15 Jun 2025 16:01 UTC
0 points
0 comments · 3 min read
(kevw.substack.com)

Intelligence Is Not Magic, But Your Threshold For “Magic” Is Pretty Low

Expertium · 15 Jun 2025 15:23 UTC
215 points
27 comments · 1 min read

Estrogen: A trip report

cube_flipper · 15 Jun 2025 13:15 UTC
167 points
42 comments · 27 min read
(smoothbrains.net)

[Question] Do multimodal LLMs (like 4o) use OCR under the hood to read dense text in images?

2PuNCheeZ · 15 Jun 2025 11:20 UTC
4 points
1 comment · 1 min read

Book review: Air-borne by Carl Zimmer

eukaryote · 15 Jun 2025 5:49 UTC
34 points
0 comments · 11 min read
(eukaryotewritesblog.com)

My favorite Soviet songs

Nina Panickssery · 15 Jun 2025 2:48 UTC
22 points
1 comment · 5 min read
(ninapanickssery.substack.com)

Side quests in curriculum learning and regularization

Sandy Fraser · 15 Jun 2025 2:03 UTC
5 points
0 comments · 10 min read

AXRP Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval

DanielFilan · 15 Jun 2025 1:20 UTC
12 points
0 comments · 56 min read

Jailbreaking Claude 4 and Other Frontier Language Models

James Sullivan · 15 Jun 2025 0:31 UTC
1 point
0 comments · 3 min read
(open.substack.com)

Endometriosis is an incredibly interesting disease

Abhishaike Mahajan · 14 Jun 2025 22:14 UTC
166 points
5 comments · 16 min read
(www.owlposting.com)

Field Notes from Shipping Real Code with Claude

creatorrr · 14 Jun 2025 16:36 UTC
22 points
0 comments · 12 min read
(diwank.space)