The Grapes of Hardness

adamShimi · 11 Mar 2025 21:01 UTC
8 points
0 comments · 5 min read
(formethods.substack.com)

Don’t over-update on FrontierMath results

David Matolcsi · 11 Mar 2025 20:44 UTC
51 points
7 comments · 9 min read

Response to Scott Alexander on Imprisonment

Zvi · 11 Mar 2025 20:40 UTC
40 points
4 comments · 9 min read
(thezvi.wordpress.com)

Paths and waystations in AI safety

Joe Carlsmith · 11 Mar 2025 18:52 UTC
42 points
1 comment · 11 min read
(joecarlsmith.substack.com)

Meridian Cambridge Visiting Researcher Programme: Turn AI safety ideas into funded projects in one week!

Meridian Cambridge · 11 Mar 2025 17:46 UTC
13 points
0 comments · 2 min read

Elon Musk May Be Transitioning to Bipolar Type I

Cyborg25 · 11 Mar 2025 17:45 UTC
86 points
22 comments · 4 min read

Scaling AI Regulation: Realistically, what Can (and Can’t) Be Regulated?

Katalina Hernandez · 11 Mar 2025 16:51 UTC
3 points
1 comment · 3 min read

How Language Models Understand Nullability

11 Mar 2025 15:57 UTC
5 points
0 comments · 2 min read
(dmodel.ai)

Forethought: a new AI macrostrategy group

11 Mar 2025 15:39 UTC
20 points
0 comments · 3 min read

Preparing for the Intelligence Explosion

11 Mar 2025 15:38 UTC
78 points
17 comments · 1 min read
(www.forethought.org)

stop solving problems that have already been solved

dhruvmethi · 11 Mar 2025 15:30 UTC
10 points
3 comments · 8 min read

AI Control May Increase Existential Risk

Jan_Kulveit · 11 Mar 2025 14:30 UTC
101 points
13 comments · 1 min read

When is it Better to Train on the Alignment Proxy?

dil-leik-og · 11 Mar 2025 13:35 UTC
14 points
0 comments · 9 min read

A different take on the Musk v OpenAI preliminary injunction order

TFD · 11 Mar 2025 12:46 UTC
8 points
0 comments · 20 min read
(www.thefloatingdroid.com)

Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

Fabien Roger · 11 Mar 2025 11:52 UTC
127 points
23 comments · 11 min read
(alignment.anthropic.com)

A Hogwarts Guide to Citizenship

WillPetillo · 11 Mar 2025 5:50 UTC
7 points
1 comment · 3 min read

Cognitive Reframing—How to Overcome Negative Thought Patterns and Behaviors

Mr. Keating · 11 Mar 2025 4:56 UTC
12 points
0 comments · 4 min read

Trojan Sky

Richard_Ngo · 11 Mar 2025 3:14 UTC
253 points
39 comments · 12 min read
(www.narrativeark.xyz)

OpenAI: Detecting misbehavior in frontier reasoning models

Daniel Kokotajlo · 11 Mar 2025 2:17 UTC
183 points
26 comments · 4 min read
(openai.com)

HPMOR Anniversary Parties: Coordination, Resources, and Discussion

Screwtape · 11 Mar 2025 1:30 UTC
52 points
6 comments · 7 min read

Positional kernels of attention heads

Alex Gibson · 10 Mar 2025 23:17 UTC
9 points
0 comments · 4 min read

Progress links and short notes, 2025-03-10

jasoncrawford · 10 Mar 2025 20:27 UTC
8 points
0 comments · 4 min read
(newsletter.rootsofprogress.org)

The Manus Marketing Madness

Zvi · 10 Mar 2025 20:10 UTC
54 points
0 comments · 24 min read
(thezvi.wordpress.com)

You can just play

aswath krishnan · 10 Mar 2025 20:00 UTC
−5 points
0 comments · 2 min read

How to Use Prompt Engineering to Rewire Your Brain

aswath krishnan · 10 Mar 2025 20:00 UTC
1 point
0 comments · 5 min read
(www.aswathkrishnan.com)

When Independent Optimization Is Worse Than Randomness

Chaotic rationalist · 10 Mar 2025 19:46 UTC
−4 points
0 comments · 2 min read

Stress exists only where the Mind makes it

Noahh · 10 Mar 2025 19:44 UTC
5 points
2 comments · 4 min read

Counterargument to Godel’s Modal Ontological Argument

Wynn · 10 Mar 2025 19:38 UTC
−1 points
0 comments · 4 min read

[Question] How much do frontier LLMs code and browse while in training?

Joe Rogero · 10 Mar 2025 19:34 UTC
7 points
0 comments · 1 min read

Observations on self-supervised Learning for vision

Dinkar Juyal · 10 Mar 2025 19:31 UTC
3 points
0 comments · 5 min read

Introducing 11 New AI Safety Organizations—Catalyze’s Winter 24/25 London Incubation Program Cohort

Alexandra Bos · 10 Mar 2025 19:26 UTC
75 points
0 comments · 14 min read

The Jackpot Jinx (or why “Superintelligence Strategy” is wrong)

E.G. Blee-Goldman · 10 Mar 2025 19:18 UTC
13 points
0 comments · 5 min read

Effective AI Outreach | A Data Driven Approach

NoahCWilson · 10 Mar 2025 19:18 UTC
1 point
0 comments · 15 min read

Emergent AI Society. Tasks, Scarcity, Talks

Andrey Seryakov · 10 Mar 2025 19:18 UTC
1 point
0 comments · 5 min read

Sentinel minutes #10/2025: Trump tariffs, US/China tensions, Claude code reward hacking.

NunoSempere · 10 Mar 2025 19:00 UTC
25 points
0 comments · 10 min read
(blog.sentinel-team.org)

Have you actually tried raising the birth rate?

Yair Halberstadt · 10 Mar 2025 18:06 UTC
6 points
5 comments · 1 min read

Split Personality Training: Revealing Latent Knowledge Through Personality-Shift Tokens

Florian_Dietz · 10 Mar 2025 16:07 UTC
42 points
7 comments · 9 min read

We Have No Plan for Preventing Loss of Control in Open Models

Andrew Dickson · 10 Mar 2025 15:35 UTC
46 points
11 comments · 22 min read

Lock-In Threat Models

alamerton · 10 Mar 2025 10:22 UTC
5 points
0 comments · 8 min read

Book Review: Affective Neuroscience

sarahconstantin · 10 Mar 2025 6:50 UTC
62 points
8 comments · 13 min read
(sarahconstantin.substack.com)

The chessboard world

phdead · 10 Mar 2025 1:26 UTC
5 points
0 comments · 8 min read

[Question] when will LLMs become human-level bloggers?

nostalgebraist · 9 Mar 2025 21:10 UTC
125 points
34 comments · 6 min read

Everything I Know About Semantics I Learned From Music Notation

J Bostock · 9 Mar 2025 18:09 UTC
34 points
2 comments · 10 min read

Phoenix Rising

Metacelsus · 9 Mar 2025 11:53 UTC
67 points
7 comments · 5 min read
(denovo.substack.com)

How well can Claude write coding questions?

bodry · 9 Mar 2025 5:29 UTC
3 points
1 comment · 12 min read

A model of the final phase: the current frontier AIs as de facto CEOs of their own companies

Mitchell_Porter · 8 Mar 2025 22:15 UTC
23 points
2 comments · 1 min read

Harry Potter and the Methods of Rationality 10 Year Anniversary Party!

Robert Cousineau · 8 Mar 2025 21:29 UTC
6 points
0 comments · 1 min read

A case for peer-reviewed conspiracy theories

Sam G · 8 Mar 2025 20:41 UTC
13 points
3 comments · 4 min read

The machine has no mouth and it must scream

zef · 8 Mar 2025 16:40 UTC
80 points
1 comment · 7 min read
(zephyyr.substack.com)

How Do We Fix the Education Crisis?

James Camacho · 8 Mar 2025 2:59 UTC
12 points
4 comments · 8 min read