A browser game about AI safety

NickSharp · 17 Dec 2025 22:36 UTC
18 points
4 comments · 1 min read · LW link

What if we could grow human tissue by recapitulating embryogenesis?

Abhishaike Mahajan · 17 Dec 2025 21:53 UTC
23 points
0 comments · 1 min read · LW link
(www.owlposting.com)

Transmitting Misalignment with Subliminal Learning via Paraphrasing

17 Dec 2025 19:34 UTC
38 points
0 comments · 10 min read · LW link

Shallow review of technical AI safety, 2025

17 Dec 2025 18:18 UTC
175 points
9 comments · 83 min read · LW link

Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen · 17 Dec 2025 18:10 UTC
110 points
17 comments · 5 min read · LW link

Alignment Fine-Tuning: Lessons from Operant Conditioning

foodforthought · 17 Dec 2025 16:57 UTC
5 points
4 comments · 10 min read · LW link

Bryan Caplan on Ethical Intuitionism

vatsal_newsletter · 17 Dec 2025 16:48 UTC
−5 points
0 comments · 1 min read · LW link
(www.readvatsal.com)

The Bleeding Mind

Adele Lopez · 17 Dec 2025 16:27 UTC
65 points
10 comments · 6 min read · LW link

Could space debris block access to outer space?

fin · 17 Dec 2025 15:59 UTC
12 points
5 comments · 3 min read · LW link
(www.forethought.org)

An intuitive explanation of backdoor paths using DAGs

enterthewoods · 17 Dec 2025 15:42 UTC
8 points
0 comments · 6 min read · LW link

Still Too Soon

Gordon Seidoh Worley · 17 Dec 2025 15:40 UTC
75 points
3 comments · 2 min read · LW link
(www.uncertainupdates.com)

The $140K Question: Cost Changes Over Time

Zvi · 17 Dec 2025 14:10 UTC
28 points
2 comments · 18 min read · LW link
(thezvi.wordpress.com)

[Question] Can you recommend some reading about effective environmentalism?

SpectrumDT · 17 Dec 2025 11:15 UTC
3 points
0 comments · 1 min read · LW link

Memory Consolidation

Elliot Callender · 17 Dec 2025 11:03 UTC
2 points
0 comments · 2 min read · LW link
(substack.com)

[Question] Cognition Augmentation Org

Elliot Callender · 17 Dec 2025 10:49 UTC
3 points
2 comments · 1 min read · LW link

On publishing every day for 30 days

Alexandre Variengien · 17 Dec 2025 8:30 UTC
9 points
0 comments · 5 min read · LW link
(alexandrevariengien.com)

Dancing in a World of Horseradish

lsusr · 17 Dec 2025 5:50 UTC
134 points
31 comments · 4 min read · LW link

Video and transcript of talk on human-like-ness in AI safety

Joe Carlsmith · 17 Dec 2025 4:09 UTC
10 points
0 comments · 36 min read · LW link

Lessons from a failed ambitious alignment program

Kabir Kumar · 17 Dec 2025 1:50 UTC
57 points
5 comments · 3 min read · LW link

43 SAE Features Differentiate Concealment from Confession in Anthropic’s Deceptive Model Organism

James Hoffend · 17 Dec 2025 1:40 UTC
12 points
0 comments · 4 min read · LW link

Announcing TARA: Receive (and Give) Technical AI Safety Training Without Leaving Your Home City

Zac Broeren · 17 Dec 2025 1:33 UTC
5 points
0 comments · 4 min read · LW link

Announcing: MIRI Technical Governance Team Research Fellowship

17 Dec 2025 0:02 UTC
61 points
5 comments · 2 min read · LW link
(techgov.intelligence.org)

Non-Scheming Saints (Whether Human Or Digital) Might Be Shirking Their Governance Duties, And, If True, It Is Probably An Objective Tragedy

JenniferRM · 16 Dec 2025 23:56 UTC
42 points
3 comments · 9 min read · LW link

A Primer on Operant Conditioning

foodforthought · 16 Dec 2025 21:26 UTC
5 points
0 comments · 4 min read · LW link

Towards training-time mitigations for alignment faking in RL

16 Dec 2025 21:01 UTC
33 points
1 comment · 5 min read · LW link
(alignment.anthropic.com)

Measuring Drug Target Success

sarahconstantin · 16 Dec 2025 21:00 UTC
19 points
3 comments · 2 min read · LW link
(sarahconstantin.substack.com)

A Study in Attention

hamilton · 16 Dec 2025 20:39 UTC
14 points
0 comments · 2 min read · LW link

Emergent Sycophancy

ohdearohdear · 16 Dec 2025 20:21 UTC
8 points
0 comments · 5 min read · LW link

Systems of Control

phoenix · 16 Dec 2025 19:00 UTC
15 points
3 comments · 22 min read · LW link

Discursive Games, Discursive Warfare

Suspended Reason · 16 Dec 2025 18:24 UTC
36 points
0 comments · 30 min read · LW link

Scientific breakthroughs of the year

technicalities · 16 Dec 2025 18:00 UTC
178 points
13 comments · 3 min read · LW link
(x.com)

In defense of slop

jasoncrawford · 16 Dec 2025 17:36 UTC
20 points
3 comments · 4 min read · LW link
(newsletter.rootsofprogress.org)

TSMC most definitely has a golden record of all AI chips it made

Naci Cankaya · 16 Dec 2025 17:20 UTC
3 points
0 comments · 1 min read · LW link
(nacicankaya.substack.com)

The $140,000 Question

Zvi · 16 Dec 2025 16:50 UTC
19 points
0 comments · 15 min read · LW link
(thezvi.wordpress.com)

Radiology Automation Does Not Generalize to Other Jobs

Xodarap · 16 Dec 2025 14:32 UTC
47 points
5 comments · 1 min read · LW link

Fermi paradox solutions map

avturchin · 16 Dec 2025 14:21 UTC
26 points
9 comments · 1 min read · LW link

According to doctors, how feasible is preserving the dying for future revival?

Ariel Zeleznikow-Johnston · 16 Dec 2025 13:18 UTC
18 points
2 comments · 2 min read · LW link
(open.substack.com)

A friction in my dealings with friends who have not yet bought into the reality of AI risk

Olle Häggström · 16 Dec 2025 8:12 UTC
18 points
13 comments · 4 min read · LW link

A Rationalist Christmas

Ryan Meservey · 16 Dec 2025 7:23 UTC
5 points
1 comment · 4 min read · LW link

[Question] Why do LLMs so often say “It’s not an X, it’s a Y”?

ChristianKl · 16 Dec 2025 1:02 UTC
28 points
13 comments · 1 min read · LW link

Response to titotal’s critique of our AI 2027 timelines model

16 Dec 2025 0:51 UTC
38 points
6 comments · 43 min read · LW link
(aifuturesnotes.substack.com)

Introducing Lunette: auditing agents for evals and environments

15 Dec 2025 23:17 UTC
23 points
0 comments · 1 min read · LW link
(fulcrumresearch.ai)

Private AI clouds are the future of inference

perfectfwd · 15 Dec 2025 23:04 UTC
3 points
0 comments · 9 min read · LW link
(perfectforward.substack.com)

Naming

CTA · 15 Dec 2025 23:00 UTC
3 points
0 comments · 4 min read · LW link

Viewing animals as economic agents

foodforthought · 15 Dec 2025 18:13 UTC
10 points
2 comments · 5 min read · LW link

Digital Freedom Fund open for grant applications (Deadline: 17th February)

gergogaspar · 15 Dec 2025 16:25 UTC
8 points
0 comments · 1 min read · LW link

Луна Лавгуд и Комната Тайн, Часть 9 (Luna Lovegood and the Chamber of Secrets, Part 9)

15 Dec 2025 16:01 UTC
2 points
0 comments · 1 min read · LW link

Defending Against Model Weight Exfiltration Through Inference Verification

15 Dec 2025 15:26 UTC
119 points
15 comments · 8 min read · LW link

Rotations in Superposition

15 Dec 2025 14:58 UTC
54 points
6 comments · 11 min read · LW link

What is an evaluation, and why this definition matters

Igor Ivanov · 15 Dec 2025 14:53 UTC
33 points
1 comment · 7 min read · LW link