Truth

Kabir Kumar · 28 Aug 2025 20:53 UTC
6 points
0 comments · 2 min read · LW link
(kkumar97.blogspot.com)

Here’s 18 Applications of Deception Probes

28 Aug 2025 18:59 UTC
38 points
0 comments · 22 min read · LW link

LW@Dragoncon Meetup

Error · 28 Aug 2025 18:40 UTC
7 points
0 comments · 1 min read · LW link

If we can educate AIs, why not apply that education to people? - A Simulation with Claude

P. João · 28 Aug 2025 16:37 UTC
3 points
0 comments · 7 min read · LW link

AI #131 Part 1: Gemini 2.5 Flash Image is Cool

Zvi · 28 Aug 2025 16:20 UTC
39 points
4 comments · 30 min read · LW link
(thezvi.wordpress.com)

Von Neumann’s Fallacy and You

incident-recipient · 28 Aug 2025 15:52 UTC
98 points
29 comments · 4 min read · LW link

AI misbehaviour in the wild from Andon Labs’ Safety Report

Lukas Petersson · 28 Aug 2025 15:10 UTC
39 points
0 comments · 1 min read · LW link
(andonlabs.com)

The Other Alignment Problems: How epistemic, moral and aesthetic norms get entangled

James Diacoumis · 28 Aug 2025 11:26 UTC
3 points
0 comments · 5 min read · LW link

We should think about the pivotal act again. Here’s a better version of it.

otto.barten · 28 Aug 2025 9:29 UTC
11 points
2 comments · 3 min read · LW link

Elaborative reading

DirectedEvolution · 28 Aug 2025 8:55 UTC
20 points
0 comments · 9 min read · LW link

Profanity causes emergent misalignment, but with qualitatively different results than insecure code

megasilverfist · 28 Aug 2025 8:22 UTC
21 points
2 comments · 8 min read · LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler · 27 Aug 2025 22:30 UTC
−2 points
0 comments · 4 min read · LW link

Transition and Social Dynamics of a post-coordination world

Lessbroken · 27 Aug 2025 22:23 UTC
1 point
0 comments · 7 min read · LW link

Technical AI Safety research taxonomy attempt (2025)

Benjamin Plaut · 27 Aug 2025 22:17 UTC
2 points
0 comments · 2 min read · LW link

The Future of AI Agents

kavya · 27 Aug 2025 21:58 UTC
6 points
8 comments · 5 min read · LW link

Against “Model Welfare” in 2025

Haley Moller · 27 Aug 2025 21:56 UTC
−10 points
8 comments · 4 min read · LW link

Are They Starting To Take Our Jobs?

Zvi · 27 Aug 2025 18:50 UTC
44 points
6 comments · 5 min read · LW link
(thezvi.wordpress.com)

Will Any Crap Cause Emergent Misalignment?

J Bostock · 27 Aug 2025 18:20 UTC
192 points
37 comments · 3 min read · LW link

Open Global Investment as a Governance Model for AGI

Nick Bostrom · 27 Aug 2025 17:42 UTC
152 points
47 comments · 39 min read · LW link
(nickbostrom.com)

Uncertain Updates August 2025

Gordon Seidoh Worley · 27 Aug 2025 17:31 UTC
11 points
1 comment · 2 min read · LW link
(uncertainupdates.substack.com)

Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)

ryan_greenblatt · 27 Aug 2025 17:04 UTC
99 points
2 comments · 3 min read · LW link

[Anthropic] A hacker used Claude Code to automate ransomware

bohaska · 27 Aug 2025 14:57 UTC
86 points
25 comments · 3 min read · LW link
(www.anthropic.com)

AI companies have started saying safeguards are load-bearing

Zach Stein-Perlman · 27 Aug 2025 13:00 UTC
52 points
2 comments · 5 min read · LW link

Would you sell your soul to save it? (I am NOT a Christian)

AdamLacerdo · 27 Aug 2025 11:05 UTC
−21 points
8 comments · 4 min read · LW link

Legal Personhood—The Fifth Amendment (Part 2)

Stephen Martin · 27 Aug 2025 9:03 UTC
5 points
2 comments · 4 min read · LW link

Contra Yudkowsky’s Ideal Bayesian

vae · 27 Aug 2025 5:43 UTC
51 points
17 comments · 13 min read · LW link

Calibrating an Ultrasonic Humidifier for Glycol Vapors

jefftk · 27 Aug 2025 1:40 UTC
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Misgeneralization of Fictional Training Data as a Contributor to Misalignment

Mark Keavney · 27 Aug 2025 1:01 UTC
9 points
1 comment · 2 min read · LW link

[Question] How are you approaching cognitive security as AI becomes more capable?

james oofou · 26 Aug 2025 20:52 UTC
11 points
1 comment · 1 min read · LW link

AI Induced Psychosis: A shallow investigation

Tim Hua · 26 Aug 2025 20:03 UTC
359 points
43 comments · 26 min read · LW link

Aesthetic Preferences Can Cause Emergent Misalignment

Anders Woodruff · 26 Aug 2025 18:41 UTC
90 points
16 comments · 3 min read · LW link

ACX Fall Meetup

Adriana L · 26 Aug 2025 18:34 UTC
1 point
0 comments · 1 min read · LW link

Harmless reward hacks can generalize to misalignment in LLMs

26 Aug 2025 17:32 UTC
46 points
7 comments · 7 min read · LW link

Do-Divergence: A Bound for Maxwell’s Demon

26 Aug 2025 17:07 UTC
66 points
4 comments · 3 min read · LW link

Reports Of AI Not Progressing Or Offering Mundane Utility Are Often Greatly Exaggerated

Zvi · 26 Aug 2025 14:00 UTC
42 points
1 comment · 16 min read · LW link
(thezvi.wordpress.com)

Gamblification

Aprillion · 26 Aug 2025 11:48 UTC
23 points
15 comments · 2 min read · LW link

A speculation on enlightenment

Richard_Kennaway · 26 Aug 2025 11:23 UTC
16 points
17 comments · 2 min read · LW link

The “Sparsity vs Reconstruction Tradeoff” Illusion

26 Aug 2025 4:39 UTC
13 points
0 comments · 4 min read · LW link

Legal Personhood—The Fifth Amendment (Part 1)

Stephen Martin · 26 Aug 2025 4:05 UTC
4 points
0 comments · 5 min read · LW link

LLMs Are Trained to Assume Their Output Is Perfect

Brendan Long · 26 Aug 2025 0:24 UTC
10 points
0 comments · 5 min read · LW link

New Paper on Reflective Oracles & Grain of Truth Problem

Cole Wyeth · 26 Aug 2025 0:18 UTC
53 points
0 comments · 1 min read · LW link

Hidden Reasoning in LLMs: A Taxonomy

25 Aug 2025 22:43 UTC
65 points
10 comments · 12 min read · LW link

The NAO is Hiring for Partnerships, Response, Virology, and Wet Lab Management

jefftk · 25 Aug 2025 22:37 UTC
16 points
0 comments · 2 min read · LW link
(naobservatory.org)

ACX/SSC Meetup

teegs · 25 Aug 2025 20:21 UTC
1 point
0 comments · 1 min read · LW link

Proactive AI Control: A Case for Battery-Dependent Systems

Jesper Lindholm · 25 Aug 2025 20:04 UTC
4 points
0 comments · 13 min read · LW link

Solving irrational fear as deciding: A worked example

jimmy · 25 Aug 2025 19:44 UTC
24 points
4 comments · 7 min read · LW link

Breastfeeding and IQ: Effects shrink as you control for more confounders

Nina Panickssery · 25 Aug 2025 18:43 UTC
44 points
3 comments · 1 min read · LW link
(blog.ninapanickssery.com)

Quality Precision

Ben · 25 Aug 2025 17:58 UTC
24 points
13 comments · 3 min read · LW link

Neuroscience of human sexual attraction triggers (3 hypotheses)

Steven Byrnes · 25 Aug 2025 17:51 UTC
54 points
6 comments · 12 min read · LW link

Before LLM Psychosis, There Was Yes-Man Psychosis

johnswentworth · 25 Aug 2025 17:47 UTC
186 points
20 comments · 3 min read · LW link