AI agents and painted facades

30 Aug 2025 23:13 UTC
38 points
3 comments · 2 min read · LW link
(fulcrumresearch.ai)

ACX Everywhere fall 2025 - Newton, MA

duck_master · 30 Aug 2025 22:02 UTC
1 point
1 comment · 1 min read · LW link

[via bsky, found paper] “AI Consciousness: A Centrist Manifesto”

the gears to ascension · 30 Aug 2025 21:05 UTC
13 points
0 comments · 1 min read · LW link
(philpapers.org)

Female sexual attractiveness seems more egalitarian than people acknowledge

lc · 30 Aug 2025 18:09 UTC
53 points
27 comments · 3 min read · LW link

AI Sleeper Agents: How Anthropic Trains and Catches Them—Video

Writer · 30 Aug 2025 17:53 UTC
9 points
0 comments · 7 min read · LW link
(youtu.be)

Understanding LLMs: Insights from Mechanistic Interpretability

Stephen McAleese · 30 Aug 2025 16:50 UTC
40 points
2 comments · 30 min read · LW link

Legal Personhood—The First Amendment (Part 1)

Stephen Martin · 30 Aug 2025 13:20 UTC
4 points
0 comments · 3 min read · LW link

Method Iteration: An LLM Prompting Technique

Davey Morse · 30 Aug 2025 0:08 UTC
−12 points
1 comment · 2 min read · LW link

[Question] How to bet on myself? From expectations to robust goals

P. João · 29 Aug 2025 18:33 UTC
4 points
3 comments · 1 min read · LW link

AI Security London Hackathon

Prince Kumar · 29 Aug 2025 18:23 UTC
4 points
0 comments · 1 min read · LW link

Summary of our Workshop on Post-AGI Outcomes

29 Aug 2025 17:14 UTC
96 points
3 comments · 3 min read · LW link

Wikipedia, but written by AIs

Viliam · 29 Aug 2025 16:37 UTC
32 points
9 comments · 4 min read · LW link

60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge

Joseph Miller · 29 Aug 2025 16:09 UTC
50 points
1 comment · 1 min read · LW link
(time.com)

AI #131 Part 2: Various Misaligned Things

Zvi · 29 Aug 2025 15:00 UTC
34 points
7 comments · 41 min read · LW link
(thezvi.wordpress.com)

The Gabian History of Mathematics

29 Aug 2025 13:48 UTC
21 points
9 comments · 2 min read · LW link
(cognition.cafe)

Qualified rights for AI agents

Gauraventh · 29 Aug 2025 12:42 UTC
4 points
1 comment · 5 min read · LW link
(robertandgaurav.substack.com)

I am trying to write the history of transhumanism-related communities

Ihor Kendiukhov · 29 Aug 2025 11:37 UTC
7 points
4 comments · 1 min read · LW link

Claude Plays… Whatever it Wants

Adam B · 29 Aug 2025 10:57 UTC
37 points
4 comments · 7 min read · LW link

Not stepping on bugs

Gauraventh · 29 Aug 2025 10:08 UTC
1 point
6 comments · 2 min read · LW link
(y1d2.com)

Defensiveness does not equal guilt

Kaj_Sotala · 29 Aug 2025 6:14 UTC
60 points
16 comments · 3 min read · LW link

Truth

Kabir Kumar · 28 Aug 2025 20:53 UTC
6 points
0 comments · 2 min read · LW link
(kkumar97.blogspot.com)

Here’s 18 Applications of Deception Probes

28 Aug 2025 18:59 UTC
38 points
0 comments · 22 min read · LW link

LW@Dragoncon Meetup

Error · 28 Aug 2025 18:40 UTC
7 points
0 comments · 1 min read · LW link

If we can educate AIs, why not apply that education to people? - A Simulation with Claude

P. João · 28 Aug 2025 16:37 UTC
3 points
0 comments · 7 min read · LW link

AI #131 Part 1: Gemini 2.5 Flash Image is Cool

Zvi · 28 Aug 2025 16:20 UTC
39 points
4 comments · 30 min read · LW link
(thezvi.wordpress.com)

Von Neumann’s Fallacy and You

incident-recipient · 28 Aug 2025 15:52 UTC
98 points
29 comments · 4 min read · LW link

AI misbehaviour in the wild from Andon Labs’ Safety Report

Lukas Petersson · 28 Aug 2025 15:10 UTC
39 points
0 comments · 1 min read · LW link
(andonlabs.com)

The Other Alignment Problems: How epistemic, moral and aesthetic norms get entangled

James Diacoumis · 28 Aug 2025 11:26 UTC
3 points
0 comments · 5 min read · LW link

We should think about the pivotal act again. Here’s a better version of it.

otto.barten · 28 Aug 2025 9:29 UTC
11 points
2 comments · 3 min read · LW link

Elaborative reading

DirectedEvolution · 28 Aug 2025 8:55 UTC
20 points
0 comments · 9 min read · LW link

Profanity causes emergent misalignment, but with qualitatively different results than insecure code

megasilverfist · 28 Aug 2025 8:22 UTC
21 points
2 comments · 8 min read · LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler · 27 Aug 2025 22:30 UTC
−2 points
0 comments · 4 min read · LW link

Transition and Social Dynamics of a post-coordination world

Lessbroken · 27 Aug 2025 22:23 UTC
1 point
0 comments · 7 min read · LW link

Technical AI Safety research taxonomy attempt (2025)

Benjamin Plaut · 27 Aug 2025 22:17 UTC
2 points
0 comments · 2 min read · LW link

The Future of AI Agents

kavya · 27 Aug 2025 21:58 UTC
6 points
8 comments · 5 min read · LW link

Against “Model Welfare” in 2025

Haley Moller · 27 Aug 2025 21:56 UTC
−10 points
8 comments · 4 min read · LW link

Are They Starting To Take Our Jobs?

Zvi · 27 Aug 2025 18:50 UTC
44 points
6 comments · 5 min read · LW link
(thezvi.wordpress.com)

Will Any Crap Cause Emergent Misalignment?

J Bostock · 27 Aug 2025 18:20 UTC
192 points
37 comments · 3 min read · LW link

Open Global Investment as a Governance Model for AGI

Nick Bostrom · 27 Aug 2025 17:42 UTC
152 points
47 comments · 39 min read · LW link
(nickbostrom.com)

Uncertain Updates August 2025

Gordon Seidoh Worley · 27 Aug 2025 17:31 UTC
11 points
1 comment · 2 min read · LW link
(uncertainupdates.substack.com)

Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)

ryan_greenblatt · 27 Aug 2025 17:04 UTC
99 points
2 comments · 3 min read · LW link

[Anthropic] A hacker used Claude Code to automate ransomware

bohaska · 27 Aug 2025 14:57 UTC
86 points
25 comments · 3 min read · LW link
(www.anthropic.com)

AI companies have started saying safeguards are load-bearing

Zach Stein-Perlman · 27 Aug 2025 13:00 UTC
52 points
2 comments · 5 min read · LW link

Would you sell your soul to save it? (I am NOT a Christian)

AdamLacerdo · 27 Aug 2025 11:05 UTC
−21 points
8 comments · 4 min read · LW link

Legal Personhood—The Fifth Amendment (Part 2)

Stephen Martin · 27 Aug 2025 9:03 UTC
5 points
2 comments · 4 min read · LW link

Contra Yudkowsky’s Ideal Bayesian

vae · 27 Aug 2025 5:43 UTC
51 points
17 comments · 13 min read · LW link

Calibrating an Ultrasonic Humidifier for Glycol Vapors

jefftk · 27 Aug 2025 1:40 UTC
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Misgeneralization of Fictional Training Data as a Contributor to Misalignment

Mark Keavney · 27 Aug 2025 1:01 UTC
9 points
1 comment · 2 min read · LW link

[Question] How are you approaching cognitive security as AI becomes more capable?

james oofou · 26 Aug 2025 20:52 UTC
11 points
1 comment · 1 min read · LW link

AI Induced Psychosis: A shallow investigation

Tim Hua · 26 Aug 2025 20:03 UTC
359 points
43 comments · 26 min read · LW link