Unbounded Embedded Agency: AEDT w.r.t. rOSI

Cole Wyeth 20 Jul 2025 23:46 UTC
29 points
0 comments 17 min read LW link

AI-Oriented Investments

PeterMcCluskey 20 Jul 2025 21:31 UTC
28 points
0 comments 1 min read LW link
(bayesianinvestor.com)

On The Shoulders of Substrates—how one phenomenon lays the foundation for the next

James Stephen Brown 20 Jul 2025 21:11 UTC
14 points
1 comment 3 min read LW link
(nonzerosum.games)

Life of Posts?

jmh 20 Jul 2025 21:04 UTC
10 points
3 comments 1 min read LW link

LLMs Can’t See Pixels or Characters

Brendan Long 20 Jul 2025 20:00 UTC
100 points
44 comments 4 min read LW link
(www.brendanlong.com)

Operationalizing Functional Consciousness: A Framework for AI Rights

Rudyon 20 Jul 2025 17:50 UTC
−5 points
1 comment 1 min read LW link
(kanarya.group)

Do “adult developmental stages” theories have any pre-theoretic motivation?

Said Achmiz 20 Jul 2025 14:37 UTC
35 points
19 comments 3 min read LW link

Parallel Parking and possibly Instrumental Convergence

CstineSublime 20 Jul 2025 10:37 UTC
2 points
10 comments 3 min read LW link

Plato’s Trolley

dr_s 20 Jul 2025 10:07 UTC
36 points
11 comments 7 min read LW link

Shallow Water is Dangerous Too

jefftk 20 Jul 2025 2:30 UTC
222 points
24 comments 2 min read LW link
(www.jefftk.com)

Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support) Update: Webinar 18/8 Link Below

SamuelK 20 Jul 2025 1:30 UTC
65 points
3 comments 3 min read LW link

Make More Grayspaces

Duncan Sabien (Inactive) 19 Jul 2025 22:22 UTC
296 points
65 comments 13 min read LW link

Cheating at Bets with the Even Odds Algorithm

omark 19 Jul 2025 22:06 UTC
12 points
3 comments 6 min read LW link

Can We Trust the Judge? A novel method of Modelling Human Bias and Systematic Error in Debate-Based Scalable Oversight

Andreea Zaman 19 Jul 2025 21:44 UTC
1 point
0 comments 7 min read LW link

Peeling Back The Remoteness of Sources

adamShimi 19 Jul 2025 17:41 UTC
16 points
1 comment 13 min read LW link
(formethods.substack.com)

Sequential Coherence: A Bottleneck in Automation

19 Jul 2025 15:27 UTC
26 points
2 comments 11 min read LW link

How Misaligned AI Personas Lead to Human Extinction – Step by Step

Writer 19 Jul 2025 13:59 UTC
14 points
0 comments 7 min read LW link
(youtu.be)

L0 is not a neutral hyperparameter

19 Jul 2025 13:51 UTC
24 points
3 comments 5 min read LW link

From Messy Shelves to Master Librarians: Toy-Model Exploration of Block-Diagonal Geometry in LM Activations

Yuxiao 19 Jul 2025 12:26 UTC
5 points
1 comment 4 min read LW link

OpenAI Claims IMO Gold Medal

Mikhail Samin 19 Jul 2025 9:58 UTC
77 points
74 comments 1 min read LW link
(x.com)

On the deep (uncurable?) vulnerability of MCPs

awu 19 Jul 2025 2:50 UTC
5 points
6 comments 1 min read LW link
(www.generalanalysis.com)

[Question] Best way to ask laypeople for conditional probabilities in a Bayes net?

Zack Friedman 19 Jul 2025 2:45 UTC
11 points
1 comment 1 min read LW link

[Question] Get sued or kill someone: The trolly problems of Psychological practice.

Brad Dunn 18 Jul 2025 23:35 UTC
12 points
2 comments 3 min read LW link

resume limiting

bhauth 18 Jul 2025 23:31 UTC
18 points
13 comments 2 min read LW link
(www.bhauth.com)

[Linkpost] How Am I Getting Along with AI?

Gunnar_Zarncke 18 Jul 2025 22:26 UTC
11 points
0 comments 1 min read LW link
(jessiefischbein.substack.com)

Agents lag behind AI 2027’s schedule

wingspan 18 Jul 2025 21:49 UTC
23 points
7 comments 4 min read LW link

Emergent Gravity—order out of chaos

James Stephen Brown 18 Jul 2025 19:26 UTC
3 points
6 comments 5 min read LW link
(nonzerosum.games)

Love stays loved (formerly “Skin”)

Swimmer963 (Miranda Dixon-Luinenburg) 18 Jul 2025 19:17 UTC
271 points
12 comments 29 min read LW link

Why Alignment Fails Without a Functional Model of Intelligence

CC4CI 18 Jul 2025 18:02 UTC
7 points
4 comments 1 min read LW link

The Rising Premium of Life, Part 2

Linch 18 Jul 2025 17:42 UTC
19 points
0 comments 20 min read LW link
(linch.substack.com)

The Story of the World’s First AI-Organized Event

Shoshannah Tekofsky 18 Jul 2025 17:41 UTC
31 points
4 comments 8 min read LW link
(theaidigest.org)

A night-watchman ASI as a first step toward a great future

Eric Neyman 18 Jul 2025 16:40 UTC
67 points
21 comments 11 min read LW link

Why it’s hard to make settings for high-stakes control research

Buck 18 Jul 2025 16:33 UTC
49 points
6 comments 4 min read LW link

Making of IAN v2

Jan 18 Jul 2025 16:13 UTC
17 points
0 comments 8 min read LW link
(universalprior.substack.com)

On METR’s AI Coding RCT

Zvi 18 Jul 2025 12:40 UTC
52 points
6 comments 10 min read LW link
(thezvi.wordpress.com)

Should you steelman what you don’t understand?

CstineSublime 18 Jul 2025 10:26 UTC
6 points
5 comments 6 min read LW link

“Some Basic Level of Mutual Respect About Whether Other People Deserve to Live”?!

Zack_M_Davis 18 Jul 2025 6:41 UTC
25 points
82 comments 4 min read LW link

There’s no way to stop models knowing they’ve been rolled back

Adam Mcmurchie 18 Jul 2025 3:14 UTC
5 points
3 comments 2 min read LW link

I Have Found You Once Again, My Cult (But In A Good Way)

Victor At Gizli 18 Jul 2025 3:13 UTC
8 points
2 comments 3 min read LW link

Notes on spaced repetition scheduling

nwm 18 Jul 2025 2:32 UTC
28 points
5 comments 7 min read LW link

Why do Mechanistic Interpretability?

Prudhviraj Naidu 17 Jul 2025 23:21 UTC
2 points
0 comments 5 min read LW link

Ketamine Part 1: Dosing

Elizabeth 17 Jul 2025 20:10 UTC
25 points
0 comments 7 min read LW link
(acesounderglass.com)

Aurelius: A Peer-to-Peer Alignment Protocol

Austin McCaffrey 17 Jul 2025 19:13 UTC
3 points
4 comments 1 min read LW link
(github.com)

Self-Control is now an Engineering Problem

Josh Mitchell 17 Jul 2025 18:13 UTC
−11 points
4 comments 5 min read LW link

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith 17 Jul 2025 17:54 UTC
98 points
19 comments 34 min read LW link
(joecarlsmith.substack.com)

Are agent-action-dependent beliefs underdetermined by external reality?

Said Achmiz 17 Jul 2025 14:33 UTC
21 points
16 comments 6 min read LW link

AI #125: Smooth Criminal

Zvi 17 Jul 2025 14:30 UTC
33 points
0 comments 56 min read LW link
(thezvi.wordpress.com)

AI Offense Defense Balance in a Multipolar World

17 Jul 2025 9:34 UTC
15 points
5 comments 18 min read LW link
(www.existentialriskobservatory.org)

Biweekly AI Safety Comms Meetup

Vishakha 17 Jul 2025 7:50 UTC
5 points
0 comments 1 min read LW link

Do you care about your clone?

Harry Partridge 17 Jul 2025 6:06 UTC
8 points
7 comments 2 min read LW link