Make More Grayspaces

Duncan Sabien (Inactive) 19 Jul 2025 22:22 UTC
296 points
65 comments, 13 min read

Cheating at Bets with the Even Odds Algorithm

omark 19 Jul 2025 22:06 UTC
12 points
3 comments, 6 min read

Can We Trust the Judge? A novel method of Modelling Human Bias and Systematic Error in Debate-Based Scalable Oversight

Andreea Zaman 19 Jul 2025 21:44 UTC
1 point
0 comments, 7 min read

Peeling Back The Remoteness of Sources

adamShimi 19 Jul 2025 17:41 UTC
16 points
1 comment, 13 min read
(formethods.substack.com)

Sequential Coherence: A Bottleneck in Automation

19 Jul 2025 15:27 UTC
26 points
2 comments, 11 min read

How Misaligned AI Personas Lead to Human Extinction – Step by Step

Writer 19 Jul 2025 13:59 UTC
14 points
0 comments, 7 min read
(youtu.be)

L0 is not a neutral hyperparameter

19 Jul 2025 13:51 UTC
24 points
3 comments, 5 min read

From Messy Shelves to Master Librarians: Toy-Model Exploration of Block-Diagonal Geometry in LM Activations

Yuxiao 19 Jul 2025 12:26 UTC
5 points
1 comment, 4 min read

OpenAI Claims IMO Gold Medal

Mikhail Samin 19 Jul 2025 9:58 UTC
77 points
74 comments, 1 min read
(x.com)

On the deep (uncurable?) vulnerability of MCPs

awu 19 Jul 2025 2:50 UTC
5 points
6 comments, 1 min read
(www.generalanalysis.com)

[Question] Best way to ask laypeople for conditional probabilities in a Bayes net?

Zack Friedman 19 Jul 2025 2:45 UTC
11 points
1 comment, 1 min read

[Question] Get sued or kill someone: The trolly problems of Psychological practice.

Brad Dunn 18 Jul 2025 23:35 UTC
12 points
2 comments, 3 min read

resume limiting

bhauth 18 Jul 2025 23:31 UTC
18 points
13 comments, 2 min read
(www.bhauth.com)

[Linkpost] How Am I Getting Along with AI?

Gunnar_Zarncke 18 Jul 2025 22:26 UTC
11 points
0 comments, 1 min read
(jessiefischbein.substack.com)

Agents lag behind AI 2027’s schedule

wingspan 18 Jul 2025 21:49 UTC
23 points
7 comments, 4 min read

Emergent Gravity—order out of chaos

James Stephen Brown 18 Jul 2025 19:26 UTC
3 points
6 comments, 5 min read
(nonzerosum.games)

Love stays loved (formerly “Skin”)

Swimmer963 (Miranda Dixon-Luinenburg) 18 Jul 2025 19:17 UTC
271 points
12 comments, 29 min read

Why Alignment Fails Without a Functional Model of Intelligence

CC4CI 18 Jul 2025 18:02 UTC
7 points
4 comments, 1 min read

The Rising Premium of Life, Part 2

Linch 18 Jul 2025 17:42 UTC
19 points
0 comments, 20 min read
(linch.substack.com)

The Story of the World’s First AI-Organized Event

Shoshannah Tekofsky 18 Jul 2025 17:41 UTC
31 points
4 comments, 8 min read
(theaidigest.org)

A night-watchman ASI as a first step toward a great future

Eric Neyman 18 Jul 2025 16:40 UTC
67 points
21 comments, 11 min read

Why it’s hard to make settings for high-stakes control research

Buck 18 Jul 2025 16:33 UTC
49 points
6 comments, 4 min read

Making of IAN v2

Jan 18 Jul 2025 16:13 UTC
17 points
0 comments, 8 min read
(universalprior.substack.com)

On METR’s AI Coding RCT

Zvi 18 Jul 2025 12:40 UTC
52 points
6 comments, 10 min read
(thezvi.wordpress.com)

Should you steelman what you don’t understand?

CstineSublime 18 Jul 2025 10:26 UTC
6 points
5 comments, 6 min read

“Some Basic Level of Mutual Respect About Whether Other People Deserve to Live”?!

Zack_M_Davis 18 Jul 2025 6:41 UTC
25 points
82 comments, 4 min read

There’s no way to stop models knowing they’ve been rolled back

Adam Mcmurchie 18 Jul 2025 3:14 UTC
5 points
3 comments, 2 min read

I Have Found You Once Again, My Cult (But In A Good Way)

Victor At Gizli 18 Jul 2025 3:13 UTC
8 points
2 comments, 3 min read

Notes on spaced repetition scheduling

nwm 18 Jul 2025 2:32 UTC
28 points
5 comments, 7 min read

Why do Mechanistic Interpretability?

Prudhviraj Naidu 17 Jul 2025 23:21 UTC
2 points
0 comments, 5 min read

Ketamine Part 1: Dosing

Elizabeth 17 Jul 2025 20:10 UTC
25 points
0 comments, 7 min read
(acesounderglass.com)

Aurelius: A Peer-to-Peer Alignment Protocol

Austin McCaffrey 17 Jul 2025 19:13 UTC
3 points
4 comments, 1 min read
(github.com)

Self-Control is now an Engineering Problem

Josh Mitchell 17 Jul 2025 18:13 UTC
−11 points
4 comments, 5 min read

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith 17 Jul 2025 17:54 UTC
98 points
19 comments, 34 min read
(joecarlsmith.substack.com)

Are agent-action-dependent beliefs underdetermined by external reality?

Said Achmiz 17 Jul 2025 14:33 UTC
21 points
16 comments, 6 min read

AI #125: Smooth Criminal

Zvi 17 Jul 2025 14:30 UTC
33 points
0 comments, 56 min read
(thezvi.wordpress.com)

AI Offense Defense Balance in a Multipolar World

17 Jul 2025 9:34 UTC
15 points
5 comments, 18 min read
(www.existentialriskobservatory.org)

Biweekly AI Safety Comms Meetup

Vishakha 17 Jul 2025 7:50 UTC
5 points
0 comments, 1 min read

Do you care about your clone?

Harry Partridge 17 Jul 2025 6:06 UTC
8 points
7 comments, 2 min read

Comment on “Four Layers of Intellectual Conversation”

Zack_M_Davis 17 Jul 2025 3:53 UTC
64 points
11 comments, 5 min read

Towards plausible moral naturalism

jessicata 17 Jul 2025 1:51 UTC
17 points
9 comments, 9 min read
(unstableontology.com)

Assign Probabilities Functorially

kaleb 17 Jul 2025 1:49 UTC
8 points
6 comments, 9 min read

Trying the Obvious Thing

16 Jul 2025 22:24 UTC
35 points
2 comments, 3 min read
(cognition.cafe)

Emergence vs Entropy—a universal paradox

James Stephen Brown 16 Jul 2025 21:31 UTC
4 points
0 comments, 4 min read

Selective Generalization: Improving Capabilities While Maintaining Alignment

16 Jul 2025 21:25 UTC
66 points
4 comments, 7 min read

Bodydouble / Thinking Assistant matchmaking

Raemon 16 Jul 2025 19:54 UTC
51 points
10 comments, 2 min read

Zero sum expectations as an explanation of omnicide-indifference

asasz 16 Jul 2025 19:25 UTC
2 points
6 comments, 2 min read

On the geometrical Nature of Insight

Giuseppe Birardi 16 Jul 2025 19:12 UTC
3 points
0 comments, 41 min read

Vancouver Rationalists/Transhumanists/Futurists Beach Meetup

apocalypticc 16 Jul 2025 19:09 UTC
2 points
0 comments, 1 min read

What is the probability that future AI development will be seriously delayed or ended due to energy decline ?

AdamLacerdo 16 Jul 2025 19:08 UTC
−1 points
12 comments, 1 min read