Call on AI Companies: Publish Your Whistleblowing Policies

karl · 31 Jul 2025 22:04 UTC
20 points
3 comments · 7 min read · LW link

Do Not Render Your Counterfactuals

AlphaAndOmega · 31 Jul 2025 21:35 UTC
110 points
19 comments · 5 min read · LW link
(open.substack.com)

Emergence Is Beautiful—beauty and meaning in an entropic universe

James Stephen Brown · 31 Jul 2025 19:00 UTC
8 points
0 comments · 5 min read · LW link

Sharpening the Shears: 8 Lessons from Garden Leave

Jordan Rubin · 31 Jul 2025 18:57 UTC
8 points
0 comments · 4 min read · LW link
(jordanmrubin.substack.com)

AISN #60: The AI Action Plan

31 Jul 2025 18:20 UTC
6 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

Approximating Human Preferences Using a Multi-Judge Learned System

31 Jul 2025 18:01 UTC
19 points
0 comments · 13 min read · LW link

Follow-up to “My Empathy Is Rarely Kind”

johnswentworth · 31 Jul 2025 17:21 UTC
80 points
42 comments · 2 min read · LW link

Book Review: The MANIAC

Annapurna · 31 Jul 2025 15:18 UTC
15 points
6 comments · 2 min read · LW link
(jorgevelez.substack.com)

Red-Thing-Ism

J Bostock · 31 Jul 2025 14:09 UTC
101 points
9 comments · 3 min read · LW link

AI #127: Continued Claude Code Complications

Zvi · 31 Jul 2025 13:40 UTC
32 points
4 comments · 43 min read · LW link
(thezvi.wordpress.com)

I am worried about near-term non-LLM AI developments

testingthewaters · 31 Jul 2025 13:15 UTC
251 points
56 comments · 5 min read · LW link

What do we do about the Inevitable?

CSDD · 31 Jul 2025 10:22 UTC
−7 points
0 comments · 4 min read · LW link

[Question] Several questions about Zen koans

Said Achmiz · 31 Jul 2025 6:35 UTC
24 points
21 comments · 3 min read · LW link

Beyond Hangriness: A Deeper Framework for Emotional Clarity

jaredclucas · 30 Jul 2025 23:59 UTC
−7 points
0 comments · 5 min read · LW link

LLMs Are Already Misaligned: Simple Experiments Prove It

Mackam · 30 Jul 2025 23:48 UTC
12 points
10 comments · 7 min read · LW link

Replicators—Pandora’s dangerous children

James Stephen Brown · 30 Jul 2025 22:39 UTC
19 points
2 comments · 3 min read · LW link

Exploration hacking: can reasoning models subvert RL?

30 Jul 2025 22:02 UTC
16 points
4 comments · 9 min read · LW link

Optimizing The Final Output Can Obfuscate CoT (Research Note)

30 Jul 2025 21:26 UTC
196 points
22 comments · 6 min read · LW link

A Timing Problem for Instrumental Convergence

rhys southan · 30 Jul 2025 19:15 UTC
2 points
44 comments · 1 min read · LW link
(link.springer.com)

Childhood and Education: College Admissions

Zvi · 30 Jul 2025 17:40 UTC
51 points
11 comments · 18 min read · LW link
(thezvi.wordpress.com)

Apply to SPAR Fall 2025—80+ projects!

agucova · 30 Jul 2025 17:34 UTC
19 points
0 comments · 1 min read · LW link

Dimensions of logical time as economic strategies

tayzzyronth · 30 Jul 2025 16:56 UTC
10 points
2 comments · 7 min read · LW link

On Wireheading

Dave92F1 · 30 Jul 2025 16:26 UTC
9 points
4 comments · 3 min read · LW link

Uncertain Updates: July 2025

Gordon Seidoh Worley · 30 Jul 2025 14:50 UTC
8 points
0 comments · 2 min read · LW link
(uncertainupdates.substack.com)

Will AGI Emerge Through Self-Generated Reward Loops?

Moksh Nirvaan · 30 Jul 2025 13:17 UTC
5 points
0 comments · 1 min read · LW link

Sex Determination as a Bottleneck to Species Development

Morpheus · 30 Jul 2025 8:27 UTC
20 points
5 comments · 1 min read · LW link

[Question] When will the Fooming Shoggoths songs from LessOnline 2025 come out?

Brendan Long · 30 Jul 2025 4:04 UTC
15 points
1 comment · 1 min read · LW link

My Empathy Is Rarely Kind

johnswentworth · 30 Jul 2025 3:49 UTC
73 points
230 comments · 4 min read · LW link

Pitfalls of Building UDT Agents

Cole Wyeth · 30 Jul 2025 3:27 UTC
26 points
5 comments · 7 min read · LW link

China proposes new global AI cooperation organisation

Matrice Jacobine · 30 Jul 2025 2:50 UTC
84 points
8 comments · 1 min read · LW link
(www.reuters.com)

Neel Nanda MATS Applications Open (Due Aug 29)

Neel Nanda · 30 Jul 2025 0:55 UTC
22 points
0 comments · 7 min read · LW link
(tinyurl.com)

Better than logarithmic returns to reasoning?

Oliver Sourbut · 30 Jul 2025 0:50 UTC
14 points
5 comments · 2 min read · LW link

They’re a simulation and you must love anyway

Andrew Huang · 30 Jul 2025 0:01 UTC
9 points
0 comments · 17 min read · LW link

Critique of “The Case for Strong Longtermism”

Zeren · 29 Jul 2025 23:58 UTC
1 point
0 comments · 2 min read · LW link

Jagged Vs. Continuous intelligence

Mohsen · 29 Jul 2025 23:57 UTC
0 points
0 comments · 1 min read · LW link

[Question] Are two potentially simple techniques an example of Mencken’s law?

StanislavKrym · 29 Jul 2025 23:37 UTC
4 points
4 comments · 2 min read · LW link

The many paths to permanent disempowerment even with shutdownable AIs (MATS project summary for feedback)

GideonF · 29 Jul 2025 23:20 UTC
55 points
6 comments · 9 min read · LW link

Against racing to AGI: Cooperation, deterrence, and catastrophic risks

Max_He-Ho · 29 Jul 2025 22:23 UTC
4 points
0 comments · 1 min read · LW link
(philpapers.org)

Very Light Hardshell Suitcases

jefftk · 29 Jul 2025 20:10 UTC
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Misalignments and RL failure modes in the early stage of superintelligence

shu yang · 29 Jul 2025 18:23 UTC
13 points
0 comments · 13 min read · LW link

Low P(x-risk) as the Bailey for Low P(doom)

Vladimir_Nesov · 29 Jul 2025 18:01 UTC
48 points
29 comments · 2 min read · LW link

Building Black-box Scheming Monitors

29 Jul 2025 17:41 UTC
39 points
18 comments · 11 min read · LW link

Deliberative Credit Assignment (DCA): Making Faithful Reasoning Profitable

Florian_Dietz · 29 Jul 2025 16:23 UTC
9 points
0 comments · 17 min read · LW link

Want to work in US emerging technology policy? Horizon fellowship applications are live

PolicyTakes · 29 Jul 2025 16:15 UTC
12 points
0 comments · 1 min read · LW link
(horizonpublicservice.org)

Spilling the Tea

Zvi · 29 Jul 2025 14:20 UTC
34 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

How one logical fallacy killed God, corrupted Science and now fuels the AI race

Jáchym Fibír · 29 Jul 2025 13:50 UTC
−39 points
10 comments · 7 min read · LW link
(www.phiand.ai)

About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong

bohaska · 29 Jul 2025 11:59 UTC
208 points
10 comments · 4 min read · LW link
(www.futurehouse.org)

People Are Less Happy Than They Seem

Jakub Halmeš · 29 Jul 2025 6:03 UTC
19 points
6 comments · 1 min read · LW link
(unpredictabletokens.substack.com)

I wrote a song parody

CronoDAS · 29 Jul 2025 6:00 UTC
41 points
3 comments · 1 min read · LW link

Teaching kids to swim

Steven Byrnes · 29 Jul 2025 3:10 UTC
55 points
12 comments · 3 min read · LW link