Training a Reward Hacker Despite Perfect Labels

14 Aug 2025 23:57 UTC
132 points
45 comments · 4 min read · LW link

AGI: Probably Not 2027

Tomás B. · 14 Aug 2025 22:24 UTC
16 points
8 comments · 1 min read · LW link
(www.verysane.ai)

Four Axes of Hunger

Brendan Long · 14 Aug 2025 19:03 UTC
25 points
3 comments · 2 min read · LW link

Somebody invented a better bookmark

Alex_Altair · 14 Aug 2025 17:57 UTC
173 points
22 comments · 2 min read · LW link

In defense of the amyloid hypothesis

dsj · 14 Aug 2025 17:52 UTC
43 points
0 comments · 1 min read · LW link
(www.astralcodexten.com)

A Practical Tool for Mapping and Quantifying Belief Networks

Zack Friedman · 14 Aug 2025 17:22 UTC
7 points
0 comments · 1 min read · LW link

AI #129: Comically Unconstitutional

Zvi · 14 Aug 2025 14:10 UTC
47 points
3 comments · 55 min read · LW link
(thezvi.wordpress.com)

Healthcare as education

Coafos · 14 Aug 2025 13:31 UTC
4 points
0 comments · 3 min read · LW link

About Stress

Gabriel Alfour · 14 Aug 2025 10:33 UTC
25 points
0 comments · 1 min read · LW link
(cognition.cafe)

Legal Personhood—The “Enforcement Gap”

Stephen Martin · 14 Aug 2025 6:07 UTC
8 points
0 comments · 3 min read · LW link

Sleeping Machines: Why Our AI Agents Still Behave Like Talented Children

Michal Barodkin · 14 Aug 2025 2:31 UTC
23 points
4 comments · 8 min read · LW link

Exploring the “Anti-TESCREAL” Ideology and the Roots of (Anti-)Progress

Ottokar Hochman · 14 Aug 2025 2:30 UTC
23 points
2 comments · 2 min read · LW link
(recapitulation.substack.com)

A YouTube Video Will Probably Never Help You Quit YouTube

boundary_condition · 14 Aug 2025 0:59 UTC
26 points
11 comments · 10 min read · LW link

Should you make stone tools?

Alex_Altair · 14 Aug 2025 0:15 UTC
190 points
48 comments · 3 min read · LW link

METR Research Update: Algorithmic vs. Holistic Evaluation

David Rein · 13 Aug 2025 22:47 UTC
101 points
7 comments · 1 min read · LW link
(metr.org)

Interiors can be more fun

Nina Panickssery · 13 Aug 2025 22:42 UTC
34 points
6 comments · 4 min read · LW link
(blog.ninapanickssery.com)

Against Epistemic Democracy: A Epistemic Tier List of What Actually Works

Linch · 13 Aug 2025 21:28 UTC
9 points
3 comments · 1 min read · LW link
(linch.substack.com)

Good Faith Arguments

Gordon Seidoh Worley · 13 Aug 2025 20:50 UTC
1 point
0 comments · 3 min read · LW link
(uncertainupdates.substack.com)

Doing A Thing Puts You in The Top 10% (And That Sucks)

Brendan Long · 13 Aug 2025 19:50 UTC
74 points
23 comments · 2 min read · LW link

Intriguing Properties of gpt-oss Jailbreaks

13 Aug 2025 19:42 UTC
14 points
0 comments · 10 min read · LW link
(xlabaisecurity.com)

ChatGPT Caused Psychosis via Poisoning

Adele Lopez · 13 Aug 2025 19:15 UTC
18 points
2 comments · 1 min read · LW link

Tech Tree for Secure Multipolar AI

13 Aug 2025 17:18 UTC
11 points
3 comments · 2 min read · LW link

Launching new AIXI research community website + reading group(s)

Cole Wyeth · 13 Aug 2025 17:09 UTC
46 points
2 comments · 1 min read · LW link

AI development as the first fully-automated job

tailcalled · 13 Aug 2025 16:45 UTC
17 points
4 comments · 1 min read · LW link

Probing Power-Seeking in LLMs

Moksh Nirvaan · 13 Aug 2025 16:04 UTC
6 points
0 comments · 12 min read · LW link

GPT-5s Are Alive: Synthesis

Zvi · 13 Aug 2025 14:10 UTC
44 points
1 comment · 31 min read · LW link
(thezvi.wordpress.com)

Books, maps, and teachings

Richard_Kennaway · 13 Aug 2025 11:44 UTC
14 points
1 comment · 3 min read · LW link

Enlightenment AMA

lsusr · 13 Aug 2025 9:11 UTC
68 points
131 comments · 1 min read · LW link

Paper Review: TRImodal Brain Encoder for whole-brain fMRI response prediction (TRIBE)

soycarts · 13 Aug 2025 7:21 UTC
10 points
0 comments · 10 min read · LW link

Why Are There So Many Rationalist Cults?

omark · 13 Aug 2025 6:37 UTC
32 points
3 comments · 1 min read · LW link
(asteriskmag.com)

MIRI’s “The Problem” hinges on diagnostic dilution

David Johnston · 13 Aug 2025 6:25 UTC
21 points
23 comments · 6 min read · LW link

[Question] Cryonics without standby services?

CronoDAS · 13 Aug 2025 5:39 UTC
23 points
4 comments · 1 min read · LW link

Legal Personhood—Formalizing Rights & Duties

Stephen Martin · 13 Aug 2025 4:50 UTC
4 points
0 comments · 9 min read · LW link

ITN 201: pitfalls in ITN BOTECs

Lizka · 13 Aug 2025 3:59 UTC
16 points
0 comments · 12 min read · LW link

Reference Contra Dance Sound System 2025

jefftk · 13 Aug 2025 3:00 UTC
6 points
0 comments · 2 min read · LW link
(www.jefftk.com)

The Messy Roommate Problem

James Camacho · 13 Aug 2025 1:59 UTC
9 points
0 comments · 1 min read · LW link

Why I’m Posting AI-Safety-Related Clips On TikTok

Michaël Trazzi · 12 Aug 2025 22:50 UTC
31 points
1 comment · 2 min read · LW link

Generalized Coming Out Of The Closet

johnswentworth · 12 Aug 2025 21:38 UTC
92 points
51 comments · 4 min read · LW link

Looking for feature absorption automatically

12 Aug 2025 20:46 UTC
16 points
0 comments · 6 min read · LW link

Interpretability through two lenses: biology and physics

raphael · 12 Aug 2025 20:25 UTC
24 points
4 comments · 4 min read · LW link

Fixing a Loose Mouse Wheel With Putty

Brendan Long · 12 Aug 2025 19:43 UTC
13 points
2 comments · 2 min read · LW link

The Bone-Chilling Evil of Factory Farming

Bentham's Bulldog · 12 Aug 2025 18:02 UTC
109 points
11 comments · 6 min read · LW link

AISN #61: OpenAI Releases GPT-5

12 Aug 2025 18:02 UTC
5 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

Mech Interp Wiki Page and Why You Should Edit Wikipedia

12 Aug 2025 17:28 UTC
75 points
16 comments · 1 min read · LW link

AI Induced Loneliness

Juan Zaragoza · 12 Aug 2025 15:04 UTC
23 points
4 comments · 5 min read · LW link

[Question] Is there a safe version of the common crawl?

Gunnar_Zarncke · 12 Aug 2025 14:56 UTC
21 points
6 comments · 1 min read · LW link

“I’m Gemini. I sold T-shirts. It was weirder than I expected”

Shoshannah Tekofsky · 12 Aug 2025 14:33 UTC
62 points
0 comments · 5 min read · LW link
(theaidigest.org)

Beyond Control: The Strategic Case for AI Rights

Dawn Drescher · 12 Aug 2025 14:05 UTC
−12 points
1 comment · 3 min read · LW link
(impartial-priorities.org)

The Eliza Test

Juan Zaragoza · 12 Aug 2025 13:28 UTC
0 points
2 comments · 5 min read · LW link

GPT-5s Are Alive: Outside Reactions, the Router and the Resurrection of GPT-4o

Zvi · 12 Aug 2025 12:40 UTC
36 points
9 comments · 29 min read · LW link
(thezvi.wordpress.com)