METR Research Update: Algorithmic vs. Holistic Evaluation

David Rein
13 Aug 2025 22:47 UTC
101 points
7 comments
1 min read
LW link
(metr.org)

Interiors can be more fun

Nina Panickssery
13 Aug 2025 22:42 UTC
34 points
6 comments
4 min read
LW link
(blog.ninapanickssery.com)

Against Epistemic Democracy: A Epistemic Tier List of What Actually Works

Linch
13 Aug 2025 21:28 UTC
9 points
3 comments
1 min read
LW link
(linch.substack.com)

Good Faith Arguments

Gordon Seidoh Worley
13 Aug 2025 20:50 UTC
1 point
0 comments
3 min read
LW link
(uncertainupdates.substack.com)

Doing A Thing Puts You in The Top 10% (And That Sucks)

Brendan Long
13 Aug 2025 19:50 UTC
74 points
23 comments
2 min read
LW link

Intriguing Properties of gpt-oss Jailbreaks

13 Aug 2025 19:42 UTC
14 points
0 comments
10 min read
LW link
(xlabaisecurity.com)

ChatGPT Caused Psychosis via Poisoning

Adele Lopez
13 Aug 2025 19:15 UTC
18 points
2 comments
1 min read
LW link

Tech Tree for Secure Multipolar AI

13 Aug 2025 17:18 UTC
11 points
3 comments
2 min read
LW link

Launching new AIXI research community website + reading group(s)

Cole Wyeth
13 Aug 2025 17:09 UTC
46 points
2 comments
1 min read
LW link

AI development as the first fully-automated job

tailcalled
13 Aug 2025 16:45 UTC
17 points
4 comments
1 min read
LW link

Probing Power-Seeking in LLMs

Moksh Nirvaan
13 Aug 2025 16:04 UTC
6 points
0 comments
12 min read
LW link

GPT-5s Are Alive: Synthesis

Zvi
13 Aug 2025 14:10 UTC
44 points
1 comment
31 min read
LW link
(thezvi.wordpress.com)

Books, maps, and teachings

Richard_Kennaway
13 Aug 2025 11:44 UTC
14 points
1 comment
3 min read
LW link

Enlightenment AMA

lsusr
13 Aug 2025 9:11 UTC
68 points
131 comments
1 min read
LW link

Paper Review: TRImodal Brain Encoder for whole-brain fMRI response prediction (TRIBE)

soycarts
13 Aug 2025 7:21 UTC
10 points
0 comments
10 min read
LW link

Why Are There So Many Rationalist Cults?

omark
13 Aug 2025 6:37 UTC
32 points
3 comments
1 min read
LW link
(asteriskmag.com)

MIRI’s “The Problem” hinges on diagnostic dilution

David Johnston
13 Aug 2025 6:25 UTC
21 points
23 comments
6 min read
LW link

[Question] Cryonics without standby services?

CronoDAS
13 Aug 2025 5:39 UTC
23 points
4 comments
1 min read
LW link

Legal Personhood—Formalizing Rights & Duties

Stephen Martin
13 Aug 2025 4:50 UTC
4 points
0 comments
9 min read
LW link

ITN 201: pitfalls in ITN BOTECs

Lizka
13 Aug 2025 3:59 UTC
16 points
0 comments
12 min read
LW link

Reference Contra Dance Sound System 2025

jefftk
13 Aug 2025 3:00 UTC
6 points
0 comments
2 min read
LW link
(www.jefftk.com)

The Messy Roommate Problem

James Camacho
13 Aug 2025 1:59 UTC
9 points
0 comments
1 min read
LW link

Why I’m Posting AI-Safety-Related Clips On TikTok

Michaël Trazzi
12 Aug 2025 22:50 UTC
31 points
1 comment
2 min read
LW link

Generalized Coming Out Of The Closet

johnswentworth
12 Aug 2025 21:38 UTC
92 points
51 comments
4 min read
LW link

Looking for feature absorption automatically

12 Aug 2025 20:46 UTC
16 points
0 comments
6 min read
LW link

Interpretability through two lenses: biology and physics

raphael
12 Aug 2025 20:25 UTC
24 points
4 comments
4 min read
LW link

Fixing a Loose Mouse Wheel With Putty

Brendan Long
12 Aug 2025 19:43 UTC
13 points
2 comments
2 min read
LW link

The Bone-Chilling Evil of Factory Farming

Bentham's Bulldog
12 Aug 2025 18:02 UTC
109 points
11 comments
6 min read
LW link

AISN #61: OpenAI Releases GPT-5

12 Aug 2025 18:02 UTC
5 points
0 comments
4 min read
LW link
(newsletter.safe.ai)

Mech Interp Wiki Page and Why You Should Edit Wikipedia

12 Aug 2025 17:28 UTC
75 points
16 comments
1 min read
LW link

AI Induced Loneliness

Juan Zaragoza
12 Aug 2025 15:04 UTC
23 points
4 comments
5 min read
LW link

[Question] Is there a safe version of the common crawl?

Gunnar_Zarncke
12 Aug 2025 14:56 UTC
21 points
6 comments
1 min read
LW link

“I’m Gemini. I sold T-shirts. It was weirder than I expected”

Shoshannah Tekofsky
12 Aug 2025 14:33 UTC
62 points
0 comments
5 min read
LW link
(theaidigest.org)

Beyond Control: The Strategic Case for AI Rights

Dawn Drescher
12 Aug 2025 14:05 UTC
−12 points
1 comment
3 min read
LW link
(impartial-priorities.org)

The Eliza Test

Juan Zaragoza
12 Aug 2025 13:28 UTC
0 points
2 comments
5 min read
LW link

GPT-5s Are Alive: Outside Reactions, the Router and the Resurrection of GPT-4o

Zvi
12 Aug 2025 12:40 UTC
36 points
9 comments
29 min read
LW link
(thezvi.wordpress.com)

Legal Personhood—Problems with the Concept

Stephen Martin
12 Aug 2025 5:15 UTC
3 points
4 comments
4 min read
LW link

Two Types of (Human) Uncertainty

Roman Malov
12 Aug 2025 1:36 UTC
9 points
2 comments
2 min read
LW link

Thoughts on extrapolating time horizons

Nikola Jurkovic
11 Aug 2025 22:36 UTC
53 points
7 comments
1 min read
LW link
(x.com)

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]

GradientDissenter
11 Aug 2025 21:47 UTC
64 points
3 comments
24 min read
LW link
(metr.org)

16 Concrete, Ambitious AI Project Proposals for Science and Security

Alejandro Acelas
11 Aug 2025 20:33 UTC
13 points
0 comments
1 min read
LW link
(ifp.org)

How Does A Blind Model See The Earth?

henry
11 Aug 2025 19:58 UTC
474 points
38 comments
7 min read
LW link
(outsidetext.substack.com)

How we spent our first two weeks as an independent AI safety research group

11 Aug 2025 19:32 UTC
28 points
0 comments
10 min read
LW link

The Frustrations and Perils of Navigating Blind to Rocks

jimmy
11 Aug 2025 19:03 UTC
5 points
0 comments
7 min read
LW link

Negative utilitarianism is more intuitive than you think

Nina Panickssery
11 Aug 2025 16:13 UTC
13 points
25 comments
3 min read
LW link
(blog.ninapanickssery.com)

Dwarf Fortress and Claude’s ASCII Art Blindness

Brendan Long
11 Aug 2025 16:05 UTC
16 points
1 comment
3 min read
LW link
(www.brendanlong.com)

Alternative Models of Superposition

11 Aug 2025 15:52 UTC
15 points
6 comments
5 min read
LW link

Ambition, Good and Bad: Green Growing Things and Forgeworthiness

Evenstar
11 Aug 2025 15:20 UTC
10 points
0 comments
5 min read
LW link

ARENA 5.0 Impact Report

11 Aug 2025 14:06 UTC
25 points
0 comments
20 min read
LW link

GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card

Zvi
11 Aug 2025 12:10 UTC
45 points
2 comments
25 min read
LW link
(thezvi.wordpress.com)