Why I’m Posting AI-Safety-Related Clips On TikTok

Michaël Trazzi
12 Aug 2025 22:50 UTC
31 points
1 comment
2 min read
LW link

Generalized Coming Out Of The Closet

johnswentworth
12 Aug 2025 21:38 UTC
92 points
51 comments
4 min read
LW link

Looking for feature absorption automatically

12 Aug 2025 20:46 UTC
16 points
0 comments
6 min read
LW link

Interpretability through two lenses: biology and physics

raphael
12 Aug 2025 20:25 UTC
24 points
4 comments
4 min read
LW link

Fixing a Loose Mouse Wheel With Putty

Brendan Long
12 Aug 2025 19:43 UTC
13 points
2 comments
2 min read
LW link

The Bone-Chilling Evil of Factory Farming

Bentham's Bulldog
12 Aug 2025 18:02 UTC
109 points
11 comments
6 min read
LW link

AISN #61: OpenAI Releases GPT-5

12 Aug 2025 18:02 UTC
5 points
0 comments
4 min read
LW link
(newsletter.safe.ai)

Mech Interp Wiki Page and Why You Should Edit Wikipedia

12 Aug 2025 17:28 UTC
75 points
16 comments
1 min read
LW link

AI Induced Loneliness

Juan Zaragoza
12 Aug 2025 15:04 UTC
23 points
4 comments
5 min read
LW link

[Question] Is there a safe version of the common crawl?

Gunnar_Zarncke
12 Aug 2025 14:56 UTC
21 points
6 comments
1 min read
LW link

“I’m Gemini. I sold T-shirts. It was weirder than I expected”

Shoshannah Tekofsky
12 Aug 2025 14:33 UTC
62 points
0 comments
5 min read
LW link
(theaidigest.org)

Beyond Control: The Strategic Case for AI Rights

Dawn Drescher
12 Aug 2025 14:05 UTC
−12 points
1 comment
3 min read
LW link
(impartial-priorities.org)

The Eliza Test

Juan Zaragoza
12 Aug 2025 13:28 UTC
0 points
2 comments
5 min read
LW link

GPT-5s Are Alive: Outside Reactions, the Router and the Resurrection of GPT-4o

Zvi
12 Aug 2025 12:40 UTC
36 points
9 comments
29 min read
LW link
(thezvi.wordpress.com)

Le­gal Per­son­hood—Prob­lems with the Concept

Stephen Martin
12 Aug 2025 5:15 UTC
3 points
4 comments
4 min read
LW link

Two Types of (Human) Uncertainty

Roman Malov
12 Aug 2025 1:36 UTC
9 points
2 comments
2 min read
LW link

Thoughts on extrapolating time horizons

Nikola Jurkovic
11 Aug 2025 22:36 UTC
53 points
7 comments
1 min read
LW link
(x.com)

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]

GradientDissenter
11 Aug 2025 21:47 UTC
64 points
3 comments
24 min read
LW link
(metr.org)

16 Concrete, Ambitious AI Project Proposals for Science and Security

Alejandro Acelas
11 Aug 2025 20:33 UTC
13 points
0 comments
1 min read
LW link
(ifp.org)

How Does A Blind Model See The Earth?

henry
11 Aug 2025 19:58 UTC
474 points
38 comments
7 min read
LW link
(outsidetext.substack.com)

How we spent our first two weeks as an independent AI safety research group

11 Aug 2025 19:32 UTC
28 points
0 comments
10 min read
LW link

The Frustrations and Perils of Navigating Blind to Rocks

jimmy
11 Aug 2025 19:03 UTC
5 points
0 comments
7 min read
LW link

Negative utilitarianism is more intuitive than you think

Nina Panickssery
11 Aug 2025 16:13 UTC
13 points
25 comments
3 min read
LW link
(blog.ninapanickssery.com)

Dwarf Fortress and Claude’s ASCII Art Blindness

Brendan Long
11 Aug 2025 16:05 UTC
16 points
1 comment
3 min read
LW link
(www.brendanlong.com)

Alternative Models of Superposition

11 Aug 2025 15:52 UTC
15 points
6 comments
5 min read
LW link

Ambition, Good and Bad: Green Growing Things and Forgeworthiness

Evenstar
11 Aug 2025 15:20 UTC
10 points
0 comments
5 min read
LW link

ARENA 5.0 Impact Report

11 Aug 2025 14:06 UTC
25 points
0 comments
20 min read
LW link

GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card

Zvi
11 Aug 2025 12:10 UTC
45 points
2 comments
25 min read
LW link
(thezvi.wordpress.com)

The trajectory of the future could soon get set in stone

wdmacaskill
11 Aug 2025 11:04 UTC
41 points
2 comments
3 min read
LW link

Listening Before Speaking

Alice Blair
11 Aug 2025 5:23 UTC
15 points
3 comments
3 min read
LW link

Le­gal Per­son­hood—Bun­dle Theory

Stephen Martin
11 Aug 2025 4:32 UTC
3 points
2 comments
3 min read
LW link

Measuring intelligence and reverse-engineering goals

jessicata
11 Aug 2025 2:08 UTC
33 points
10 comments
9 min read
LW link
(unstableontology.com)

The Necessity of Studying Emergent Machine Ethics Now

Hiroshi Yamakawa
11 Aug 2025 0:37 UTC
3 points
0 comments
11 min read
LW link

Run-time Steering Can Surpass Post-Training: Reasoning Task Performance

Tommy Xie
10 Aug 2025 23:52 UTC
5 points
2 comments
6 min read
LW link
(www.tutke.org)

Sturdier and Lighter Pedalboard

jefftk
10 Aug 2025 23:50 UTC
9 points
0 comments
2 min read
LW link
(www.jefftk.com)

Unjournal evaluation of “Towards best practices in AGI safety & governance” (2023), quick take

david reinstein
10 Aug 2025 22:28 UTC
7 points
2 comments
1 min read
LW link
(unjournal.pubpub.org)

My Least Libertarian Opinion: Ban Exclusivity Deals*

Brendan Long
10 Aug 2025 21:41 UTC
78 points
17 comments
2 min read
LW link
(www.brendanlong.com)

Motivated Reasoning as Bias

oleg
10 Aug 2025 21:15 UTC
6 points
2 comments
3 min read
LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward
10 Aug 2025 20:56 UTC
1 point
0 comments
1 min read
LW link

LLMs play prisoner’s Dilemma

parthh01
10 Aug 2025 20:36 UTC
2 points
0 comments
1 min read
LW link

Petrov Day: Bremen (Oct 10)

10 Aug 2025 19:09 UTC
3 points
1 comment
1 min read
LW link

The Cod­ing The­o­rem — A Link be­tween Com­plex­ity and Probability

Leon Lang
10 Aug 2025 15:34 UTC
32 points
4 comments
9 min read
LW link

AI Safety at the Frontier: Paper Highlights, July ’25

gasteigerjo
10 Aug 2025 12:49 UTC
7 points
0 comments
9 min read
LW link
(aisafetyfrontier.substack.com)

From Organized Shelves to Layered Catalogs: Architectural Explorations for Sparse Autoencoders—Crosscoders & Ladder SAEs Towards Hierarchical Data Structure

Yuxiao
10 Aug 2025 10:12 UTC
2 points
0 comments
11 min read
LW link

Le­gal Per­son­hood for Digi­tal Minds—Introduction

Stephen Martin
10 Aug 2025 9:29 UTC
5 points
4 comments
2 min read
LW link

Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History

Dawn Drescher
10 Aug 2025 8:46 UTC
42 points
6 comments
12 min read
LW link
(impartial-priorities.org)

Having children is not the most effective way to improve the world. Have them because you want them, not “for impact”.

KatWoods
10 Aug 2025 6:54 UTC
12 points
2 comments
2 min read
LW link

A Self-Dialogue on The Value Proposition of Romantic Relationships

johnswentworth
10 Aug 2025 1:28 UTC
35 points
71 comments
8 min read
LW link

GPT-5 writing a Singularity scenario

Trevor Cappallo
10 Aug 2025 0:56 UTC
25 points
7 comments
34 min read
LW link

[Question] Linkable images in the editor?

Brendan Long
10 Aug 2025 0:34 UTC
9 points
4 comments
1 min read
LW link