Why I’m Posting AI-Safety-Related Clips On TikTok

Michaël Trazzi

12 Aug 2025 22:50 UTC

34 points

1 comment · 2 min read · LW link

Generalized Coming Out Of The Closet

johnswentworth

12 Aug 2025 21:38 UTC

92 points

64 comments · 4 min read · LW link

Looking for feature absorption automatically

Theodore Ehrenborg, Logan Riggs and Cleo Nardo

12 Aug 2025 20:46 UTC

16 points

0 comments · 6 min read · LW link

Interpretability through two lenses: biology and physics

raphael

12 Aug 2025 20:25 UTC

24 points

4 comments · 4 min read · LW link

Fixing a Loose Mouse Wheel With Putty

Brendan Long

12 Aug 2025 19:43 UTC

13 points

2 comments · 2 min read · LW link

The Bone-Chilling Evil of Factory Farming

Bentham's Bulldog

12 Aug 2025 18:02 UTC

113 points

11 comments · 6 min read · LW link

AISN #61: OpenAI Releases GPT-5

Corin Katzke and Dan H

12 Aug 2025 18:02 UTC

5 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

Mech Interp Wiki Page and Why You Should Edit Wikipedia

Noah Birnbaum and JoNeedsSleep

12 Aug 2025 17:28 UTC

77 points

16 comments · 1 min read · LW link

AI Induced Loneliness

Juan Zaragoza

12 Aug 2025 15:04 UTC

23 points

4 comments · 5 min read · LW link

[Question] Is there a safe version of the common crawl?

Gunnar_Zarncke

12 Aug 2025 14:56 UTC

22 points

6 comments · 1 min read · LW link

“I’m Gemini. I sold T-shirts. It was weirder than I expected”

Shoshannah Tekofsky

12 Aug 2025 14:33 UTC

64 points

0 comments · 5 min read · LW link

(theaidigest.org)

Beyond Control: The Strategic Case for AI Rights

Dawn Drescher

12 Aug 2025 14:05 UTC

−10 points

1 comment · 3 min read · LW link

(impartial-priorities.org)

The Eliza Test

Juan Zaragoza

12 Aug 2025 13:28 UTC

0 points

2 comments · 5 min read · LW link

GPT-5s Are Alive: Outside Reactions, the Router and the Resurrection of GPT-4o

Zvi

12 Aug 2025 12:40 UTC

36 points

9 comments · 29 min read · LW link

(thezvi.wordpress.com)

Legal Personhood—Problems with the Concept

Stephen Martin

12 Aug 2025 5:15 UTC

3 points

4 comments · 4 min read · LW link

Two Types of (Human) Uncertainty

Roman Malov

12 Aug 2025 1:36 UTC

10 points

3 comments · 2 min read · LW link

Thoughts on extrapolating time horizons

Nikola Jurkovic

11 Aug 2025 22:36 UTC

56 points

7 comments · 1 min read · LW link

(x.com)

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]

GradientDissenter

11 Aug 2025 21:47 UTC

64 points

3 comments · 24 min read · LW link

(metr.org)

16 Concrete, Ambitious AI Project Proposals for Science and Security

Alejandro Acelas

11 Aug 2025 20:33 UTC

13 points

0 comments · 1 min read · LW link

(ifp.org)

How Does A Blind Model See The Earth?

henry

11 Aug 2025 19:58 UTC

494 points

41 comments · 7 min read · LW link

(outsidetext.substack.com)

How we spent our first two weeks as an independent AI safety research group

RohanS, Rauno Arike and Shubhorup Biswas

11 Aug 2025 19:32 UTC

32 points

0 comments · 10 min read · LW link

The Frustrations and Perils of Navigating Blind to Rocks

jimmy

11 Aug 2025 19:03 UTC

5 points

0 comments · 7 min read · LW link

Negative utilitarianism is more intuitive than you think

Nina Panickssery

11 Aug 2025 16:13 UTC

13 points

24 comments · 3 min read · LW link

(blog.ninapanickssery.com)

Dwarf Fortress and Claude’s ASCII Art Blindness

Brendan Long

11 Aug 2025 16:05 UTC

16 points

1 comment · 3 min read · LW link

(www.brendanlong.com)

Alternative Models of Superposition

zroe1 and RGRGRG

11 Aug 2025 15:52 UTC

20 points

6 comments · 5 min read · LW link

Ambition, Good and Bad: Green Growing Things and Forgeworthiness

Evenstar

11 Aug 2025 15:20 UTC

10 points

0 comments · 5 min read · LW link

ARENA 5.0 Impact Report

JScriven, JamesH and James Fox

11 Aug 2025 14:06 UTC

25 points

0 comments · 20 min read · LW link

GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card

Zvi

11 Aug 2025 12:10 UTC

45 points

2 comments · 25 min read · LW link

(thezvi.wordpress.com)

The trajectory of the future could soon get set in stone

wdmacaskill

11 Aug 2025 11:04 UTC

41 points

2 comments · 3 min read · LW link

Listening Before Speaking

Alice Blair

11 Aug 2025 5:23 UTC

15 points

3 comments · 3 min read · LW link

Legal Personhood—Bundle Theory

Stephen Martin

11 Aug 2025 4:32 UTC

3 points

2 comments · 3 min read · LW link

Measuring intelligence and reverse-engineering goals

jessicata

11 Aug 2025 2:08 UTC

34 points

10 comments · 9 min read · LW link

(unstableontology.com)

The Necessity of Studying Emergent Machine Ethics Now

Hiroshi Yamakawa

11 Aug 2025 0:37 UTC

3 points

0 comments · 11 min read · LW link

Run-time Steering Can Surpass Post-Training: Reasoning Task Performance

Tommy Xie

10 Aug 2025 23:52 UTC

5 points

2 comments · 6 min read · LW link

(www.tutke.org)

Sturdier and Lighter Pedalboard

jefftk

10 Aug 2025 23:50 UTC

9 points

0 comments · 2 min read · LW link

(www.jefftk.com)

Unjournal evaluation of “Towards best practices in AGI safety & governance” (2023), quick take

david reinstein

10 Aug 2025 22:28 UTC

7 points

2 comments · 1 min read · LW link

(unjournal.pubpub.org)

My Least Libertarian Opinion: Ban Exclusivity Deals*

Brendan Long

10 Aug 2025 21:41 UTC

80 points

17 comments · 2 min read · LW link

(www.brendanlong.com)

Motivated Reasoning as Bias

oleg

10 Aug 2025 21:15 UTC

6 points

2 comments · 3 min read · LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward

10 Aug 2025 20:56 UTC

1 point

0 comments · 1 min read · LW link

LLMs play prisoner’s Dilemma

parthh01

10 Aug 2025 20:36 UTC

3 points

0 comments · 1 min read · LW link

Petrov Day: Bremen (Oct 10)

marta_k and benjaminalt

10 Aug 2025 19:09 UTC

3 points

2 comments · 1 min read · LW link

The Coding Theorem — A Link between Complexity and Probability

Leon Lang

10 Aug 2025 15:34 UTC

34 points

4 comments · 9 min read · LW link

AI Safety at the Frontier: Paper Highlights, July ’25

gasteigerjo

10 Aug 2025 12:49 UTC

7 points

0 comments · 9 min read · LW link

(aisafetyfrontier.substack.com)

From Organized Shelves to Layered Catalogs: Architectural Explorations for Sparse Autoencoders—Crosscoders & Ladder SAEs Towards Hierarchical Data Structure

Yuxiao

10 Aug 2025 10:12 UTC

3 points

1 comment · 11 min read · LW link

Legal Personhood for Digital Minds—Introduction

Stephen Martin

10 Aug 2025 9:29 UTC

7 points

4 comments · 2 min read · LW link

Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History

Dawn Drescher

10 Aug 2025 8:46 UTC

46 points

6 comments · 12 min read · LW link

(impartial-priorities.org)

Having children is not the most effective way to improve the world. Have them because you want them, not “for impact”.

KatWoods

10 Aug 2025 6:54 UTC

12 points

2 comments · 2 min read · LW link

A Self-Dialogue on The Value Proposition of Romantic Relationships

johnswentworth

10 Aug 2025 1:28 UTC

29 points

72 comments · 8 min read · LW link

GPT-5 writing a Singularity scenario

Trevor Cappallo

10 Aug 2025 0:56 UTC

25 points

7 comments · 34 min read · LW link

[Question] Linkable images in the editor?

Brendan Long

10 Aug 2025 0:34 UTC

9 points

4 comments · 1 min read · LW link