Thoughts on extrapolating time horizons

Nikola Jurkovic

11 Aug 2025 22:36 UTC

56 points

7 comments · 1 min read · LW link

(x.com)

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]

GradientDissenter

11 Aug 2025 21:47 UTC

64 points

3 comments · 24 min read · LW link

(metr.org)

16 Concrete, Ambitious AI Project Proposals for Science and Security

Alejandro Acelas

11 Aug 2025 20:33 UTC

13 points

0 comments · 1 min read · LW link

(ifp.org)

How Does A Blind Model See The Earth?

henry

11 Aug 2025 19:58 UTC

494 points

41 comments · 7 min read · LW link

(outsidetext.substack.com)

How we spent our first two weeks as an independent AI safety research group

RohanS, Rauno Arike and Shubhorup Biswas

11 Aug 2025 19:32 UTC

32 points

0 comments · 10 min read · LW link

The Frustrations and Perils of Navigating Blind to Rocks

jimmy

11 Aug 2025 19:03 UTC

5 points

0 comments · 7 min read · LW link

Negative utilitarianism is more intuitive than you think

Nina Panickssery

11 Aug 2025 16:13 UTC

13 points

24 comments · 3 min read · LW link

(blog.ninapanickssery.com)

Dwarf Fortress and Claude’s ASCII Art Blindness

Brendan Long

11 Aug 2025 16:05 UTC

16 points

1 comment · 3 min read · LW link

(www.brendanlong.com)

Alternative Models of Superposition

zroe1 and RGRGRG

11 Aug 2025 15:52 UTC

20 points

6 comments · 5 min read · LW link

Ambition, Good and Bad: Green Growing Things and Forgeworthiness

Evenstar

11 Aug 2025 15:20 UTC

10 points

0 comments · 5 min read · LW link

ARENA 5.0 Impact Report

JScriven, JamesH and James Fox

11 Aug 2025 14:06 UTC

25 points

0 comments · 20 min read · LW link

GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card

Zvi

11 Aug 2025 12:10 UTC

45 points

2 comments · 25 min read · LW link

(thezvi.wordpress.com)

The trajectory of the future could soon get set in stone

wdmacaskill

11 Aug 2025 11:04 UTC

41 points

2 comments · 3 min read · LW link

Listening Before Speaking

Alice Blair

11 Aug 2025 5:23 UTC

15 points

3 comments · 3 min read · LW link

Legal Personhood—Bundle Theory

Stephen Martin

11 Aug 2025 4:32 UTC

3 points

2 comments · 3 min read · LW link

Measuring intelligence and reverse-engineering goals

jessicata

11 Aug 2025 2:08 UTC

34 points

10 comments · 9 min read · LW link

(unstableontology.com)

The Necessity of Studying Emergent Machine Ethics Now

Hiroshi Yamakawa

11 Aug 2025 0:37 UTC

3 points

0 comments · 11 min read · LW link

Run-time Steering Can Surpass Post-Training: Reasoning Task Performance

Tommy Xie

10 Aug 2025 23:52 UTC

5 points

2 comments · 6 min read · LW link

(www.tutke.org)

Sturdier and Lighter Pedalboard

jefftk

10 Aug 2025 23:50 UTC

9 points

0 comments · 2 min read · LW link

(www.jefftk.com)

Unjournal evaluation of “Towards best practices in AGI safety & governance” (2023), quick take

david reinstein

10 Aug 2025 22:28 UTC

7 points

2 comments · 1 min read · LW link

(unjournal.pubpub.org)

My Least Libertarian Opinion: Ban Exclusivity Deals*

Brendan Long

10 Aug 2025 21:41 UTC

80 points

17 comments · 2 min read · LW link

(www.brendanlong.com)

Motivated Reasoning as Bias

oleg

10 Aug 2025 21:15 UTC

6 points

2 comments · 3 min read · LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward

10 Aug 2025 20:56 UTC

1 point

0 comments · 1 min read · LW link

LLMs play prisoner’s Dilemma

parthh01

10 Aug 2025 20:36 UTC

3 points

0 comments · 1 min read · LW link

Petrov Day: Bremen (Oct 10)

marta_k and benjaminalt

10 Aug 2025 19:09 UTC

3 points

2 comments · 1 min read · LW link

The Coding Theorem — A Link between Complexity and Probability

Leon Lang

10 Aug 2025 15:34 UTC

34 points

4 comments · 9 min read · LW link

AI Safety at the Frontier: Paper Highlights, July ’25

gasteigerjo

10 Aug 2025 12:49 UTC

7 points

0 comments · 9 min read · LW link

(aisafetyfrontier.substack.com)

From Organized Shelves to Layered Catalogs: Architectural Explorations for Sparse Autoencoders—Crosscoders & Ladder SAEs Towards Hierarchical Data Structure

Yuxiao

10 Aug 2025 10:12 UTC

3 points

1 comment · 11 min read · LW link

Legal Personhood for Digital Minds—Introduction

Stephen Martin

10 Aug 2025 9:29 UTC

7 points

4 comments · 2 min read · LW link

Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History

Dawn Drescher

10 Aug 2025 8:46 UTC

46 points

6 comments · 12 min read · LW link

(impartial-priorities.org)

Having children is not the most effective way to improve the world. Have them because you want them, not “for impact”.

KatWoods

10 Aug 2025 6:54 UTC

12 points

2 comments · 2 min read · LW link

A Self-Dialogue on The Value Proposition of Romantic Relationships

johnswentworth

10 Aug 2025 1:28 UTC

29 points

72 comments · 8 min read · LW link

GPT-5 writing a Singularity scenario

Trevor Cappallo

10 Aug 2025 0:56 UTC

25 points

7 comments · 34 min read · LW link

[Question] Linkable images in the editor?

Brendan Long

10 Aug 2025 0:34 UTC

9 points

4 comments · 1 min read · LW link

Four places where you can put LLM monitoring

Fabien Roger and Buck

9 Aug 2025 23:10 UTC

49 points

0 comments · 7 min read · LW link

Output and CoE Monitoring of Customer Service Representatives Shows Default Alignment

Brendan Long

9 Aug 2025 21:31 UTC

21 points

0 comments · 1 min read · LW link

Live by the Claude, Die by the Claude

Brendan McCord

9 Aug 2025 20:23 UTC

2 points

3 comments · 7 min read · LW link

(blog.cosmos-institute.org)

GPT-5 vs AI Alignment

Donatas Lučiūnas

9 Aug 2025 20:05 UTC

−8 points

2 comments · 1 min read · LW link

Saidi, My Friend—what do we owe to each other?

James Stephen Brown

9 Aug 2025 19:41 UTC

10 points

0 comments · 5 min read · LW link

Self-Inquiry (Самовопрошание)

Vadim Golub

9 Aug 2025 19:18 UTC

−7 points

0 comments · 1 min read · LW link

Testing the Authoritarian Bias of LLMs

Zhijing Jin, Irene Strauss, David Guzman Piedrahita and Keenan Samway

9 Aug 2025 18:09 UTC

10 points

1 comment · 6 min read · LW link

Working with AI: Measuring the Occupational Implications of Generative AI

Annapurna

9 Aug 2025 16:20 UTC

5 points

0 comments · 1 min read · LW link

(jorgevelez.substack.com)

If worker coops are so productive, why aren’t they everywhere?

B Jacobs

9 Aug 2025 14:47 UTC

36 points

19 comments · 4 min read · LW link

(bobjacobs.substack.com)

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.

Ilia Shirokov and Ilya Nachevsky

9 Aug 2025 11:44 UTC

7 points

7 comments · 12 min read · LW link

Against functionalism: a self dialogue

Algon

9 Aug 2025 11:19 UTC

13 points

9 comments · 1 min read · LW link

With the Future of the World in Your Hands, Think for 6.77 Years!

Dawn Drescher

9 Aug 2025 10:44 UTC

1 point

0 comments · 10 min read · LW link

(impartial-priorities.org)

Poll on De/Accelerating AI

denkenberger

9 Aug 2025 7:13 UTC

13 points

38 comments · 1 min read · LW link

[Event] Building What the Future Needs: A curated conference in Berlin (Sep 6, 2025) for high-impact builders and researchers

Vasilii Kondyrev

8 Aug 2025 23:08 UTC

7 points

0 comments · 2 min read · LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward

8 Aug 2025 22:08 UTC

1 point

0 comments · 1 min read · LW link

Making Sense of Consciousness Part 4: States of Consciousness

sarahconstantin

8 Aug 2025 21:21 UTC

8 points

0 comments · 5 min read · LW link

(sarahconstantin.substack.com)