Run-time Steering Can Surpass Post-Training: Reasoning Task Performance

Tommy Xie

10 Aug 2025 23:52 UTC

5 points

2 comments, 6 min read, LW link

(www.tutke.org)

Sturdier and Lighter Pedalboard

jefftk

10 Aug 2025 23:50 UTC

9 points

0 comments, 2 min read, LW link

(www.jefftk.com)

Unjournal evaluation of “Towards best practices in AGI safety & governance” (2023), quick take

david reinstein

10 Aug 2025 22:28 UTC

7 points

2 comments, 1 min read, LW link

(unjournal.pubpub.org)

My Least Libertarian Opinion: Ban Exclusivity Deals*

Brendan Long

10 Aug 2025 21:41 UTC

80 points

17 comments, 2 min read, LW link

(www.brendanlong.com)

Motivated Reasoning as Bias

oleg

10 Aug 2025 21:15 UTC

6 points

2 comments, 3 min read, LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward

10 Aug 2025 20:56 UTC

1 point

0 comments, 1 min read, LW link

LLMs play prisoner’s Dilemma

parthh01

10 Aug 2025 20:36 UTC

3 points

0 comments, 1 min read, LW link

Petrov Day: Bremen (Oct 10)

marta_k and benjaminalt

10 Aug 2025 19:09 UTC

3 points

2 comments, 1 min read, LW link

The Coding Theorem — A Link between Complexity and Probability

Leon Lang

10 Aug 2025 15:34 UTC

34 points

4 comments, 9 min read, LW link

AI Safety at the Frontier: Paper Highlights, July ’25

gasteigerjo

10 Aug 2025 12:49 UTC

7 points

0 comments, 9 min read, LW link

(aisafetyfrontier.substack.com)

From Organized Shelves to Layered Catalogs: Architectural Explorations for Sparse Autoencoders—Crosscoders & Ladder SAEs Towards Hierarchical Data Structure

Yuxiao

10 Aug 2025 10:12 UTC

3 points

1 comment, 11 min read, LW link

Legal Personhood for Digital Minds—Introduction

Stephen Martin

10 Aug 2025 9:29 UTC

7 points

4 comments, 2 min read, LW link

Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History

Dawn Drescher

10 Aug 2025 8:46 UTC

46 points

6 comments, 12 min read, LW link

(impartial-priorities.org)

Having children is not the most effective way to improve the world. Have them because you want them, not “for impact”.

KatWoods

10 Aug 2025 6:54 UTC

12 points

2 comments, 2 min read, LW link

A Self-Dialogue on The Value Proposition of Romantic Relationships

johnswentworth

10 Aug 2025 1:28 UTC

29 points

72 comments, 8 min read, LW link

GPT-5 writing a Singularity scenario

Trevor Cappallo

10 Aug 2025 0:56 UTC

25 points

7 comments, 34 min read, LW link

[Question] Linkable images in the editor?

Brendan Long

10 Aug 2025 0:34 UTC

9 points

4 comments, 1 min read, LW link

Four places where you can put LLM monitoring

Fabien Roger and Buck

9 Aug 2025 23:10 UTC

49 points

0 comments, 7 min read, LW link

Output and CoE Monitoring of Customer Service Representatives Shows Default Alignment

Brendan Long

9 Aug 2025 21:31 UTC

21 points

0 comments, 1 min read, LW link

Live by the Claude, Die by the Claude

Brendan McCord

9 Aug 2025 20:23 UTC

2 points

3 comments, 7 min read, LW link

(blog.cosmos-institute.org)

GPT-5 vs AI Alignment

Donatas Lučiūnas

9 Aug 2025 20:05 UTC

−8 points

2 comments, 1 min read, LW link

Saidi, My Friend—what do we owe to each other?

James Stephen Brown

9 Aug 2025 19:41 UTC

10 points

0 comments, 5 min read, LW link

Self-Inquiry

Vadim Golub

9 Aug 2025 19:18 UTC

−7 points

0 comments, 1 min read, LW link

Testing the Authoritarian Bias of LLMs

Zhijing Jin, Irene Strauss, David Guzman Piedrahita and Keenan Samway

9 Aug 2025 18:09 UTC

10 points

1 comment, 6 min read, LW link

Working with AI: Measuring the Occupational Implications of Generative AI

Annapurna

9 Aug 2025 16:20 UTC

5 points

0 comments, 1 min read, LW link

(jorgevelez.substack.com)

If worker coops are so productive, why aren’t they everywhere?

B Jacobs

9 Aug 2025 14:47 UTC

36 points

19 comments, 4 min read, LW link

(bobjacobs.substack.com)

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.

Ilia Shirokov and Ilya Nachevsky

9 Aug 2025 11:44 UTC

7 points

7 comments, 12 min read, LW link

Against functionalism: a self dialogue

Algon

9 Aug 2025 11:19 UTC

13 points

9 comments, 1 min read, LW link

With the Future of the World in Your Hands, Think for 6.77 Years!

Dawn Drescher

9 Aug 2025 10:44 UTC

1 point

0 comments, 10 min read, LW link

(impartial-priorities.org)

Poll on De/Accelerating AI

denkenberger

9 Aug 2025 7:13 UTC

13 points

38 comments, 1 min read, LW link

[Event] Building What the Future Needs: A curated conference in Berlin (Sep 6, 2025) for high-impact builders and researchers

Vasilii Kondyrev

8 Aug 2025 23:08 UTC

7 points

0 comments, 2 min read, LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward

8 Aug 2025 22:08 UTC

1 point

0 comments, 1 min read, LW link

Making Sense of Consciousness Part 4: States of Consciousness

sarahconstantin

8 Aug 2025 21:21 UTC

8 points

0 comments, 5 min read, LW link

(sarahconstantin.substack.com)

What would a human pretending to be an AI say?

Brendan Long

8 Aug 2025 18:56 UTC

54 points

19 comments, 1 min read, LW link

(www.brendanlong.com)

Will morally motivated actors steer us towards a near-best future?

wdmacaskill

8 Aug 2025 18:32 UTC

22 points

0 comments, 4 min read, LW link

How hard to achieve is eutopia?

wdmacaskill

8 Aug 2025 16:16 UTC

22 points

0 comments, 7 min read, LW link

OpenAI’s GPT-OSS Is Already Old News

Zvi

8 Aug 2025 12:20 UTC

40 points

4 comments, 18 min read, LW link

(thezvi.wordpress.com)

Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitor Performance (Research Note)

Rauno Arike, RohanS and Shubhorup Biswas

8 Aug 2025 10:41 UTC

51 points

7 comments, 10 min read, LW link

The Tortoise and the Language Model (A Fable After Hofstadter)

mwatkins

8 Aug 2025 10:39 UTC

55 points

4 comments, 3 min read, LW link

Closed Mouth, Open Opportunities

CstineSublime

8 Aug 2025 10:32 UTC

6 points

0 comments, 4 min read, LW link

How anticipatory cover-ups go wrong

Kaj_Sotala

8 Aug 2025 10:26 UTC

299 points

25 comments, 6 min read, LW link

Strategic Moderation Goals (a Plan B to AI alignment)

Jim Buhler

8 Aug 2025 8:08 UTC

2 points

0 comments, 3 min read, LW link

Preface to “Simulacra and Simulators”

Fiora Starlight

8 Aug 2025 7:38 UTC

14 points

0 comments, 7 min read, LW link

METR’s Evaluation of GPT-5

GradientDissenter

7 Aug 2025 22:17 UTC

145 points

15 comments, 20 min read, LW link

(metr.github.io)

ChatGPT is the Daguerreotype of AI

Alex_Altair

7 Aug 2025 22:14 UTC

42 points

2 comments, 7 min read, LW link

Principles of AI Uncontrollability

WillPetillo

7 Aug 2025 21:10 UTC

6 points

0 comments, 7 min read, LW link

Third-order cognition as a model of superintelligence (ironically: Meta® metacognition)

soycarts

7 Aug 2025 20:56 UTC

0 points

5 comments, 14 min read, LW link

Yes, Rationalism is a Cult

James Camacho

7 Aug 2025 20:43 UTC

−9 points

23 comments, 4 min read, LW link

GPT-5 is out

david reinstein

7 Aug 2025 20:33 UTC

4 points

0 comments, 1 min read, LW link

(openai.com)

OpenAI Releases GPT-5

anaguma

7 Aug 2025 18:41 UTC

18 points

0 comments, 1 min read, LW link

(openai.com)